Scoped Allocation in D
Classes don’t always have to use the GC.
Whether you’re writing a Web app or a device driver with severe memory constraints, at some point you’ll have to worry about how memory is managed.
In a manually managed environment, you may allocate and forget to free the memory region. It leaks over time. The memory monster slowly grows. And suddenly, poof, the computer fans stop spinning. Oh, right, the application crashed.
Garbage Day
Ah, yes, the dreaded Garbage Collector. If it scares you, you’re probably not alone.
The default, conservative implementation (which can be tweaked, replaced by your own implementation, or removed in a subset mode) manages memory, well, conservatively, and only collects under a few conditions (e.g., memory is low, the memory pool is exhausted). While other strategies are available, it’s the one most people are happy with.
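As a quick illustration of that tweakability, a minimal sketch, assuming the stock druntime GC: its behavior can be configured through runtime options, either on the command line (`--DRT-gcopt=profile:1`) or baked right into the binary:

```d
// Enable the default GC's built-in profiler without touching any
// allocation code; options can also be passed via --DRT-gcopt.
extern(C) __gshared string[] rt_options = ["gcopt=profile:1"];
```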
Traditional as the GC route is, we don’t need it to make an instance of a class with a limited lifetime, say, one that only lives within a function scope. That’s where scope allocation comes in.
Reviewing Solutions
Form A – New Expression
If you’re coming from a C# background, one of the first things you learn to do with classes is to allocate an instance in heap memory.
In D, this is done the same way.
Here’s an example of a hexdump using a memory-mapped file (note: MmFile is a class).
```d
import std.mmfile, std.algorithm, std.stdio, std.file, std.range;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        dump(arg);
    }
}

void dump(string path)
{
    enum columns = 16;
    try
    {
        // Allocate a new instance of MmFile on the memory heap using the GC
        MmFile mmfile = new MmFile(path); // Can be used outside this scope!

        // Let's only take the first 64 Bytes
        ubyte[] data = (cast(ubyte[])mmfile[]).take(64);

        // Go by chunks of columns
        foreach (chunk; chunks(data, columns))
        {
            writefln!"%(%02x %)%*s %s"(
                chunk,
                // Padding
                3 * (columns - chunk.length), "",
                // Replace non-printable
                chunk.map!(c => c < 0x20 || c > 0x7E ? '.' : char(c)));
        }
    }
    catch (Exception ex)
    {
        assert(0, ex.msg);
    }
}
```
Simple, works, but for every file we read, we allocate an instance of MmFile on the heap. If we were to read 20,000 files, that would mean 20,000 allocations, all staying in memory until the GC deems “Ah, this is too much, time to collect” and performs a collection.
Sure, we could manually destroy the instance (via `destroy`, and even add a scope guard: `scope (exit) destroy(mmfile)`), but wow, does that feel like a hack, doesn’t it? Why pressure the GC?
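For illustration, that manual teardown looks something like this (a sketch, not the form we’ll settle on):

```d
import std.mmfile : MmFile;

void dump(string path)
{
    MmFile mmfile = new MmFile(path);
    // Run the destructor deterministically on scope exit; the GC still
    // owns the memory and reclaims it during a later collection.
    scope (exit) destroy(mmfile);
    // ... use mmfile ...
}
```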
Form B – Scoped Template
So, if you only need to use a class instance for a short period of time, why not use the std.typecons.scoped template? (It has been around since at least 2008.)
The scoped template allows you to allocate a class instance on the stack. Note, however, that it instantiates, at compile time, a template function that performs the allocation.
```d
import std.mmfile, std.algorithm, std.stdio, std.file, std.range;
import std.typecons : scoped;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        dump(arg);
    }
}

void dump(string path)
{
    enum columns = 16;
    try
    {
        // Allocate a new instance of MmFile on the stack using the template function
        auto mmfile = scoped!MmFile(path); // Cannot be used outside scope.

        // Let's only take the first 64 Bytes
        ubyte[] data = (cast(ubyte[])mmfile[]).take(64);

        // Go by chunks of columns
        foreach (chunk; chunks(data, columns))
        {
            writefln!"%(%02x %)%*s %s"(
                chunk,
                // Padding
                3 * (columns - chunk.length), "",
                // Replace non-printable
                chunk.map!(c => c < 0x20 || c > 0x7E ? '.' : char(c)));
        }
    }
    catch (Exception ex)
    {
        assert(0, ex.msg);
    }
}
```
The `scoped` template function does the necessary work, but it carries a lot of template boilerplate and takes quite a few more instructions to set up and manage the instance.
Still, hey, at least we’re not putting any additional pressure on the GC.
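For the curious, stack-allocating a class boils down to reserving class-sized storage and constructing in place. Here’s a simplified sketch of the idea (not Phobos’s actual implementation; the real Scoped wrapper also handles alignment for you):

```d
import std.conv : emplace;

class Foo
{
    int x;
    this(int x) { this.x = x; }
}

void useFoo()
{
    // Reserve class-sized storage on the stack and construct in place.
    // (Real code must also guarantee the class's alignment, as
    // std.typecons.scoped does internally.)
    void[__traits(classInstanceSize, Foo)] buffer = void;
    Foo foo = emplace!Foo(buffer[], 42);
    scope (exit) destroy(foo); // run the destructor before the storage dies
    assert(foo.x == 42);
}
```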
Form C – Scope Allocation
And the final form, combining the NewExpression syntax and stack allocation… Scope Allocation!
```d
import std.mmfile, std.algorithm, std.stdio, std.file, std.range;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        dump(arg);
    }
}

void dump(string path)
{
    enum columns = 16;
    try
    {
        // Allocate a new instance of MmFile on the stack... Inlined!
        scope mmfile = new MmFile(path); // Also cannot be used outside of scope

        // Let's only take the first 64 Bytes
        ubyte[] data = (cast(ubyte[])mmfile[]).take(64);

        // Go by chunks of columns
        foreach (chunk; chunks(data, columns))
        {
            writefln!"%(%02x %)%*s %s"(
                chunk,
                // Padding
                3 * (columns - chunk.length), "",
                // Replace non-printable
                chunk.map!(c => c < 0x20 || c > 0x7E ? '.' : char(c)));
        }
    }
    catch (Exception ex)
    {
        assert(0, ex.msg);
    }
}
```
Well, does it perform any better?
Benchmarks
I’ve written a quick test that gathers statistics for each variant, built as dedicated versions. Each version opens the same file and computes its SHA-1 digest 30,000 times (fair if this is a server!). The file in question is 351 Bytes.
Do note that the MmFile class (~128 Bytes per instance) falls under the GC’s “SmallAlloc” strategy, which reuses the same allocation pool, so results may vary with larger allocations.
The benchmark could have been done better with other files, but the focus here is allocation time with a small class size. Do note that the “pause time” is the stop-the-world pause while the GC scans, and the “collection time” covers the full collection, including the sweep phase.
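For reference, here’s a minimal sketch of what one benchmark variant looks like; the file name is a placeholder, and the numbers come from the GC’s profiling statistics (core.memory.GC.profileStats, available on recent druntimes):

```d
import core.memory : GC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.digest.sha : sha1Of;
import std.mmfile : MmFile;
import std.stdio : writefln;

void main()
{
    enum runs = 30_000;
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
    {
        // Form C shown; swap in `new MmFile(...)` or `scoped!MmFile(...)`
        // for the other two variants.
        scope mmfile = new MmFile("sample.bin"); // placeholder file name
        cast(void)sha1Of(cast(ubyte[])mmfile[]);
    }
    sw.stop();

    auto gc = GC.profileStats(); // collection counts, pause/collection times
    writefln("avg per run : %s µs", sw.peek.total!"usecs" / runs);
    writefln("collections : %s", gc.numCollections);
    writefln("total pause : %s", gc.totalPauseTime);
}
```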
All builds were made using the release-nobounds build type.
| Statistic | NewExpression | scoped Template | Scope Allocation |
|---|---|---|---|
| Execution time (avg. arithmetic mean of execution time for the function, so the whole digest operation) | dmd-win64: 45 µs<br>ldc-win64: 40 µs | dmd-win64: 40 µs<br>ldc-win64: 39 µs | dmd-win64: 39 µs<br>ldc-win64: 39 µs |
| Stack delta (difference in stack memory, GC statistic) | dmd-win64: 3750 KiB<br>ldc-win64: 3750 KiB | dmd-win64: 0 B<br>ldc-win64: 0 B | dmd-win64: 0 B<br>ldc-win64: 0 B |
| Used memory delta (GC statistic) | dmd-win64: 709 KiB<br>ldc-win64: 738 KiB | dmd-win64: 0 B<br>ldc-win64: 0 B | dmd-win64: 0 B<br>ldc-win64: 0 B |
| Number of collections | dmd-win64: 3<br>ldc-win64: 3 | dmd-win64: 0<br>ldc-win64: 0 | dmd-win64: 0<br>ldc-win64: 0 |
| Maximum pause time | dmd-win64: 849 µs<br>ldc-win64: 927 µs | N/A | N/A |
| Maximum collection time | dmd-win64: 148 ms<br>ldc-win64: 115 ms | N/A | N/A |
| Total pause time | dmd-win64: 959 µs<br>ldc-win64: 970 µs | N/A | N/A |
| Total collection time | dmd-win64: 342 ms<br>ldc-win64: 296 ms | N/A | N/A |
| Executable size | dmd-win64: 907 776 B<br>ldc-win64: 736 256 B | dmd-win64: 908 800 B<br>ldc-win64: 736 768 B | dmd-win64: 907 776 B<br>ldc-win64: 736 768 B |
Well, it does seem like scope allocation takes as much space in the executable as a normal allocation while giving the performance of the scoped function template. Nice.
If I retry with a 2.23 MiB file: about the same execution time across the three forms (3 ms 230 µs), but the NewExpression variant now adds extra memory allocations and a total collection time of 1 second. I currently do not know whether the sweep phase runs in parallel after a pause, but the GC’s scanning phase is multithreaded by default.
Conclusion
If you only need a short-term class allocation for a simple task, and you’re not using an ancient D compiler, why not use scope allocation? For long-lived instances (e.g., in global memory), there is nothing wrong with using the GC.
In similar fashion, C++ can do this without the new operator by instantiating the class on the stack, and I’m pretty certain many other OOP languages have some comparable feature.
However, be mindful when passing a scope-allocated instance to a function: the class instance cannot be saved for later use (e.g., assigned to a reference that outlives the scope).
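A sketch of the kind of escape the compiler rejects (exact diagnostics vary by compiler version and flags):

```d
class Widget
{
    int id;
}

Widget leak()
{
    scope w = new Widget(); // stack-allocated, dies with this function
    return w; // Error: scope variable `w` may not be returned
}
```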
Stay fancy.