dd86k's blog

Machine code enthusiast

Scoped Allocation in D

Author: dd
Published: December 7, 2023

Classes don’t always have to use the GC.


Whether you’re writing a Web app or writing device drivers under severe memory constraints, at some point you’ll have to worry about how memory is managed.

In a manually managed environment, you may allocate and forget to free the memory region. It leaks over time. The memory monster slowly grows. And suddenly, poof, the computer fans stop spinning. Oh, right, the application crashed.

Garbage Day

Ah, yes, the dreaded Garbage Collector. If it scares you, you’re probably not alone.

The default implementation (which can be tweaked, replaced with your own implementation, or removed in a subset mode) is a conservative collector: it manages memory conservatively and will only collect under a few conditions (e.g., low on memory, exhausted memory pool, etc.). Other strategies are available, but the default is what most people are happy with.

Traditional as it may be, we don’t need the GC to make an instance of a class with a limited lifetime, say, one confined to a function scope. That’s where scope allocation comes in.

Reviewing Solutions

Form A – New Expression

If you’re coming from a C# background, one of the first things you learn to do with classes is to allocate an instance in heap memory.

In D, this is done the same way.

Here’s an example of a hexdump using a memory-mapped file (note: MmFile is a class).

import std.mmfile, std.algorithm, std.stdio, std.file, std.range;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        dump(arg);
    }
}

void dump(string path)
{
    enum columns = 16;
    try
    {
        // Allocate a new instance of MmFile on memory heap using GC
        MmFile mmfile = new MmFile(path); // Can be used outside this scope!
        
        // Let's only take the first 64 Bytes
        ubyte[] data = (cast(ubyte[])mmfile[]).take(64);
        
        // Go by chunks of columns
        foreach (chunk; chunks(data, columns))
        {
            writefln!"%(%02x %)%*s  %s"(
                chunk,
                // Padding
                3 * (columns - chunk.length), "",
                // Replace non-printable
                chunk.map!(c => c < 0x20 || c > 0x7E ? '.' : char(c)));
        }
    }
    catch (Exception ex)
    {
        assert(0, ex.msg);
    }
}

Simple, and it works, but for every file we read, we allocate an instance of MmFile on the heap. If we were to read 20,000 files, that’s 20,000 allocations, all staying in memory until the GC decides “Ah, this is too much, time to collect” and performs a collection.

Sure, we could manually destroy the instance (via destroy, and even add a scope guard: scope (exit) destroy(mmfile)), but wow does that feel like a hack, doesn’t it? Why pressure the GC?
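For reference, that manual approach would look something like this. This is a sketch using a small hypothetical Resource class instead of MmFile, so it runs standalone:

```d
import std.stdio;

class Resource
{
    static bool finalized; // for demonstration only
    ~this() { finalized = true; }
}

void useResource()
{
    auto r = new Resource(); // GC heap allocation, as usual
    // Run the destructor deterministically when leaving this scope.
    // Note: destroy() finalizes the object, but the GC still owns
    // (and eventually reclaims) the memory itself.
    scope (exit) destroy(r);

    // ... use r here ...
}

void main()
{
    useResource();
    writeln(Resource.finalized); // true
}
```

It works, but as noted, the allocation still sits in the GC heap until a collection runs; only the finalization is deterministic.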

Form B – Scoped Template

So, you only need to use a class instance for a short period of time, why not use the std.typecons.scoped template? (Here’s even an archive from 2008)

The scoped template allows you to allocate a class instance on the stack. Note, however, that for each class type used, this instantiates a template function that performs the allocation, generated at compile time.

import std.mmfile, std.algorithm, std.stdio, std.file, std.range;
import std.typecons : scoped;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        dump(arg);
    }
}

void dump(string path)
{
    enum columns = 16;
    try
    {
        // Allocate a new instance of MmFile on stack using the template function
        auto mmfile = scoped!MmFile(path); // Cannot be used outside scope.
        
        // Let's only take the first 64 Bytes
        ubyte[] data = (cast(ubyte[])mmfile[]).take(64);
        
        // Go by chunks of columns
        foreach (chunk; chunks(data, columns))
        {
            writefln!"%(%02x %)%*s  %s"(
                chunk,
                // Padding
                3 * (columns - chunk.length), "",
                // Replace non-printable
                chunk.map!(c => c < 0x20 || c > 0x7E ? '.' : char(c)));
        }
    }
    catch (Exception ex)
    {
        assert(0, ex.msg);
    }
}

The scoped template function does the necessary work, but it carries over a fair amount of template boilerplate and takes quite a few more instructions to set up and manage.

Ah well, at least we’re not putting any additional pressure on the GC.

Form C – Scope Allocation

And the final form, combining the NewExpression syntax and stack allocation… Scope Allocation!

import std.mmfile, std.algorithm, std.stdio, std.file, std.range;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        dump(arg);
    }
}

void dump(string path)
{
    enum columns = 16;
    try
    {
        // Allocate a new instance of MmFile on stack... Inlined!
        scope mmfile = new MmFile(path); // Also cannot be used outside of scope
        
        // Let's only take the first 64 Bytes
        ubyte[] data = (cast(ubyte[])mmfile[]).take(64);
        
        // Go by chunks of columns
        foreach (chunk; chunks(data, columns))
        {
            writefln!"%(%02x %)%*s  %s"(
                chunk,
                // Padding
                3 * (columns - chunk.length), "",
                // Replace non-printable
                chunk.map!(c => c < 0x20 || c > 0x7E ? '.' : char(c)));
        }
    }
    catch (Exception ex)
    {
        assert(0, ex.msg);
    }
}

Well, does it perform any better?

Benchmarks

I’ve written a quick test that gathers statistics for each variant, each built as a dedicated version. Every version opens the same file and computes its SHA-1 digest 30,000 times (fair, if this were a server!). The file in question is 351 Bytes.

Do note that the MmFile class (~128 Bytes per instance) falls under the “SmallAlloc” strategy for the GC by reusing the same allocation pool, so results may vary on larger allocations.

The benchmark could have been done better with other files, but the focus here is allocation time with a small class size. Do note that the “pause time” covers the mark phase (while threads are stopped), and the “collection time” covers the sweep phase.

All builds were produced using the release-nobounds build type.
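The GC figures below come from druntime’s built-in statistics; if you want to reproduce them, a minimal sketch (assuming a druntime recent enough to provide GC.profileStats, i.e. 2.087+) looks like this:

```d
import core.memory : GC;
import std.stdio;

void main()
{
    // Churn through some small GC allocations to generate activity.
    foreach (i; 0 .. 100_000)
        cast(void) new ubyte[](128);

    GC.collect();

    const stats = GC.stats();        // used/free heap sizes
    const prof  = GC.profileStats(); // collection counts and timings

    writefln!"Used memory:           %d bytes"(stats.usedSize);
    writefln!"Number of collections: %d"(prof.numCollections);
    writefln!"Total pause time:      %s"(prof.totalPauseTime);
    writefln!"Max collection time:   %s"(prof.maxCollectionTime);
}
```

Alternatively, running a D program with the --DRT-gcopt=profile:1 runtime option prints a GC profile summary at exit.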

| Statistic | Compiler | NewExpression | scoped Template | Scope Allocation |
|---|---|---|---|---|
| Execution time (arithmetic mean for the function, i.e. the whole digest operation) | dmd-win64 | 45 µs | 40 µs | 39 µs |
| | ldc-win64 | 40 µs | 39 µs | 39 µs |
| Stack delta (difference in stack memory, GC statistic) | dmd-win64 | 3750 KiB | 0 B | 0 B |
| | ldc-win64 | 3750 KiB | 0 B | 0 B |
| Used memory delta (GC statistic) | dmd-win64 | 709 KiB | 0 B | 0 B |
| | ldc-win64 | 738 KiB | 0 B | 0 B |
| Number of collections | dmd-win64 | 3 | 0 | 0 |
| | ldc-win64 | 3 | 0 | 0 |
| Maximum pause time | dmd-win64 | 849 µs | N/A | N/A |
| | ldc-win64 | 927 µs | N/A | N/A |
| Maximum collection time | dmd-win64 | 148 ms | N/A | N/A |
| | ldc-win64 | 115 ms | N/A | N/A |
| Total pause time | dmd-win64 | 959 µs | N/A | N/A |
| | ldc-win64 | 970 µs | N/A | N/A |
| Total collection time | dmd-win64 | 342 ms | N/A | N/A |
| | ldc-win64 | 296 ms | N/A | N/A |
| Executable size | dmd-win64 | 907 776 B | 908 800 B | 907 776 B |
| | ldc-win64 | 736 256 B | 736 768 B | 736 768 B |
Test results using dmd 2.103.1 and ldc 1.32.2, if I recall correctly.

Well, it does seem like scope allocation takes as much space in the executable as a normal allocation while giving the performance of the scoped function template. Nice.

If I retry with a 2.23 MiB file: execution time is about the same (3 ms 230 µs), but with the difference of additional memory allocations and total collection times of 1 second. I currently do not know whether the sweep phase is performed in parallel after a pause, but the GC’s scanning phase is multithreaded by default.

Conclusion

If you only need a short-lived class instance for a simple task, and you’re not using an ancient D compiler, why not use scope allocation? For long-lived instances (e.g., in global memory), there is nothing wrong with using the GC.

In a similar fashion, C++ can do this without the new operator by constructing the instance on the stack, and I’m pretty certain many other OOP languages have some equivalent feature.

However, be mindful when passing a scope-allocated instance to a function: the instance cannot be saved for later use (e.g., assigned to a longer-lived lvalue), because its memory is reclaimed when the enclosing scope exits.
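To illustrate the restriction (with a small hypothetical Gadget class):

```d
class Gadget { int value; }

Gadget escaped; // longer-lived reference at module scope

void consume(Gadget g)
{
    g.value = 42; // fine: only used for the duration of the call
}

void useScoped()
{
    scope g = new Gadget(); // stack-allocated instance
    consume(g);             // OK: the callee merely borrows the reference

    // escaped = g; // Don't: g's memory is reclaimed when useScoped()
    //              // returns, leaving `escaped` dangling. The compiler
    //              // can catch escapes like this with -dip1000 and
    //              // `scope` parameters.
}

void main()
{
    useScoped();
}
```

Passing the instance down the call chain is fine; letting a reference outlive the scope is not.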

Stay fancy.