Heap allocations can be a major bottleneck in Go programs: each one costs time up front and adds work for the garbage collector later. Stack allocations, in contrast, are nearly free and are cleaned up automatically when the function returns. This Q&A explores how Go programmers can leverage stack allocation, especially for slices whose sizes are known, to write faster, more efficient code. We'll break down the mechanics, common pitfalls, and practical optimizations based on insights from the Go team's recent work.

What causes heap allocations to be slow in Go?
Each time a Go program allocates memory from the heap, the runtime must execute a relatively large block of code to satisfy the request. This involves searching for a suitable free block, updating metadata, and potentially triggering garbage collection. Additionally, heap allocations place ongoing load on the garbage collector, which must later track and reclaim that memory. Even with enhancements like the Green Tea collector, GC overhead remains significant. These costs add up, especially in hot code paths where allocations happen frequently. The result is reduced throughput and higher latency, making heap allocation a prime target for optimization.

How does stack allocation improve performance?
Stack allocations are inherently much cheaper than heap allocations. They often require just a single instruction to adjust the stack pointer—essentially free. Moreover, stack allocations impose zero burden on the garbage collector, because they are automatically discarded when the function returns. This also enables prompt reuse of memory, which is highly cache-friendly since the same stack frame memory is reused repeatedly. In contrast, heap-allocated objects linger until collected, potentially causing fragmentation and cache misses. By moving allocations from the heap to the stack, Go programs can achieve substantial speedups with minimal code changes.

Why does repeatedly appending to a slice cause many heap allocations?
When you append to a slice without preallocating a backing array, Go uses a dynamic growth strategy. On the first iteration, an empty slice gets a new backing store of size 1. When that fills, a new size-2 backing store is allocated (copying the old data). Next, size 4, then 8, and so on—doubling each time. This means the first few appends trigger repeated heap allocations and copying of elements. For example, after five iterations, you may have allocated backing stores of sizes 1, 2, 4, and 8, with only the last one surviving. All earlier allocations become garbage, wasting time and memory. This "startup phase" is especially painful for small slices that never grow large.

How does the slice growth pattern lead to wasteful overhead?
The doubling strategy is a trade-off: it ensures amortized constant-time appends, but the initial steps are inefficient. For a slice that eventually holds many items, the early allocations are a tiny fraction of the total cost. However, if your slice never gets large—say, it always contains fewer than 10 items—the startup phase dominates. You might allocate 1, 2, 4, and then 8-element backing stores just to hold 5 items. Not only does this incur multiple calls to the allocator, but it also produces several short-lived objects that the GC must later collect. In hot code, this overhead can seriously degrade performance. The key insight is that if you know the maximum size in advance, you can avoid this dance entirely.

When can we allocate a slice on the stack instead of the heap?
A slice can be allocated on the stack when its backing array's size is known at compile time and fits within the stack frame. For example, if you know a slice will hold at most 100 tasks, you can declare a fixed-size array like var buf [100]task and create a slice header pointing into it (tasks := buf[:0]). Provided the compiler can prove that buf does not escape, the array lives in the stack frame, and appending up to 100 items never touches the heap. More generally, the Go compiler uses escape analysis to decide whether an allocation can safely live on the stack: if a value does not escape the function (for instance, it is not returned or stored in a global), the compiler may place it on the stack automatically. Slices with dynamic or unknown sizes, however, still escape to the heap. The moral: prefer known-size arrays or capacity-limited slices when performance matters.

How can we optimize the example of building a slice from a channel?
In the original code—a loop pulling tasks from a channel and appending them to a slice—we can preallocate the backing store to avoid repeated heap allocations. If the number of tasks is known or estimated, use tasks := make([]task, 0, expectedCount). This single heap allocation of the full capacity eliminates all the intermediate reallocations. If the maximum is fixed, even better: use a stack-allocated array. For instance:
func process(c chan task) {
    var buf [100]task
    tasks := buf[:0]
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}
Provided escape analysis determines that buf does not escape (for example, processAll does not retain the slice beyond the call), the backing array lives on the stack and appending up to 100 tasks causes zero heap allocations. Even for larger sizes, preallocating with make dramatically reduces GC pressure and allocation overhead. Always consider the slice's expected lifetime and capacity when choosing an allocation strategy.