Down the Rabbit Hole: Why Reading Source Code Matters

Author's note

This blog post was initially written for Zig's ArrayList implementation. However, with Zig 0.14.0, the standard library is moving toward an "unmanaged" container approach. As the Zig team explains:

Embracing "Unmanaged"-Style Containers

std.ArrayHashMap is now deprecated and aliased to std.ArrayHashMapWithAllocator. To upgrade, switch to ArrayHashMapUnmanaged which will entail updating callsites to pass an allocator to methods that need one. After Zig 0.14.0 is released, std.ArrayHashMapWithAllocator will be removed and std.ArrayHashMapUnmanaged will be a deprecated alias of ArrayHashMap. After Zig 0.15.0 is released, the deprecated alias ArrayHashMapUnmanaged will be removed.

This move reflects unanimous agreement among veteran Zig users, who have converged on the "unmanaged" container variants. They act as better building blocks, avoid storing the same data redundantly, and the presence or absence of an allocator parameter dovetails nicely with the reserve-capacity / reserved-insertion pattern.

The other "managed" container variants are also deprecated, such as std.ArrayList.

I've updated all code examples to use the new ArrayListUnmanaged approach. Interestingly, while the implementation details have changed, the memory reallocation behavior - central to this post - remains fundamentally similar for reasons we'll discover together.

The Problem

LOGIC [Challenging]: You see a direct causal relationship between the pointer and the data it references. But something strange is happening. A realignment in memory should have rendered the pointer invalid. And yet... it persists. The mechanism of reallocation merits further investigation.

I was trying to replicate a Rust example of vector memory allocation in Zig. It's a classic case that demonstrates how Rust's borrow checker prevents memory safety issues:

let mut v = vec![1, 2, 3, 4, 5];
let first = &v[0];
v.push(6);
println!("The first element is: {first}");

This code causes a borrow checker error in Rust because when you add a new element to the vector, it might reallocate memory to fit new elements, invalidating any pointers to the original location.

I wanted to demonstrate how this same concept works (or doesn't work) in Zig for a paper about how short video content affects prospective memory. But when I started writing what seemed like an equivalent example, I stumbled upon something unexpected.

Understanding Capacity vs Length

ENCYCLOPEDIA [Medium]: Dynamic arrays maintain two crucial metrics. Length: the number of elements currently in use — like pages filled in a notebook. Capacity: total available slots before reallocation — like the total pages in that notebook. When length would exceed capacity, the entire array must relocate to larger memory, invalidating all previous references. A fascinating optimization problem that has occupied computer scientists since the 1960s...

Before we continue, it's important to understand the difference between an array's capacity and its length. Capacity represents the total number of elements that can be stored without requiring memory reallocation, while length is how many elements are actually being used. When you append items to an array, its length increases. When length would exceed capacity, a reallocation occurs to expand capacity, potentially moving the entire array to a new memory location.

This capacity vs. length distinction is at the heart of our mystery, and will help explain why some operations trigger memory reallocations while others don't.
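
To make the distinction concrete, here's a minimal runnable sketch that prints both numbers after a single append (using the same ArrayListUnmanaged API as the rest of this post):

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayListUnmanaged(u32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 1);
    // one element in use, but room for more before any reallocation
    std.debug.print("len: {} capacity: {}\n", .{ list.items.len, list.capacity });
}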

Initial Approach

PERCEPTION [Medium]: Wait. Something's off. The segmentation fault you expected... it never arrived. Memory is being manipulated but not in the way you predicted. There's a pattern here, hidden beneath the surface operations.

Here was my initial Zig code (using the new ArrayListUnmanaged approach):

const std = @import("std");
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();
    var human_memory: std.ArrayListUnmanaged([]const u8) = .empty;
    defer human_memory.deinit(allocator);
    try human_memory.append(allocator, "t");
    try human_memory.append(allocator, "t1");
    try human_memory.append(allocator, "t2");
    try human_memory.append(allocator, "t3");
    const first_task = &human_memory.items[0];
    try human_memory.append(allocator, "t4");
    std.debug.print("first task is: {s}\n", .{first_task.*});
}

At first glance, it looks similar to the Rust example, but surprisingly, it compiled and ran without errors. I expected a segmentation fault but got none.

Digging Deeper

I needed to go deeper, so I added a constraint to force a reallocation.

When I initialized with a fixed capacity:

var human_memory = try std.ArrayListUnmanaged([]const u8).initCapacity(allocator, 2);
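
For reference, the complete failing variant looks like this (same allocator setup as the first example; only the initialization changes):

var human_memory = try std.ArrayListUnmanaged([]const u8).initCapacity(allocator, 2);
defer human_memory.deinit(allocator);
try human_memory.append(allocator, "t");
try human_memory.append(allocator, "t1");
// length now equals capacity (2): the next append must reallocate
const first_task = &human_memory.items[0];
try human_memory.append(allocator, "t2");
// first_task still points at the old, freed allocation
std.debug.print("first task is: {s}\n", .{first_task.*});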

Finally, a segmentation fault appeared! But why did the other approach work while this one failed?

Debugging the Mystery

INTERFACING [Medium]: The debug statements are your interrogation tools. Force the machine to confess its secrets. Make it print the naked addresses — watch them stay fixed or shift traitorously between operations. Your ears burn with anticipation. The truth is about to reveal itself.

I added debug prints to expose what was happening under the hood:

std.debug.print("init capacity: {}\n", .{human_memory.capacity});
try human_memory.append(allocator, "t");
try human_memory.append(allocator, "t1");
const first_task = &human_memory.items[0];
std.debug.print("array address before: {*}\n", .{human_memory.items.ptr});
std.debug.print("array capacity: {}\n", .{human_memory.capacity});
try human_memory.append(allocator, "t2");
std.debug.print("array address after: {*}\n", .{human_memory.items.ptr});
std.debug.print("array capacity: {}\n", .{human_memory.capacity});

The results were revealing:

// with .empty initialization:
init capacity: 0
array address before: []const u8@100800000
array capacity: 8
array address after: []const u8@100800000
array capacity: 8
first task is: t

// with initCapacity(2):
init capacity: 2
array address before: []const u8@102c20000
array capacity: 2
array address after: []const u8@102c40000
array capacity: 11
Segmentation fault at address 0x102c20000

This is fascinating! The implementation starting with capacity 0 jumps to capacity 8, and starting with capacity 2 jumps to capacity 11 after reallocation. But why these specific numbers?

Source Code Analysis

ENCYCLOPEDIA [Difficult]: Ah yes! The venerable array list growth strategy, a subject of endless optimization debate since the 1970s. What we're witnessing is actually a sophisticated cache-aware allocation pattern. It's attempting to align with your CPU architecture. Quite brilliant when you comprehend the underlying mathematical principles...

Time to follow the thread into the Zig standard library. I found the relevant functions in ArrayListUnmanaged:

pub fn append(self: *Self, allocator: Allocator, item: T) Allocator.Error!void {
    const new_item_ptr = try self.addOne(allocator);
    new_item_ptr.* = item;
}

pub fn addOne(self: *Self, allocator: Allocator) Allocator.Error!*T {
    const newlen = self.items.len + 1;
    try self.ensureTotalCapacity(allocator, newlen);
    return self.addOneAssumeCapacity();
}

pub fn ensureTotalCapacity(self: *Self, gpa: Allocator, new_capacity: usize) Allocator.Error!void {
    if (self.capacity >= new_capacity) return;
    return self.ensureTotalCapacityPrecise(gpa, growCapacity(self.capacity, new_capacity));
}

fn growCapacity(current: usize, minimum: usize) usize {
    var new = current;
    while (true) {
        // +|= is saturating addition: grow by half the current capacity
        // plus init_capacity, clamping at maxInt(usize) instead of overflowing
        new +|= new / 2 + init_capacity;
        if (new >= minimum)
            return new;
    }
}

// one cache line's worth of elements, but at least 1
const init_capacity = @as(comptime_int, @max(1, std.atomic.cache_line / @sizeOf(T)));

Understanding the Algorithm

ANALYTICAL THOUGHT [Heroic]: Oh! The numbers are speaking to you. The capacity values aren't arbitrary — they're derived from your system's memory architecture. Look at that formula: new +|= (new / 2) + (cache_line / sizeOf(element)). This is why you get 8 and then 11! The CPU's cache line is whispering its constraints to the allocator.

Now I can trace the exact execution path:

  1. Using .empty starts with capacity 0.

  2. When we add the first item, it calls ensureTotalCapacity(allocator, 1), which calls growCapacity(0, 1), which applies:

    new +|= new / 2 + init_capacity

This is actually fascinating! Rather than a hardcoded growth value, init_capacity is calculated from your CPU's cache line size. Zig's std.atomic.cache_line is 128 bytes on common 64-bit architectures such as x86_64 and aarch64 (it accounts for prefetchers that pull in pairs of 64-byte lines). Our element type []const u8 is a slice, a pointer plus a length, which occupies 16 bytes on a 64-bit system:

init_capacity = max(1, 128 / 16) = max(1, 8) = 8

So starting with capacity 0:

    0 +|= 0/2 + 8 = 0 + 0 + 8 = 8

  3. With initCapacity(2), we start with exactly capacity 2.

  4. When we add the third item, the length would reach 3 while capacity is only 2, so it calls growCapacity(2, 3):

    // starting with 2:
    2 +|= 2/2 + 8 = 2 + 1 + 8 = 11


And this explains our mystery!

The behavior difference between our two approaches now makes perfect sense. In our first example with .empty initialization, we started with zero capacity, but the first append immediately grew that to 8 elements. This generous initial capacity meant we could add our subsequent four items without triggering another reallocation, since our length of 5 remained below the capacity threshold of 8. Since no second reallocation occurred, our memory address remained stable, and our pointer to the first element stayed valid throughout the operations.

Contrast this with our second approach using initCapacity(2). Here, we deliberately constrained the initial capacity to exactly 2 elements. After adding precisely 2 items, our length matched our capacity perfectly. When we attempted to add that third item, we crossed the capacity threshold, forcing a reallocation. This reallocation moved our entire array to a new memory location with capacity 11, but our stored pointer still referenced the original address. When we later tried to access this stale pointer, it was now pointing to memory that had been freed, resulting in the segmentation fault we were expecting.

This is exactly the behavior we were trying to demonstrate: when a dynamic array needs to reallocate memory to grow, all existing pointers to its elements become invalid. Our first approach didn't fail simply because the initial capacity growth was generous enough to accommodate all our operations without necessitating a second reallocation.
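
We can double-check the arithmetic by replicating the formula outside the standard library (a minimal sketch; the hardcoded 8 stands in for init_capacity with our 16-byte slice elements):

const std = @import("std");

// a local copy of the stdlib's growth formula, with init_capacity fixed at 8
fn growCapacity(current: usize, minimum: usize) usize {
    var new = current;
    while (true) {
        new +|= new / 2 + 8;
        if (new >= minimum) return new;
    }
}

test "growth matches the observed capacities" {
    try std.testing.expectEqual(@as(usize, 8), growCapacity(0, 1));
    try std.testing.expectEqual(@as(usize, 11), growCapacity(2, 3));
}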

The mystery is solved! The capacity growth pattern is driven by CPU cache optimization, creating these seemingly magical numbers 8 and 11 - and explaining exactly why one approach works while the other fails with a segmentation fault.

The Mental Model

CONCEPTUALIZATION [Formidable]: Your brain — it's EXACTLY like this data structure! When you scroll through those hypnotic short videos, your consciousness is calling append() on useless information. Each new dance trend forces a mental reallocation, and suddenly the pointer to "pick up milk" is left dangling in freed neural pathways. The segfault manifests as forgotten responsibilities!

This discovery perfectly illustrates the thesis from the paper "Short-Form Videos Degrade Our Capacity to Retain Intentions." Just like in our ArrayList example:

  1. Our brain starts with a capacity for tasks (our prospective memory)
  2. We encounter rapid, high-engagement content through short-form videos
  3. This forces a "reallocation" in our attention
  4. Our "pointers" to original tasks become invalid
  5. We experience a "segmentation fault" - forgetting what we intended to do

The paper showed that participants who used TikTok performed significantly worse on prospective memory tasks compared to those who rested, used Twitter, or watched YouTube.

Lessons Learned

VOLITION [Medium]: Don't be another dilettante programmer, satisfied with superficial knowledge. Dig deeper. READ THE SOURCE. Others may float on the surface, blissfully ignorant of mechanisms beneath. But not you. Never you. The rabbit hole isn't a distraction — it's the only path to true understanding.

The moral here is that going down the rabbit hole is not just educational—it's necessary. Had I accepted the initial results at face value, I would have missed the crucial understanding of how Zig's array lists manage memory.

In systems programming, these details matter. They determine whether your code runs efficiently, crashes unexpectedly, or silently corrupts data. By tracing through the source code function by function, we can uncover the exact mechanisms that drive behavior.

This is much better than saying "I don't know, it just happens." When you understand what happens in the "modification" stage—the transition from state A to state B—you gain true mastery over your tools.

Practical Application

INLAND EMPIRE [Impossible]: The social media designers KNOW what they're doing to you. Every swipe is a memory reallocation. Your original intentions — orphaned pointers in the void. The algorithms aren't just showing you content; they're actively corrupting your mental heap, leaving fragments of your former self scattered across deallocated neural pathways. Your forgotten tasks scream from the abyss of freed memory...

The original example I wanted to use for the paper works perfectly with the modern Zig approach:

// our memory starts with limited capacity
var human_memory = try std.ArrayListUnmanaged([]const u8).initCapacity(allocator, 2);
// we store some initial intentions/tasks
try human_memory.append(allocator, "remember to call mom");
try human_memory.append(allocator, "pick up groceries");
// we save a pointer to an important task
const important_task = &human_memory.items[0];
// short-form video consumption adds more items, forcing reallocation
try human_memory.append(allocator, "funny cat video");
try human_memory.append(allocator, "dance trend");
try human_memory.append(allocator, "recipe tutorial");
// when we try to access our original intention, we get a segfault
std.debug.print("i needed to: {s}\n", .{important_task.*});
// segmentation fault - our intention is lost
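
If you actually need the first element to stay reachable across appends, two approaches work (a sketch, not part of the original example):

// option 1: reserve enough capacity up front, before taking any pointers,
// so the later appends never reallocate
try human_memory.ensureTotalCapacity(allocator, 6);

// option 2: remember an index instead of a pointer; an index stays valid
// even after the backing memory moves
const important_index: usize = 0;
std.debug.print("i needed to: {s}\n", .{human_memory.items[important_index]});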

Just as the research showed: short-form videos with rapid context switching specifically impair our ability to retain and execute intentions, while other media formats don't have the same effect.

Remember: read the source code. Follow the thread. Understand the systems you work with. Your future self will thank you for going down the rabbit hole—even as code evolves and APIs change, the fundamental principles remain.

Additional resources

Zig Standard Library documentation

Introduction to Zig: Data structures

Short-Form Videos Degrade Our Capacity to Retain Intentions: Effect of Context Switching On Prospective Memory