
Linked lists aren’t always worse on the cache

Suppose you have a tree represented as a vec of nodes:

struct Tree<T> {
    root: Option<usize>,
    nodes: Vec<Node<T>>,
}

Nodes can have arbitrarily many children, and we might insert a new node as a child of any existing node, so the natural thing to do is:

struct Node<T> {
    data: T,
    children: Vec<usize>
}

We push a newly inserted node at the end of the nodes vec, and push its index to its parent’s children vec:

impl<T> Tree<T> {
    fn insert(&mut self, item: T) {
        // ... somehow figure out which node to insert it under
        let parent_idx: usize = todo!();
        let child_node = Node { data: item, children: vec![] };
        let child_idx = self.nodes.len();
        self.nodes.push(child_node);
        self.nodes[parent_idx].children.push(child_idx);
    }
}

But the other day I saw another way of doing it in a video:

// intrusive linked-list Node
struct LLNode<T> {
    data: T,
    kid: Option<usize>,
    sib: Option<usize>
}

An LLNode only knows about its first child and its next sibling. You iterate through a node’s children in linked-list fashion: visit the node at the parent’s kid index, then that node’s sib, then that node’s sib, and so on until you hit the None case of the Option.

When I saw this I thought, isn’t that a bad memory access pattern? Because as every software engineer knows, linked lists are Bad For The CPU Cache™. All of the sibling nodes are scattered throughout the tree’s nodes vec. And the vec-of-children way must be Good For The Cache, because all the child indices are contiguous in memory.

Then I realised, no, the children vec way has the exact same problem, only worse. The indices in the children vec may be contiguous in memory, but the child nodes themselves are not. We insert a new node at the end of the tree’s nodes vec, and most of the time its existing siblings are not immediately before it. And we have to visit each of those non-contiguous locations anyway if we actually want to do something with the child nodes.

In fact, the order of the nodes vec is unchanged if we switch to LLNode. The insertion logic is the same in both cases: put it at the end. For traversals, the only important1 difference is that the LLNode doesn’t incur an extra pointer-chase into a children vec on the heap. So the intrusive linked-list tree has an outright superior memory access pattern.

The LLNode way also saves memory. A vec has three 64-bit members, for the ptr, len, and cap. That’s 24 bytes of overhead per node, plus whatever unused capacity sits at the end of its buffer. The LLNode has none of that. And we can wrap the kid and sib indices with nonmax so that the usize::MAX bit pattern is used for the None variant of the Option:

struct LLNode<T> {
    data: T,
    kid: Option<NonMaxUsize>,
    sib: Option<NonMaxUsize>,
}

This is just 8+8 = 16 bytes of structure information per node. And most days we can get away with u32 indices, which halves the cost to 8 bytes.
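The nonmax crate isn’t in std, but std’s NonZero types use the same niche trick (reserving the all-zeros bit pattern instead of usize::MAX), so size_of can demonstrate the savings on a 64-bit target:

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroUsize};

fn main() {
    // A plain Option<usize> needs an extra word for the discriminant.
    assert_eq!(size_of::<Option<usize>>(), 16);
    // With a niche, None fits into the reserved bit pattern. NonZeroUsize
    // reserves 0; nonmax's NonMaxUsize works the same way with usize::MAX.
    assert_eq!(size_of::<Option<NonZeroUsize>>(), 8);
    // u32 indices halve it again.
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4);
}
```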

Anyway none of this is novel. It’s obvious in retrospect. But I just hadn’t encountered the idea before, and my stupid “linked lists are bad for the cache” mental short-circuit was so strong that I simply couldn’t see the point until I traced it all out on paper.


  1. We no longer have O(1) random access among the children, but for trees that seldom comes up.↩︎