Suppose you have a tree represented as a vec of nodes:
```rust
struct Tree<T> {
    root: Option<usize>,
    nodes: Vec<Node<T>>,
}
```

Nodes can have arbitrarily many children, and we might insert a new node as a child of any existing node, so the natural thing to do is:
```rust
struct Node<T> {
    data: T,
    children: Vec<usize>,
}
```

We push a newly inserted node at the end of the nodes vec, and push its index to its parent's children vec:
```rust
impl<T> Tree<T> {
    fn insert(&mut self, item: T) {
        // ... somehow figure out which node to insert it under
        let parent_idx: usize = todo!();
        let child_node = Node { data: item, children: vec![] };
        let child_idx = self.nodes.len();
        self.nodes.push(child_node);
        self.nodes[parent_idx].children.push(child_idx);
    }
}
```

But the other day I saw another way of doing it in a video:
```rust
// intrusive linked-list node
struct LLNode<T> {
    data: T,
    kid: Option<usize>,
    sib: Option<usize>,
}
```

An LLNode only knows about its first child and its next sibling. You iterate through the children in linked-list fashion: visit the node at the parent's kid index, then that node's sib, then that node's sib, and so on until you hit the None case of the Option.
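That traversal can be sketched like this (the free-standing `children_of` helper is my own illustration, assuming the nodes live in the tree's flat nodes vec):

```rust
struct LLNode<T> {
    data: T,
    kid: Option<usize>,
    sib: Option<usize>,
}

// Collect the indices of `parent`'s children by starting at its `kid`
// and following `sib` links until we hit None.
fn children_of<T>(nodes: &[LLNode<T>], parent: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut cur = nodes[parent].kid;
    while let Some(i) = cur {
        out.push(i);
        cur = nodes[i].sib;
    }
    out
}
```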
When I saw this I thought, isn’t that a bad memory access
pattern? Because as every software engineer knows, linked lists are
Bad For The CPU Cache™. All of the sibling nodes are scattered
throughout the tree’s nodes vec. And the vec-of-children
way must be Good For The Cache, because all the child indices are
contiguous in memory.
Then I realised: no, the children-vec way has exactly the same problem, only worse. The indices in the children vec may be contiguous in memory, but the child nodes themselves are not. We insert a new node at the end of the tree's nodes vec, and most of the time its existing siblings are not immediately before it. And we have to visit each of those non-contiguous locations anyway if we actually want to do something with the child nodes.
In fact, the order of the nodes vec is unchanged if we
switch to LLNode. The insertion logic is the same in both
cases: put it at the end. For traversals, the only important¹ difference is that the
LLNode doesn’t incur an extra pointer-chase into a
children vec on the heap. So the intrusive linked-list tree
has an outright superior memory access pattern.
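Here's a sketch of what that insertion might look like (the free-standing `insert` helper and the O(1) prepend-linking strategy are my assumptions; prepending enumerates children newest-first, and keeping append order would mean walking the sibling chain or caching the last child):

```rust
struct LLNode<T> {
    data: T,
    kid: Option<usize>,
    sib: Option<usize>,
}

// The new node still goes at the end of the nodes vec; only the linking
// differs. We splice it in as the parent's first child in O(1).
fn insert<T>(nodes: &mut Vec<LLNode<T>>, parent_idx: usize, item: T) -> usize {
    let old_first_child = nodes[parent_idx].kid;
    let child_idx = nodes.len();
    nodes.push(LLNode {
        data: item,
        kid: None,
        sib: old_first_child, // the old first child becomes our sibling
    });
    nodes[parent_idx].kid = Some(child_idx);
    child_idx
}
```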
The LLNode way also saves memory. A vec has three 64-bit
members, for the ptr, len, and
cap. That’s 24 bytes of overhead per node, plus whatever
unused capacity sits at the end of its buffer. The LLNode
has none of that. And we can wrap the kid and
sib indices with nonmax so
that the usize::MAX bit pattern is used for the
None variant of the Option:
```rust
use nonmax::NonMaxUsize;

struct LLNode<T> {
    data: T,
    kid: Option<NonMaxUsize>,
    sib: Option<NonMaxUsize>,
}
```

This is just 8 + 8 = 16 bytes of structure information per node. And most days we can get away with u32 indices, which halves the cost to 8 bytes.
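The standard library's NonZeroUsize uses the same niche trick (nonmax just reserves usize::MAX instead of zero), so we can check the layout claim without the external crate:

```rust
use std::num::NonZeroUsize;

// Same niche optimization as nonmax's NonMaxUsize, but with zero as the
// forbidden value: the compiler stores Option's None in the forbidden
// bit pattern, so the Option adds no space over a bare usize.
struct LLNodeNiche<T> {
    data: T,
    kid: Option<NonZeroUsize>,
    sib: Option<NonZeroUsize>,
}
```

With a zero-sized `data`, the whole node is exactly two words: 16 bytes on a 64-bit target.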
Anyway none of this is novel. It’s obvious in retrospect. But I just hadn’t encountered the idea before, and my stupid “linked lists are bad for the cache” mental short-circuit was so strong that I simply couldn’t see the point until I traced it all out on paper.
¹ We no longer have O(1) random access among the children, but for trees that seldom comes up.