# Iterating Over Binary Trees

2012-04-19 Tavian Barnes

Binary trees are great. If you ever have to implement one yourself though, you're probably either using C or you need to look at the documentation for your language's standard library more closely. Even POSIX C has the tsearch family of functions from <search.h>.

Recently I found myself attempting to implement a bootstrapping compiler in a rather restrictive subset of Java, and found the need for some kind of associative array data structure. I chose a splay tree, because, well, I like them.

Soon I found that I wanted the ability to iterate over the entire tree, in sorted order. This ability is provided by C++'s std::map::iterators, for example. I considered using a threaded tree, then an iterator holding a stack, before I realised that there is a much simpler solution with equal performance. It requires only that the nodes store a reference to their parent node, which mine could with little extra effort.

The principle is simple. If a node has a right subtree, its successor is the minimal (leftmost) node in that subtree. Otherwise, its successor is the parent of its first ancestor which is not a right child. In code,

public class Node {
private Node left, right;

public Node next() {
Node next;

if (right == null) {
// Node has no right child
next = this;
while (next.parent != null && next == next.parent.right) {
next = next.parent
}
next = next.parent;
} else {
// Find the leftmost node in the right subtree
next = right;
while (next.left != null) {
next = next.left;
}
}

return next;
}
}


But how fast is this? An individual call to node.next() could traverse from the bottom to the top of the tree, at a cost of $O(h)$, where $h$ is the height of the tree. If there are $n$ nodes in the tree, could we traverse $O(n h)$ edges to visit the whole tree? This would be particularly bad for a splay tree, which is not necessarily balanced at any given time.

It turns out that we need only traverse $O(n)$ edges to visit the entire tree. For an informal proof, note that our algorithm is basically equivalent to tracing an outline of the tree like this: Clearly we can only trace around any edge at most twice (on its left and right side). Since any tree has $n - 1$ edges, we traverse at most $2 (n - 1) \in O(n)$ edges in the process of exploring the whole tree.