# Iterating Over Binary Trees

Binary trees are great.
If you ever have to implement one yourself though, you're probably either using C or you need to look at the documentation for your language's standard library more closely.
Even POSIX C has the tsearch family of functions from `<search.h>`

.

Recently I found myself attempting to implement a bootstrapping compiler in a rather restrictive subset of Java, and found the need for some kind of associative array data structure. I chose a splay tree, because, well, I like them.

Soon I found that I wanted the ability to iterate over the entire tree, in sorted order.
This ability is provided by C++'s `std::map::iterator`

s, for example.
I considered using a threaded tree, then an iterator holding a stack, before I realised that there is a much simpler solution with equal performance.
It requires only that the nodes store a reference to their parent node, which mine could with little extra effort.

The principle is simple. If a node has a right subtree, its successor is the minimal (leftmost) node in that subtree. Otherwise, its successor is the parent of its first ancestor which is not a right child. In code,

```
public class Node {
private Node left, right;
public Node next() {
Node next;
if (right == null) {
// Node has no right child
next = this;
while (next.parent != null && next == next.parent.right) {
next = next.parent
}
next = next.parent;
} else {
// Find the leftmost node in the right subtree
next = right;
while (next.left != null) {
next = next.left;
}
}
return next;
}
}
```

But how fast is this?
An individual call to `node.next()`

could traverse from the bottom to the top of the tree, at a cost of $O(h)$, where $h$ is the height of the tree.
If there are $n$ nodes in the tree, could we traverse $O(n h)$ edges to visit the whole tree?
This would be particularly bad for a splay tree, which is not necessarily balanced at any given time.

It turns out that we need only traverse $O(n)$ edges to visit the entire tree. For an informal proof, note that our algorithm is basically equivalent to tracing an outline of the tree like this:

Clearly we can only trace around any edge at most twice (on its left and right side). Since any tree has $n - 1$ edges, we traverse at most $2 (n - 1) \in O(n)$ edges in the process of exploring the whole tree.