Hues

k-d Forests

Recently I saw an interesting Code Golf problem: create an image using all possible RGB colours. The top ranked submission, by József Fejes, contained some truly beautiful images, but they took an immense amount of time to generate.

As described in the answer and a follow-up blog post, the images were generated by putting a random colour in the centre, then placing the next colour adjacent to the most similar colour, repeatedly. The submitted code used exhaustive search over the boundary pixels to find the best location; using some geometric data structures, we can do a lot better.

The problem of finding where to place the current pixel amounts to finding the nearest neighbour to the colour being placed, using Euclidean distance over the RGB colour space. The most common data structure for nearest neighbour search is the k-d tree, but they don't perform well in the face of rapid insertion and removal. Since the boundary changes every time we place a pixel, we'll need a more dynamic data structure.

Usually R-trees are used for problems like these, but I came up with another idea. Instead of a single k-d tree, we can use a collection of k-d trees (hence the name "k-d forest") with increasing power-of-2 sizes. It looks like this:

k-d Forest Animation

Rather than re-build a large tree on every insertion, we usually only have to rebuild the first few small trees in the forest. Since k-d trees can be built in \(O \left( n\,\lg n \right)\) time, we get this performance for \(n\) insertions:

$$
\begin{align*}
\sum_{i=0}^{\lg n} \frac{n}{2^{i+1}} \, O \left( 2^i \, \lg 2^i \right)
& = \sum_{i=0}^{\lg n} O \left( \frac{n}{2^{i+1}} \, i \, 2^i \right) \\
{} & = \sum_{i=0}^{\lg n} O \left( n \, i \right) \\
{} & = O \left( n \, \sum_{i=0}^{\lg n} i \right) \\
{} & = O \left( n \, \lg^2 n \right).
\end{align*}
$$

Allowing node removal destroys this nice amortized performance though, because you could repeatedly remove and add a node right when the largest tree has to get rebuilt. So we cheat: instead of really deleting nodes, simply set a "removed" flag to true. To avoid wasting time searching through removed nodes, we can rebuild the whole forest without them once they reach, say, 50% of the total nodes.

Query performance is also nice. Since a single k-d tree takes \(O \left( \lg n \right)\) on average to answer a nearest-neighbour query, and we have up to \(O \left( \lg n \right)\) trees, our average running time per query is

$$
\begin{align*}
\sum_{i=0}^{\lg n} O \left( \lg 2^i \right)
& = \sum_{i=0}^{\lg n} O \left( i \right) \\
{} & = O \left( \lg^2 n \right).
\end{align*}
$$

So what's the real-world performance like? Well, it takes only about four minutes to generate a 4096×4096 image using all possible 24-bit colours. A straight-across comparison isn't exactly fair due to me using C instead of C♯, but the original code took dozens of hours to generate the same images.

Rather than making the images available, I've hosted the code here. It takes about as long to generate the images (even for the full 24 bit spectrum) as it would to download them anyway. Simply checkout the code and run make image to generate an image. You can customize some #defines in main.c to generate different kinds and sizes of images.

EDIT: Since József Fejes's 24-bit images aren't up anymore, I've shared a Google Drive folder with mine here.

Thanks to József Fejes for the concept.

6 thoughts on “k-d Forests”

  1. Hey, I encountered some errors with building this, and I wanted to share how I resolved them.

    Basically, running `make` yielded linker-errors saying it couldn't find references to a variety of functions in libpng. To fix this, I had to modify the make command to be:

    cc -std=c99 -pipe -O2 -Werror -Wall -Wpedantic -Wextra -Wno-sign-compare -Wno-unused-parameter -Wunreachable-code -Wshadow -Wpointer-arith -Wwrite-strings -Wcast-align -Wstrict-prototypes -lm -lpng -o kd-forest kd-forest.c util.c color.c main.c -lpng

    The change is the addition of -lpng on the end. That resolved my compilation problem.

  2. You can see in József's hue-sorted images that there are some artefacts near the edges. This happens because there are naturally "holes" in the image as it's generated, and by the time the purple pixels are being placed, those holes are the only empty places for them.

    In the *hue.png images, I've tried to mitigate it by doing multiple passes through the sorted colours. You can see that in the code here. That's why you see multiple "rings" of the entire spectrum.

    In the *hue2.png images, I've used a better method. Instead of multiple evenly-spaced stripes, I do every other colour, then every 4th, then every 8th, 16th, etc. You can see that in the code here.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>