Fast, Branchless Ray/Bounding Box Intersections, Part 3: Boundaries

2022-07-27 Tavian Barnes

Last time I wrote about ray/box intersections, we left off with a simple and fast implementation of the slab method that handles all the corner cases. To save you the trouble of (re-)reading the last two posts, I'll start with a quick recap. Feel free to skip to the next section if you're already familiar with it.

An axis-aligned bounding box $\square$ can be specified by the coordinates of two of its corners:

$\square_{\min} = \begin{pmatrix} x_{\min} \\ y_{\min} \\ z_{\min} \end{pmatrix}$ $\square_{\max} = \begin{pmatrix} x_{\max} \\ y_{\max} \\ z_{\max} \end{pmatrix}$

A point $(x, y, z)$ is inside the box if and only if

\begin{alignedat}{3}
& x_{\min} && < x && < x_{\max} \\
& y_{\min} && < y && < y_{\max} \\
& z_{\min} && < z && < z_{\max}. \\
\end{alignedat}

Equivalently, the interior of the box is the intersection of many half-spaces:

$x > x_{\min}$ $x < x_{\max}$ $y > y_{\min}$ $y < y_{\max}$

The slab method tests for an intersection between a ray and an AABB (axis-aligned bounding box) by repeatedly clipping the line with these half-spaces. In the end, what's left of the ray is the segment that intersects the box, if any.

A ray is typically defined parametrically by

\ell(t) = \begin{pmatrix}
x_0 \\
y_0 \\
z_0 \\
\end{pmatrix} + t \begin{pmatrix}
x_d \\
y_d \\
z_d \\
\end{pmatrix},

where $(x_0, y_0, z_0)$ is the origin of the ray and $(x_d, y_d, z_d)$ is the ray's direction. $t > 0$ is considered “in front” of the origin, while $t < 0$ is “behind” it. We can find where the ray intersects an axis-aligned plane by solving (for example)

\ell_x(t) = x_0 + t x_d = x

for

t = \frac{x - x_0}{x_d}.

The segment of the ray that intersects the $x$ planes of the bounding box is then

$\displaystyle t_1 = \frac{x_{\min} - x_0}{x_d}$ $\displaystyle t_2 = \frac{x_{\max} - x_0}{x_d}$

\begin{aligned}
t_{\min} & = \min \{ t_1, t_2 \} \\
t_{\max} & = \max \{ t_1, t_2 \}. \\
\end{aligned}

The intersection of two segments $(u_{\min}, u_{\max})$ and $(v_{\min}, v_{\max})$ is just

\begin{aligned}
t_{\min} & = \max \{ u_{\min}, v_{\min} \} \\
t_{\max} & = \min \{ u_{\max}, v_{\max} \}. \\
\end{aligned}

If we end up with $t_{\min} > t_{\max}$, the intersection is empty.

Putting all this together leads to a naturally branch-free ray/AABB intersection test (assuming min() and max() are branch-free and loops are unrolled):

bool intersection(ray r, box b) {
    float tmin = 0.0, tmax = INFINITY;

    for (int i = 0; i < 3; ++i) {
        float t1 = (b.min[i] - r.origin[i]) * r.dir_inv[i];
        float t2 = (b.max[i] - r.origin[i]) * r.dir_inv[i];

        tmin = max(tmin, min(t1, t2));
        tmax = min(tmax, max(t1, t2));
    }

    return tmin < tmax;
}

We precompute r.dir_inv[i] = 1.0 / r.dir[i] to replace division (slow) with multiplication (fast). You might worry about dividing by zero, but part 1 describes how floating point infinities lead to the right answer even when r.dir[i] == 0.0.

You might also worry about the multiplication, since in floating point, $0 * \infty \equiv \mathrm{NaN}$. Luckily, part 2 describes a simple fix:

-        tmin = max(tmin, min(t1, t2));
-        tmax = min(tmax, max(t1, t2));
+        tmin = max(tmin, min(min(t1, t2), tmax));
+        tmax = min(tmax, max(max(t1, t2), tmin));

which works for the most efficient implementation of min()/max():

static inline float min(float x, float y) {
    return x < y ? x : y;
}

static inline float max(float x, float y) {
    return x > y ? x : y;
}

Boundaries

You may have noticed that all the inequalities above are strict ($<$, not $\le$). That means rays which just touch a corner, edge, or face of the bounding box will be considered non-intersecting.

This might not be too big a deal, since rays that exactly touch the boundary of the box are a somewhat degenerate case. Of course, degenerate doesn't mean impossible or even unlikely, so it would be nice to include the boundary. Several comments on the last post agree.

Sadly, relaxing the inequalities fixes the corner case, but not the edge case, since that's the one involving NaN:

-    return tmin < tmax;
+    return tmin <= tmax;

I alluded to this in the previous post:

(Since this is an edge case, you might wonder why we don't choose to return true instead of false for rays on the boundary. It turns out to be much harder to get this behaviour with efficient code.)

I'm happy to announce that after seven years of grueling research thinking about it for a few minutes yesterday, I finally found an efficient implementation that includes the boundary:

-        tmin = max(tmin, min(t1, t2));
-        tmax = min(tmax, max(t1, t2));
+        tmin = min(max(t1, tmin), max(t2, tmin));
+        tmax = max(min(t1, tmax), min(t2, tmax));

This works for both the SSE-friendly and the IEEE-compliant min()/max() implementations, but not the NaN-propagating ones.

Performance

The boundary-inclusive implementation looks like this:

bool intersection(ray r, box b) {
    float tmin = 0.0, tmax = INFINITY;

    for (int i = 0; i < 3; ++i) {
        float t1 = (b.min[i] - r.origin[i]) * r.dir_inv[i];
        float t2 = (b.max[i] - r.origin[i]) * r.dir_inv[i];

        tmin = min(max(t1, tmin), max(t2, tmin));
        tmax = max(min(t1, tmax), min(t2, tmax));
    }

    return tmin <= tmax;
}

I benchmarked it against the versions from part 1 and part 2 (adjusted to also include the boundary, and to use float instead of double). Part 1 was still the fastest at 271 million rays per second, but this version was only 9% slower at 241 Mrays/s, beating part 2's 235 Mrays/s. At least, when compiled with clang -O3. GCC 12.1 stubbornly refused to use single instructions for some min()/max() calls, which destroyed its performance.