A quick trick for faster naïve matrix multiplication

If you need to multiply some matrices together very quickly, usually it's best to use a highly optimized library like ATLAS. But sometimes adding such a dependency isn't worth it, if you're worried about portability, code size, etc. If you just need good performance, rather than the best possible performance, it can make sense to hand-roll your own matrix multiplication function. Continue reading A quick trick for faster naïve matrix multiplication

bfs from the ground up, part 1: traversal

bfs is a tool I've been writing for about a year, with the goal of being a drop-in replacement for the UNIX find command. This series of posts will be deep technical explorations of its implementation, starting from the lower levels all the way up to the user interface. bfs is small (only about 3,500 lines of C code), which makes it possible to do a fairly complete analysis. But the codebase is fairly clean and highly optimized, which should make the analysis interesting. Continue reading bfs from the ground up, part 1: traversal

Fast, Branchless Ray/Bounding Box Intersections, Part 2: NaNs

In part 1, I outlined an algorithm for computing intersections between rays and axis-aligned bounding boxes. The idea to eliminate branches by relying on IEEE 754 floating point properties goes back to Brian Smits in [1], and the implementation was fleshed out by Amy Williams. et al. in [2].

Continue reading Fast, Branchless Ray/Bounding Box Intersections, Part 2: NaNs