Almost every game needs to do frustum culling. There are many more objects in the game world than are visible at any one time, and the renderer should only be concerned with the ones that can be seen. There are many ways to do it. But the performance of such code is quite counter-intuitive. I decided to investigate.

The test case

I needed something extreme, so I chose a forest of 4 million trees, randomly positioned. In the end I wanted about 20000 visible, inside a 90 degree view cone. This is about how many separate objects a game could possibly render. The culling is done in 2D, but this is a roughly equivalent case to a landscape in 3D. I didn’t use any special optimisations for the culling routine; it’s standard floating point code returning whether the region is outside, partially inside or fully inside the frustum.

Brute force

Brute force culling of 4 million objects took 50 ms. A full frame is 16 ms, so this is clearly unacceptable. But it does show how fast the CPU can do these tests. Culling tests on the 20000 objects that were visible would only have taken 0.25 ms, if we hadn’t had to deal with all the rest.

Quad tree

I would say the standard structure for culling is a quad tree. I made one, optimised the depth, and the fastest I could make it go was 0.7 ms. That’s a huge improvement on brute force, but it only did around 8000 tests. Almost the entire time is overhead from cache misses and function calls.

Grid

Next I made a simple grid. Again, I optimised the cell size and made it as fast as I could. The result was 0.8 ms, only slightly slower than the quad tree. This time it did almost 40000 tests, but the overhead was much less, which accounts for the almost negligible difference compared to the tree.

Spatial indexing

Imagine a quad tree without any nodes except at the bottom level. The nodes are stored in the order you would see them if iterating through the tree depth first. It’s processed recursively, but without any pointers to follow it’s much more cache-friendly than a tree. This was the fastest method I found, at 0.5 ms.

Conclusions

  • However you do it, culling shouldn’t take very long. If the total object count is in the tens of thousands, consider brute force.
  • A tree structure gives almost no benefit compared to a grid.
  • Combining hierarchical culling with good cache behaviour is the fastest approach.

 

Advertisements