Writing your own physics engine is no good unless the code runs fast enough. This is how I got 1000 boxes in a stack running at 60 fps.

#### Profiling

Very Sleepy is a free sampling profiler, easy to install and simple to use. I used it to find the hotspots, and I would have got nowhere without it.

#### Sparse Matrices

The matrices involved in collision calculations generally have only a few non-zero elements in each row, so using NxN storage space is very wasteful of memory and expensive to iterate through. It’s still important to store all the data contiguously. I store the inverse of each diagonal element as well to save a division in the solver.

#### Accelerating Gauss-Seidel

Gauss-Seidel is reliable, but it couldn’t be described as fast. It crawls to convergence in linear steps. It’s possible to improve on this by scaling the step taken on each iteration. But scale it too much and the convergence is worse, perhaps disastrously so. There’s no simple solution to finding the right scaling factor (the relaxation parameter). To be honest, I tried to follow the maths but gave up when the problem looked harder than solving the equation in the first place. In the end I settled on increasing the parameter slowly as long as the system seemed to be converging, and reducing it at the first sign of trouble. While perhaps not optimal, this was measurably faster than standard Gauss-Seidel.

#### Warm starting the solver

In a stable stack of objects the constraint forces don’t change much from one frame to the next. Caching the solution and using it a starting point for the next frame can dramatically cut the number of iterations required per frame.

#### Limiting memory allocations

I’m careful to avoid O(N^2) algorithms throughout, but when it comes to memory, you can’t even afford O(N).

For example, in my first attempt at caching the solver results, the overhead was bigger than the savings! I used std::unordered_map and it was allocating memory for every element. I wrote a basic hash table to use instead. With small elements it makes sense to store them in the table itself rather than allocating memory for each one as STL does.

#### Sleeping Objects

I’ve mentioned this before, but inactive objects can go to sleep and use zero processor time. When it comes to stacks, the entire stack has to sleep in one go or it will become unstable. In theory it should be possible to put a stack to sleep in stages starting at the bottom, but I haven’t quite worked that out.