Archive for January, 2013

What’s wrong with rand()?

I’m not complaining that it isn’t random enough (though it often isn’t). And rand() is convenient, so maybe if you are writing a 10 line guess-the-number program it will be the quickest way to get there. But beyond that, I would never use it.

The problem is that rand() is lying. It says, ‘Hey, call this function and you get a random number, no strings attached!’. And that’s just not the case. Instead, rand() generates pseudorandom numbers that depend on some hidden global state, which can be modified by any part of the program at any time.

Suppose one system is happily generating random numbers and another system comes along and calls srand() with some inappropriate parameter. Suddenly the numbers are not random anymore. Or suppose you are relying on getting a predetermined sequence from a fixed seed. Again, a call to srand() from outside will break it.

Or maybe you are happy with the behaviour of rand(), then you compile the program on another platform and it’s not random enough anymore because they used a different algorithm.

As always with program state, the best way is to make it explicit. A random number generator has state, so make it a class. There are many types of generator and they have different properties, so make the algorithm explicit too. If it’s a Mersenne Twister call it that.

Then you know who has access to it, you know it is properly initialised, and you know it isn’t going to change from one platform to another. In short, it does what it says. Which rand() doesn’t.


Fast frustum culling

Almost every game needs to do frustum culling. There are many more objects in the game world than are visible at any one time, and the renderer should only be concerned with the ones that can be seen. There are many ways to do it. But the performance of such code is quite counter-intuitive. I decided to investigate.

The test case

I needed something extreme, so I chose a forest of 4 million trees, randomly positioned. In the end I wanted about 20000 visible, inside a 90 degree view cone. This is about how many separate objects a game could possibly render. The culling is done in 2D, but this is a roughly equivalent case to a landscape in 3D. I didn’t use any special optimisations for the culling routine; it’s standard floating point code returning whether the region is outside, partially inside or fully inside the frustum.

Brute force

Brute force culling of 4 million objects took 50 ms. A full frame is 16 ms, so this is clearly unacceptable. But it does show how fast the CPU can do these tests. Culling tests on the 20000 objects that were visible would only have taken 0.25 ms, if we hadn’t had to deal with all the rest.

Quad tree

I would say the standard structure for culling is a quad tree. I made one, optimised the depth, and the fastest I could make it go was 0.7 ms. That’s a huge improvement on brute force, but it only did around 8000 tests. Almost the entire time is overhead from cache misses and function calls.


Next I made a simple grid. Again, I optimised the cell size and made it as fast as I could. The result was 0.8 ms, only slightly slower than the quad tree. This time it did almost 40000 tests, but the overhead was much less, which accounts for the almost negligible difference compared to the tree.

Spatial indexing

Imagine a quad tree without any nodes except at the bottom level. The nodes are stored in the order you would see them if iterating through the tree depth first. It’s processed recursively, but without any pointers to follow it’s much more cache-friendly than a tree. This was the fastest method I found, at 0.5 ms.


  • However you do it, culling shouldn’t take very long. If the total object count is in the tens of thousands, consider brute force.
  • A tree structure gives almost no benefit compared to a grid.
  • Combining hierarchical culling with good cache behaviour is the fastest approach.


Pong, in a modern framework

Since I’ve been doing a lot a physics development and hacking the game to put in test cases, I thought I would write a little physics game. This would also provide a way to test and sanity check the whole framework.

(Oh, it failed the sanity check. But that’s all part of the process).

Pong is a very simple game, and it doesn’t need all the systems I put in place for the main game. But I wanted to use them anyway. In particular, I wanted to do Pong with real physics, simulating the bats and ball rather than following a minimal set of rules.

First problem: the bats shouldn’t move when the ball hits them. For this I had to add constraints, which didn’t exist in the physics engine before. However, I had contacts, and joints are mathematically similar. So I made two joints per bat, which could slide on the vertical axis but were fixed on the horizontal. Two, because the bats also should not rotate.

This has the side effect that hitting the top or bottom of a bat will move it, but I like that. I could have given the bats infinite mass, but I wanted them to be simulated.

There’s a little more to the game than the physics, but not much. I have three actor types:

  • Pong game
  • Bat
  • Ball

The Pong game actor creates the other actors, starts the game, and keeps score. The bat actor responds to input and moves the bat up and down. The ball actor waits for collisions and reports a score to the game.

Then there’s a bit of code to render the physics objects and the score, and that’s it. The game is entirely event driven because the physics runs in the background.

Next: take what I learned from Pong and apply it to the main game.pong


Fast sin and cos functions

I was hunting around for fast sin/cos functions and couldn’t find anything suitably packaged and ready to go. The most useful I found was in this discussion:

Looks good, but it needed a bit of work. I decided to implement it with sse intrinsics and calculate sin and cos together to save even more processor time.

I ended up with this. Accurate to within 0.2% over +2 PI to -2 PI and about 3 times faster than standard sin/cos if you calculate both together.

__m128 Abs(__m128 m)
__m128 sign = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));
return _mm_andnot_ps(sign, m);
__m128 Sin(__m128 m_x)
const float B = 4.f / PI;
const float C = -4.f / (PI * PI);
const float P = 0.225f;
//float y = B * x + C * x * abs(x);
//y = P * (y * abs(y) - y) + y;
__m128 m_pi = _mm_set1_ps(PI);
__m128 m_mpi = _mm_set1_ps(-PI);
__m128 m_2pi = _mm_set1_ps(PI * 2);
__m128 m_B = _mm_set1_ps(B);
__m128 m_C = _mm_set1_ps(C);
__m128 m_P = _mm_set1_ps(P);
__m128 m1 =_mm_cmpnlt_ps(m_x, m_pi);
m1 = _mm_and_ps(m1, m_2pi);
m_x = _mm_sub_ps(m_x, m1);
m1 =_mm_cmpngt_ps(m_x, m_mpi);
m1 = _mm_and_ps(m1, m_2pi);
m_x = _mm_add_ps(m_x, m1);
__m128 m_abs = Abs(m_x);
m1 = _mm_mul_ps(m_abs, m_C);
m1 = _mm_add_ps(m1, m_B);
__m128 m_y = _mm_mul_ps(m1, m_x);
m_abs = Abs(m_y);
m1 = _mm_mul_ps(m_abs, m_y);
m1 = _mm_sub_ps(m1, m_y);
m1 = _mm_mul_ps(m1, m_P);
m_y = _mm_add_ps(m1, m_y);
return m_y;
float Sin(float x)
__m128 m_x = _mm_set1_ps(x);
__m128 m_sin = Sin(m_x);
return _mm_cvtss_f32(m_sin);
float Cos(float x)
__m128 m_x = _mm_set1_ps(x + PI / 2.f);
__m128 m_cos = Sin(m_x);
return _mm_cvtss_f32(m_cos);
void SinCos(float x, float* s, float* c)
__m128 m_both = _mm_set_ps(0.f, 0.f, x + PI / 2.f, x);
__m128 m_sincos = Sin(m_both);
__m128 m_cos = _mm_shuffle_ps(m_sincos, m_sincos, _MM_SHUFFLE(0, 0, 0, 1));
*s = _mm_cvtss_f32(m_sincos);
*c = _mm_cvtss_f32(m_cos);

As the next (maybe last) step in developing the physics engine, I wanted to add rotation to the objects. It should have been pretty easy, but ended up as a big overhaul in which I changed almost everything.

So what’s needed for rotation, on top of what I already had?

  • The rigid bodies get some extra state variables for angular velocity and rotation.
  • The integration has to update these variables.
  • Contacts need positions in order to work out the torque.
  • The solver matrix needs to include moment-of-inertia factors.

Not actually that much. The biggest task was adding rotation and position calculations to the collision functions. But then when I put it all in, the stability had gone straight to hell.

It actually wasn’t too bad for circles, or even circles with boxes. It was boxes against boxes that really had problems. I realised then that a single contact can’t keep a box stable, if it’s allowed to rotate. I changed it to two contacts along the contact edges. Much better, but still jittery.

The trouble with box collisions is that even tiny changes from frame to frame can dramatically alter the set of contacts. This reduces the effectiveness of contact caching, and that affects the convergence of the solver, and that means more jitter. A tidy stack is not too bad, but a big pile never quite settles.

I never solved this perfectly, but I did get it good enough with various techniques:

  • The collision functions have to be spot on. There’s no room for error.
  • Solving separately for positions and velocities prevents the intersection penalty feeding back into the next frame.
  • I treat low speed collisions as perfectly inelastic.
  • I don’t wake up objects for inelastic collisions. Instead I leave them asleep with infinite mass. This stops a jittery object waking up the whole stack.

I think there are still improvements to be had, but I have to stop somewhere. Maybe that’s it unless I get some new ideas.Rotating boxes