Category: game development

An immediate mode GUI

I’ve been using an immediate mode GUI system for a while and I think the code is stable enough now to talk about my experience of it.

The definitive tutorial – though I deviated a lot from this:

I have so far made a level editor and a database using this system. I used my own GUI system because I knew I would have complex interactions between the GUI and the scene/document, and I didn’t fancy learning and possibly fighting an existing API. It was not for the final result, which would be better using proper Windows controls. Apart from having a non-standard look and feel I have no file selector, no copy/paste in text boxes and no tool-tips. But as the only person using it, I felt free to make that sacrifice.

Some techniques that I used:

  • Simple interface – For example, if (gui.DoButton("MyButton")). There are some optional input flags and that’s it.
  • No Widget IDs – I felt they were ugly, so I dispensed with them. Instead, I identify widgets by counting. This works as long as the GUI does not change while a widget is still active, and this has always been the case. The GUI changes in response to triggered widgets, and widgets are deactivated when they are triggered. There can only be one active widget at a time, therefore there cannot be any active widgets after a trigger.
  • Automatic layout – It would not be practical to hard code the position of every button. I have vertical and horizontal lists and layout flags that can be passed to any widget. The size of the layouts is calculated and used in the following frame.
  • RAII – Layouts are created as objects on the stack. When they go out of scope, they remove themselves from the GUI layout stack.
  • Storing temporary data inside the GUI – For example the text box in the tutorial above will output values to the buffer while they are being edited. This complicates the code. My text box takes an initial string, then holds the temporary string internally while it is edited, returning the final string only when triggered.
  • Multiple trigger conditions – How can you have a button that can distinguish left and right clicks when it only returns a bool? Place two buttons in the same spot, using a flag to prevent the bottom one being hidden. One is triggered on a left click, the other on a right click.
  • Building complex widgets out of basic ones – I have menus, lists, trees and tables. They are all buttons and labels underneath.
  • Drag boxes and Hot boxes – Drag boxes allow the user to click in a window and move it, triggering when it is dropped. Hot boxes report mouse clicks inside them. They allow painting tiles in the level and moving objects around. These are ridiculously simple to use.
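
The counting idea can be sketched in a few lines. This is only an illustration, not the code described above: Gui, Rect, Mouse and the exact trigger rule (press inside, release inside) are assumptions, and drawing is left out.

```cpp
// Hypothetical names throughout (Gui, Rect, Mouse); drawing is omitted.
struct Mouse { int x; int y; bool down; };

struct Rect {
    int x, y, w, h;
    bool Contains(int px, int py) const {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
};

class Gui {
public:
    // Call once per frame before any widgets are declared.
    void BeginFrame(const Mouse& mouse) {
        wasDown_ = mouse_.down;
        mouse_ = mouse;
        counter_ = 0;  // widget numbering restarts every frame
    }

    // Returns true on the frame the button is triggered
    // (pressed inside the rectangle, then released inside it).
    bool DoButton(const char* /*label*/, const Rect& rect) {
        const int id = counter_++;  // identity = position in call order
        const bool inside = rect.Contains(mouse_.x, mouse_.y);
        const bool pressed = mouse_.down && !wasDown_;
        if (active_ == kNone && pressed && inside)
            active_ = id;
        if (active_ == id && !mouse_.down) {
            active_ = kNone;  // deactivated before the caller can change the GUI
            return inside;
        }
        return false;
    }

private:
    static const int kNone = -1;
    Mouse mouse_ = {0, 0, false};
    bool wasDown_ = false;
    int counter_ = 0;
    int active_ = kNone;
};
```

Because the active widget is cleared before DoButton returns true, the caller is free to show a different set of widgets on the next frame without the counts going stale.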

I had some trouble organising the code on the user side until I realised what was going on. It’s not event-driven. The code has to be arranged the same way the interface is. That may sound inflexible – well, maybe – but it breaks down into windows, toolbars, menus and so on. Instead of events, I kept a queue of deferred actions. Want to load a file? Well, you need to ask first whether to save the old one. Then request a file name. The actions might happen anywhere; it just depends where in the interface they are handled. Trust the action queue.
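
One way to picture such a queue, as a rough sketch: the Action enum, ActionQueue and RequestLoad are all made-up names, and the real queue may well carry more data than a tag.

```cpp
#include <deque>

// Hypothetical action tags for the "load a file" example.
enum class Action { AskToSave, RequestFileName, LoadFile };

class ActionQueue {
public:
    void Push(Action a) { pending_.push_back(a); }

    // True if `a` is the action currently waiting to be handled.
    // Any part of the interface code can ask this while it runs.
    bool IsNext(Action a) const {
        return !pending_.empty() && pending_.front() == a;
    }

    // Called by whichever part of the interface handled the front action.
    void Done() {
        if (!pending_.empty()) pending_.pop_front();
    }

    bool Empty() const { return pending_.empty(); }

private:
    std::deque<Action> pending_;
};

// Queue the steps for loading a file in the order they must happen.
inline void RequestLoad(ActionQueue& q, bool documentDirty) {
    if (documentDirty) q.Push(Action::AskToSave);
    q.Push(Action::RequestFileName);
    q.Push(Action::LoadFile);
}
```

The dialog code would then check IsNext(Action::AskToSave) each frame, show the prompt while it is true, and call Done() when the user answers, which lets the file-name request come to the front.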

It’s very efficient in terms of the amount of code used, and the logic is easy to follow. My entire database interface is under 1000 lines of code, using a total of 120 widgets. And that does a lot of stuff: record editing, tables, a record tree, a schema builder and all the associated menus and dialog boxes. Now, if I could just figure out how to do tool tips…


Fast sin and cos functions

I was hunting around for fast sin/cos functions and couldn’t find anything suitably packaged and ready to go. The most useful I found was in this discussion:

Looks good, but it needed a bit of work. I decided to implement it with SSE intrinsics and calculate sin and cos together to save even more processor time.

I ended up with this. Accurate to within 0.2% over +2 PI to -2 PI and about 3 times faster than standard sin/cos if you calculate both together.

#include <emmintrin.h> // SSE2 intrinsics

const float PI = 3.14159265f;

// Clear the sign bit in all four lanes.
__m128 Abs(__m128 m)
{
    __m128 sign = _mm_castsi128_ps(_mm_set1_epi32(static_cast<int>(0x80000000u)));
    return _mm_andnot_ps(sign, m);
}

// Approximate sin of four values at once.
__m128 Sin(__m128 m_x)
{
    const float B = 4.f / PI;
    const float C = -4.f / (PI * PI);
    const float P = 0.225f;

    //float y = B * x + C * x * abs(x);
    //y = P * (y * abs(y) - y) + y;

    __m128 m_pi = _mm_set1_ps(PI);
    __m128 m_mpi = _mm_set1_ps(-PI);
    __m128 m_2pi = _mm_set1_ps(PI * 2);
    __m128 m_B = _mm_set1_ps(B);
    __m128 m_C = _mm_set1_ps(C);
    __m128 m_P = _mm_set1_ps(P);

    // Wrap once into [-PI, PI]: subtract 2 PI where x >= PI...
    __m128 m1 = _mm_cmpnlt_ps(m_x, m_pi);
    m1 = _mm_and_ps(m1, m_2pi);
    m_x = _mm_sub_ps(m_x, m1);
    // ...and add 2 PI where x <= -PI.
    m1 = _mm_cmpngt_ps(m_x, m_mpi);
    m1 = _mm_and_ps(m1, m_2pi);
    m_x = _mm_add_ps(m_x, m1);

    // y = B * x + C * x * abs(x)
    __m128 m_abs = Abs(m_x);
    m1 = _mm_mul_ps(m_abs, m_C);
    m1 = _mm_add_ps(m1, m_B);
    __m128 m_y = _mm_mul_ps(m1, m_x);

    // y = P * (y * abs(y) - y) + y
    m_abs = Abs(m_y);
    m1 = _mm_mul_ps(m_abs, m_y);
    m1 = _mm_sub_ps(m1, m_y);
    m1 = _mm_mul_ps(m1, m_P);
    m_y = _mm_add_ps(m1, m_y);

    return m_y;
}

float Sin(float x)
{
    __m128 m_x = _mm_set1_ps(x);
    __m128 m_sin = Sin(m_x);
    return _mm_cvtss_f32(m_sin);
}

float Cos(float x)
{
    __m128 m_x = _mm_set1_ps(x + PI / 2.f);
    __m128 m_cos = Sin(m_x);
    return _mm_cvtss_f32(m_cos);
}

void SinCos(float x, float* s, float* c)
{
    // Lane 0 holds x (sin), lane 1 holds x + PI / 2 (cos).
    __m128 m_both = _mm_set_ps(0.f, 0.f, x + PI / 2.f, x);
    __m128 m_sincos = Sin(m_both);
    __m128 m_cos = _mm_shuffle_ps(m_sincos, m_sincos, _MM_SHUFFLE(0, 0, 0, 1));
    *s = _mm_cvtss_f32(m_sincos);
    *c = _mm_cvtss_f32(m_cos);
}
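
For checking the accuracy, a scalar version of the same two commented-out lines is handy. The sketch below is an addition for illustration only; ApproxSin and MaxAbsError are made-up names, and it covers just the range -PI to PI without the wrapping step.

```cpp
#include <algorithm>
#include <cmath>

// Scalar form of the same parabola approximation, valid for x in [-PI, PI].
inline float ApproxSin(float x) {
    const float PI = 3.14159265f;
    const float B = 4.f / PI;
    const float C = -4.f / (PI * PI);
    const float P = 0.225f;
    float y = B * x + C * x * std::fabs(x);
    return P * (y * std::fabs(y) - y) + y;
}

// Largest absolute difference from std::sin over [-PI, PI],
// sampled at steps + 1 evenly spaced points.
inline float MaxAbsError(int steps) {
    const float PI = 3.14159265f;
    float worst = 0.f;
    for (int i = 0; i <= steps; ++i) {
        float x = -PI + 2.f * PI * i / steps;
        worst = std::max(worst, std::fabs(ApproxSin(x) - std::sin(x)));
    }
    return worst;
}
```

Comparing MaxAbsError against the result of the SSE version over the same samples is a cheap way to confirm that the vector code matches the formula it was built from.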

Fast collisions

Writing your own physics engine is no good unless the code runs fast enough. This is how I got 1000 boxes in a stack running at 60 fps.


Very Sleepy is a free sampling profiler, easy to install and simple to use. I used it to find the hotspots, and I would have got nowhere without it.

Sparse Matrices

The matrices involved in collision calculations generally have only a few non-zero elements in each row, so using NxN storage space is very wasteful of memory and expensive to iterate through. It’s still important to store all the data contiguously. I store the inverse of each diagonal element as well to save a division in the solver.
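
A compressed-row layout is one common way to get this. The sketch below is a guess at the general shape, not the actual storage used here; SparseMatrix, AddRow and Multiply are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical compressed-row storage: every non-zero lives in one
// contiguous array, and each row is a range within it.
struct SparseMatrix {
    std::vector<float> values;   // non-zero values, row after row
    std::vector<int> columns;    // column index of each value
    std::vector<int> rowStart;   // row r occupies [rowStart[r], rowStart[r + 1])
    std::vector<float> invDiag;  // 1 / A[r][r], so the solver multiplies instead of divides

    SparseMatrix() { rowStart.push_back(0); }

    // Append one row given as (column, value) pairs.
    // Assumes the row contains a non-zero diagonal element.
    void AddRow(const std::vector<std::pair<int, float>>& entries) {
        const int row = static_cast<int>(invDiag.size());
        float diag = 0.f;
        for (const auto& e : entries) {
            columns.push_back(e.first);
            values.push_back(e.second);
            if (e.first == row) diag = e.second;
        }
        rowStart.push_back(static_cast<int>(values.size()));
        invDiag.push_back(1.f / diag);
    }

    // y = A * x, visiting only the stored elements.
    std::vector<float> Multiply(const std::vector<float>& x) const {
        std::vector<float> y(invDiag.size(), 0.f);
        for (std::size_t r = 0; r < y.size(); ++r)
            for (int k = rowStart[r]; k < rowStart[r + 1]; ++k)
                y[r] += values[k] * x[columns[k]];
        return y;
    }
};
```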

Accelerating Gauss-Seidel

Gauss-Seidel is reliable, but it couldn’t be described as fast. It crawls to convergence in linear steps. It’s possible to improve on this by scaling the step taken on each iteration. But scale it too much and the convergence is worse, perhaps disastrously so. There’s no simple solution to finding the right scaling factor (the relaxation parameter). To be honest, I tried to follow the maths but gave up when the problem looked harder than solving the equation in the first place. In the end I settled on increasing the parameter slowly as long as the system seemed to be converging, and reducing it at the first sign of trouble. While perhaps not optimal, this was measurably faster than standard Gauss-Seidel.
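
Here is roughly what that heuristic could look like. Everything specific in it is an assumption made for the sketch: the dense matrix (for brevity), the total change per sweep as the convergence measure, the 0.05 increment, the 1.9 cap and the reset to 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Solve A x = b by Gauss-Seidel with an adaptive relaxation parameter.
// x holds the starting guess on entry and the solution on exit.
// Returns the number of sweeps performed.
inline int SolveAdaptive(const Matrix& A, const std::vector<float>& b,
                         std::vector<float>& x, int maxSweeps, float tolerance) {
    float omega = 1.f;  // 1 is plain Gauss-Seidel
    float lastChange = std::numeric_limits<float>::max();
    for (int sweep = 1; sweep <= maxSweeps; ++sweep) {
        float change = 0.f;
        for (std::size_t i = 0; i < A.size(); ++i) {
            float sum = b[i];
            for (std::size_t j = 0; j < A.size(); ++j)
                if (j != i) sum -= A[i][j] * x[j];
            // Scale the ordinary Gauss-Seidel step by omega.
            const float delta = omega * (sum / A[i][i] - x[i]);
            x[i] += delta;
            change += std::fabs(delta);
        }
        if (change < tolerance) return sweep;
        if (change < lastChange)
            omega = std::min(omega + 0.05f, 1.9f);  // still converging: push a little harder
        else
            omega = 1.f;                            // first sign of trouble: back off
        lastChange = change;
    }
    return maxSweeps;
}
```

Passing in the previous frame's solution as x instead of zeros gives warm starting for free.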

Warm starting the solver

In a stable stack of objects the constraint forces don’t change much from one frame to the next. Caching the solution and using it as a starting point for the next frame can dramatically cut the number of iterations required per frame.

Limiting memory allocations

I’m careful to avoid O(N^2) algorithms throughout, but when it comes to memory allocations, you can’t even afford O(N).

For example, in my first attempt at caching the solver results, the overhead was bigger than the savings! I used std::unordered_map and it was allocating memory for every element. I wrote a basic hash table to use instead. With small elements it makes sense to store them in the table itself rather than allocating memory for each one as STL does.
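
A table along those lines might look like the following. It is a bare-bones sketch with invented details: ImpulseCache, the 32-bit key, the float value, linear probing and the hash mix are all assumptions, and there is no removal or resizing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical open-addressing table. Elements are stored in the slots
// themselves, so the only allocation is the slot array at construction.
class ImpulseCache {
public:
    // capacity must be a power of two.
    explicit ImpulseCache(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {}

    // Insert or overwrite. Returns false if the table is full.
    bool Set(std::uint32_t key, float value) {
        std::size_t i = Hash(key) & mask_;
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) & mask_) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                s.key = key;
                s.value = value;
                s.used = true;
                return true;
            }
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const float* Find(std::uint32_t key) const {
        std::size_t i = Hash(key) & mask_;
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) & mask_) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;  // an empty slot ends the probe chain
            if (s.key == key) return &s.value;
        }
        return nullptr;
    }

    // Forget everything without releasing memory.
    void Clear() {
        for (Slot& s : slots_) s.used = false;
    }

private:
    struct Slot {
        std::uint32_t key = 0;
        float value = 0.f;
        bool used = false;
    };

    static std::size_t Hash(std::uint32_t k) {
        k ^= k >> 16;
        k *= 0x45d9f3bu;
        k ^= k >> 16;
        return k;
    }

    std::vector<Slot> slots_;
    std::size_t mask_;
};
```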

Sleeping Objects

I’ve mentioned this before, but inactive objects can go to sleep and use zero processor time. When it comes to stacks, the entire stack has to sleep in one go or it will become unstable. In theory it should be possible to put a stack to sleep in stages starting at the bottom, but I haven’t quite worked that out.

A physics upgrade

Naturally, once I had the fluid dynamics running I wanted to hook it up to the physics engine and have the explosions move objects around. It didn’t quite work at first because:

  • The characters moved by setting their velocity directly, so the force from the blast didn’t affect them.
  • Other objects had no friction and would fly forever once set in motion.
  • Collisions only resolved overlapping bodies and didn’t affect velocity.

It turns out that resolving positions in a collision is pretty easy but resolving velocities is a tricky business. Well, I can’t resist this kind of thing so I had to have a go.

Suppose we have a large number of objects that need to be updated in a time step. In that time, some of them may collide, and the positions and velocities at the end of the time step should respect certain constraints, such as non-overlap and positive or zero relative velocity at every contact point. Also, basic laws of collisions such as conservation of momentum and the coefficient of restitution should apply. Numerical artefacts such as jitter should be minimised. And it needs to be fast.

I’ll start with a method for resolving positions, because, as I’ve said, it’s quite straightforward and leads on to part of the velocity solution. First, we need a list of potential collisions. Why not actual collisions? At this stage, it’s just not possible to tell. Object A may appear to miss object B, but then object C comes along and knocks B into the path of A. I collect collisions using a tolerance based on the maximum local velocity. Each potential collision provides a constraint equation, and these equations form a linear system, which can be solved, for example, using the Gauss-Seidel method. The output is a list of offsets which, when applied along the collision normals, will prevent intersection at the end of the time step. Although iterative, the solution can be calculated to any accuracy so there is little jitter.

Now, a related technique can resolve intersections by changing the velocity of each body so that it brings colliding bodies exactly into contact at the end of the time step. This works only for perfectly inelastic collisions, where the objects stick together. The post-collision velocity is used to correct intersection and therefore it can’t carry any bounce.

What if we resolve positions and velocities separately? Can there be bounce in that case? It turns out that no, it doesn’t work. You need to know the outgoing velocities at the start, but these depend on the global system. Get them wrong and energy is not conserved, and the bounce may look odd.

There is another method, though. The collisions can be processed one by one until none remain, that is, all the relative velocities are positive. This works (almost) correctly for perfectly elastic collisions, and not quite so well for inelastic collisions. Even slightly inelastic collisions dissipate too much energy over several iterations, and convergence becomes poor as the coefficient of restitution (Cr) approaches zero.

Interestingly, a combination of the two methods is perfectly fine. Calculate a result for Cr = 0 using a linear system and another for Cr = 1 using sequential impulses and they can be mixed together according to the desired value of Cr. A limitation of this technique is that all simultaneous collisions have the same Cr. But it’s the only one I know of that works! All others are a poor approximation. (As opposed to a better approximation – there is no exact solution.)
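
The mixing step itself is tiny. The sketch below assumes the mix is a straight linear interpolation on Cr, which the text above does not actually specify; Vec2 and BlendVelocities are made-up names, and the two input arrays stand for the outputs of the two solvers.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Blend two sets of post-collision velocities, one entry per body.
// `inelastic` is the Cr = 0 result (linear system), `elastic` is the
// Cr = 1 result (sequential impulses). Linear blending is an assumption.
inline std::vector<Vec2> BlendVelocities(const std::vector<Vec2>& inelastic,
                                         const std::vector<Vec2>& elastic,
                                         float cr) {
    std::vector<Vec2> out(inelastic.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i].x = inelastic[i].x + cr * (elastic[i].x - inelastic[i].x);
        out[i].y = inelastic[i].y + cr * (elastic[i].y - inelastic[i].y);
    }
    return out;
}
```

For a single pair of equal masses, one moving at 1 and one at rest, the Cr = 0 result is 0.5 for both and the Cr = 1 result is 0 and 1. Blending with Cr = 0.5 gives 0.25 and 0.75: momentum is unchanged and the separation speed is half the approach speed, which is what the coefficient of restitution asks for.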

The only collision paper that makes sense:

A 600 box stack.

Fluid dynamics part 2

Consider two cells of unit area, touching along a boundary of unit length. We have the average mass and momentum of each cell, and we want to know how these will change in a unit time step. (Using units based on the cell size and time step makes things simpler. Scaling factors can be folded into constants, saving some multiplications.)

The flow of mass is density times velocity. And since density times velocity is the momentum density, the flow in unit time is

mass flow = dot(momentum, flow direction)

The flow of momentum is only slightly more tricky. It’s the mass flow times the velocity, and velocity is momentum divided by density, so

momentum flow (parallel) = mass flow * mass flow / density
momentum flow (perpendicular) = dot(momentum, perpendicular) * mass flow / density

All these values are interpolated at the midpoint of the two cells.

Each cell has several neighbours, so updating the grid is a two pass process. First, calculate all the flows. Then, apply them to the cells. To update the grid,

source density -= mass flow
dest density += mass flow
source momentum -= momentum flow
dest momentum += momentum flow

Note that the same amount is added to one cell as is subtracted from the other, so both mass and momentum are conserved. The only problem comes if one of the densities becomes negative. There are various ways of dealing with this, but just making sure that no more mass can flow than is actually in the cell is the easiest.
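
Put together for a single pair of cells, the update could be sketched like this. The names (Cell, Flow, ComputeFlowX, ApplyFlow) are invented, the boundary normal is fixed at +x, and the clamp shown is just one possible way to stop a cell being overdrawn; with several neighbours the total outflow would need the same treatment.

```cpp
#include <algorithm>
#include <cmath>

struct Cell { float density; float momX; float momY; };
struct Flow { float mass; float momX; float momY; };

// Flow from `src` to `dst` across a shared boundary whose normal is +x.
// A negative mass flow means material moves from dst to src.
inline Flow ComputeFlowX(const Cell& src, const Cell& dst) {
    Flow f = {0.f, 0.f, 0.f};
    // Interpolate everything at the midpoint of the two cells.
    const float density = 0.5f * (src.density + dst.density);
    if (density <= 0.f) return f;
    const float momX = 0.5f * (src.momX + dst.momX);
    const float momY = 0.5f * (src.momY + dst.momY);

    f.mass = momX;                       // dot(momentum, flow direction)
    f.momX = f.mass * f.mass / density;  // parallel momentum flow
    f.momY = momY * f.mass / density;    // perpendicular momentum flow

    // Never move more mass than the giving cell holds (scales the whole flow).
    const float available = std::max(0.f, f.mass > 0.f ? src.density : dst.density);
    const float wanted = std::fabs(f.mass);
    if (wanted > available) {
        const float scale = available / wanted;
        f.mass *= scale;
        f.momX *= scale;
        f.momY *= scale;
    }
    return f;
}

// Second pass: whatever leaves one cell arrives in the other.
inline void ApplyFlow(Cell& src, Cell& dst, const Flow& f) {
    src.density -= f.mass;  dst.density += f.mass;
    src.momX -= f.momX;     dst.momX += f.momX;
    src.momY -= f.momY;     dst.momY += f.momY;
}
```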

This is all fine, but run it and nothing actually happens because all the initial momentums are zero. Trying to start an explosion by putting a huge density in one cell doesn’t do anything either. Why is that? If each cell is full of particles and the average momentum is zero, that doesn’t mean the momentum of every particle is zero. In fact, they will be flying in all directions and it just happens that it all averages out. If a cell has more particles than its neighbour, some particles will just randomly cross over, and fewer will come back. So there is a flow proportional to the difference in density between the cells.

mass flow += (source density - dest density) * diffusion constant

If there’s mass flow, then there’s momentum flow, from the above formula.

Once things get moving they aren’t necessarily going to stop. You can end up with a sort of lava lamp effect as the mass flows around. Applying some damping can sort this out. For each cell,

momentum *= damping factor

And draining the excess mass can be useful too,

density = density * drain factor + (1 - drain factor)

Also, some friction between cells helps to smooth out the flow.

momentum flow (perpendicular) += momentum difference * friction constant

What are the neighbours of a cell? Technically, only the four touching cells. But eight-way flow looks better. Hexagons would be an alternative, but I like the cells to line up with the walls.

Boundary conditions are important. A cell can be part of the active area, or it can be behind a wall. Nothing can flow through a wall, so the flow is zeroed if a cell or its neighbour is outside the active area. It is, however, possible for momentum to bounce off a wall.

So the idea is simple, but there are several subtleties to the implementation. It helps to have an extra row of wall cells all the way around the grid, and the flow calculation stops one short of the edge. For rendering, I want the cell corners, so I resample the grid, averaging the four cells around each corner.
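
The corner resampling is simple enough to show in full. This is a sketch assuming a row-major grid of one float per cell; ResampleCorners is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Average the four cells around each interior corner.
// `cells` is a w * h row-major grid; the result is (w - 1) * (h - 1).
inline std::vector<float> ResampleCorners(const std::vector<float>& cells,
                                          int w, int h) {
    std::vector<float> corners;
    corners.reserve(static_cast<std::size_t>(w - 1) * (h - 1));
    for (int y = 0; y + 1 < h; ++y) {
        for (int x = 0; x + 1 < w; ++x) {
            const float sum = cells[y * w + x] + cells[y * w + x + 1] +
                              cells[(y + 1) * w + x] + cells[(y + 1) * w + x + 1];
            corners.push_back(0.25f * sum);
        }
    }
    return corners;
}
```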

The speed is not bad. A full screen of cells at 8 pixel resolution runs at 60 fps. Still, I wanted it faster and the algorithm seems a perfect match for vectorization. So that’s the next post.

Fluid dynamics

After working on refactoring for a few days I wanted something exciting. So I thought I would try to make explosions. I wanted something better than a sprite, because after all, I can’t draw, and also the idea of shockwaves bouncing off walls and travelling down corridors was appealing.

I had a look online and found plenty of demos, mostly showing smoke inside a box or something equally useless running on a GPU. There are tutorials, and they all start with lots of maths and are obsessed with stability, with no intuitive explanations. And they all want to simulate incompressible flow because compression makes the maths even harder. I don’t care about physical accuracy! I just want fireballs!

So I started to think about how you might simulate something like that. I decided that the system is essentially a huge number of particles, far too many to simulate individually, but it could be managed by averaging the properties of particles inside fixed regions or cells. Evolving the system is then a matter of determining how the average properties in a cell would change. The simplest particles have two properties: mass and momentum. So the important properties of a cell are the total mass (or equivalently, the density) and the total momentum. To give at least some physical basis, these should be conserved throughout the system. Mass and momentum can flow between a cell and its neighbours. Make sure the flow is always plausible and it has to be stable (if it blew up, I knew I had a bug – a sign error usually).

The details call for another post, but for now, a picture.


Ask how to render game objects on a programming forum and you will probably be told something along the lines of ‘give each class a virtual Render() function and call it on every object from the main render loop’. This approach is popular with beginners because they know all about object-oriented programming and not much about rendering.

It falls apart very quickly. In general, a single pass over all objects is not enough, and the order is important. The interface can be made more complex but it has to be duplicated in every object. And after all that, 99% of the objects are rendered in exactly the same way; either they are a model or (in a 2D game) a sprite, so the abstraction is useless.

The next step is to build the objects out of components, so each object gets a render component and the rendering loops over those instead. This is better, but not much. There’s still no overall structure to the scene, which makes it hard to control and optimise.

And then, if you know a lot about object-oriented programming and something about rendering too, you end up with the scene graph.

There are lots of game engines built around scene graphs (I’m not linking to them – they know who they are). It’s an idea that comes from graphical editing tools. The structure of the scene is a tree, so objects can be grouped together, which is very useful when you are editing a scene. You can have various types of nodes which afford various properties to their sub-nodes, so a tree can perform a variety of tasks. And this is true in a game too. Unfortunately, because the tree is a very general data structure, it’s not especially good at anything.

Let’s look at what you might want a scene graph to do:

  • Logical grouping of game objects
  • Relative transformations
  • Culling and visibility
  • Level of detail (LOD)
  • State management
  • Collision detection

Note that a lot of that has nothing to do with rendering. Mixing unrelated data means more complex code and slower processing. It should be split off into the appropriate subsystem instead. The game logic system can group objects together. The physics or animation systems can constrain objects and stick them together. Collisions are better handled by a specialised data structure containing only collision information.

That leaves culling/visibility, render states and LODs. Where does any of that benefit from a tree structure? On a modern CPU, it’s quicker to cull objects in a flat list than traverse a tree, even if there are thousands of them. At most, you could go one level deep by specifying ranges in the list with a combined bounding box. The same goes for states. Have a flat list and sort it by effect (an effect is a combination of render state and shaders). For LODs, put them all in the list and cull the unwanted ones according to distance from the camera.
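
As a rough illustration of the flat-list approach (all names are hypothetical, bounding spheres and a plane-list frustum are assumptions, and LOD selection is left out):

```cpp
#include <algorithm>
#include <vector>

struct Sphere { float x, y, z, radius; };
struct Plane { float nx, ny, nz, d; };  // a point is inside when n . p + d >= 0
struct RenderItem { Sphere bounds; int effect; int mesh; };

// A sphere is rejected as soon as it lies fully behind any one plane.
inline bool Visible(const Sphere& s, const std::vector<Plane>& frustum) {
    for (const Plane& p : frustum) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// One linear pass to cull, then one sort so items sharing an effect
// are drawn together and state changes are kept to a minimum.
inline std::vector<RenderItem> BuildDrawList(const std::vector<RenderItem>& scene,
                                             const std::vector<Plane>& frustum) {
    std::vector<RenderItem> visible;
    visible.reserve(scene.size());
    for (const RenderItem& item : scene)
        if (Visible(item.bounds, frustum)) visible.push_back(item);
    std::stable_sort(visible.begin(), visible.end(),
                     [](const RenderItem& a, const RenderItem& b) {
                         return a.effect < b.effect;
                     });
    return visible;
}
```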

And does the game code care about this sort of organisation? Generally, no. Just chuck the object in there and let the renderer do its job. This makes the interface to the renderer very simple. You can still build levels and models using a tree. It just doesn’t serve much purpose at runtime. Flatten it in a preprocessing step or on loading.

In summary, a game is not a collection of objects, and it’s not a 3D editing package. Design the code around what actually needs to happen, and it ends up a lot simpler, and works better.



There’s very little to choose between Allegro and SDL. They have similar features and interfaces. I can’t remember why I chose SDL to begin with. Anyway, after a bit of research I decided to try Allegro in the hope of solving some rendering problems.

It only took a day to make the switch. I already used SDL mostly through a wrapper so I only had to make new versions of a few cpp files. Operation was in most cases virtually identical.

But they aren’t quite the same. SDL (at least the latest version that isn’t stuck in development hell) uses a software renderer. Allegro uses hardware rendering. Some basic features that are standard in hardware rendering are missing from the SDL renderer. If you want to use them, you have to go through OpenGL. Allegro provides access to these through a simple interface.

So that was my choice. Change all the rendering to OpenGL, or switch libraries to Allegro and stay with a 2D API.

On the non-technical side, Allegro seems to have more active development and more helpful forums. So maybe there is something to choose between them after all.

Visual Studio 2012

I installed it yesterday. It was a huge download which I had to run overnight on my creaking internet. Other than that, installation was smooth: it automatically converted my project and everything worked straight away. It does break linking in VS2010 with no warning (nice one, Microsoft) unless you uninstall and reinstall some components and generally mess about. But I’m not going back so that doesn’t affect me.

There is an elephant in the room in the form of the UI. I know they like to change it in every release but what were they thinking? It’s been described as ‘fifty shades of grey’, but that’s a bit charitable as you basically have a choice of completely white or completely black. They seem to have dropped the old icons in favour of programmer art.

Why are we still using floppy disk icons in 2012?

I was able to download a theme that at least lets you see where the windows are.

Enough of that, how about new features? Well, they seem to want you to use test driven development because the first item in the right-click menu is ‘Run Tests’. Regardless of whether any tests actually exist or whether you have any intention of writing any. The next item is ‘Insert Snippet’. Look how useful snippets are; you can make an ‘if’ statement in no less than 20 mouse clicks. The actually useful item ‘Go to definition’ is buried somewhere in the middle. Guess I’d better get used to the keyboard shortcut.

C++11 is mostly there, enough to make the upgrade worthwhile. I don’t know why they are still lagging, but Microsoft has always had its own interpretation of the C++ standard.

Code Analysis is a nice feature, but seems to focus purely on potential crash bugs rather than any other errors. In my project it only complained about a buffer overrun (impossible due to the logic), a null pointer dereference (also impossible) and something about WinMain (hey, you wrote that!). Maybe I’m just that good, but I expected more.

That’s about it. The best feature is that I don’t have to pay for it, because the Express edition is enough for my needs.


I used my first ever bit of C++11 today. Adding support for different collision types meant storing a shape pointer on the physics body, so naturally I disabled copying of physics objects because there’s no need for it and I didn’t want to have a reference counted pointer. But then I couldn’t store them in a list anymore because the only way to get an object into a list is to copy it. C++11 has emplace_back, which allows an object to be constructed in place on the back of a list. I was so impressed at this fix to an awkward part of the language I decided to install Visual Studio 2012 to get the rest of it. I’ll try to use it for convenience and simplicity and not go crazy with advanced features.
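
In outline, the fix looks something like this. Body, Shape and AddBody are stand-in names for the sketch, and copying is disabled with the standard C++11 = delete syntax, which may not be how it was actually done here.

```cpp
#include <list>
#include <type_traits>

struct Shape { float halfWidth; float halfHeight; };

// A physics body that holds a raw shape pointer and cannot be copied.
class Body {
public:
    Body(float px, float py, Shape* s) : x(px), y(py), shape(s) {}
    Body(const Body&) = delete;
    Body& operator=(const Body&) = delete;

    float x, y;
    Shape* shape;  // not owned by the body in this sketch
};

static_assert(!std::is_copy_constructible<Body>::value,
              "Body must not be copyable");

// emplace_back forwards the arguments to Body's constructor and builds
// the object directly inside the list node, so no copy is ever needed.
inline Body& AddBody(std::list<Body>& bodies, float x, float y, Shape* shape) {
    bodies.emplace_back(x, y, shape);
    return bodies.back();
}
```

With push_back the same call would fail to compile, because it needs a finished Body to copy or move into the list.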

The outcome of that is that I can place doors, which have box-shaped collisions. Unfortunately they get pushed down the corridors. They aren’t fixed in place yet.