Latest Entries »

An immediate mode GUI

I’ve been using an immediate mode GUI system for a while and I think the code is stable enough now to talk about my experience of it.

The definitive tutorial – though I deviated a lot from this:

http://iki.fi/sol/imgui/

I have so far made a level editor and a database using this system. I used my own GUI system because I knew I would have complex interactions between the GUI and the scene/document, and I didn’t fancy learning and possibly fighting an existing API. It was not for the final result, which would be better using proper Windows controls. Apart from having a non-standard look and feel I have no file selector, no copy/paste in text boxes and no tool-tips. But as the only person using it, I felt free to make that sacrifice.

Some techniques that I used:

  • Simple interface – For example, if (gui.DoButton(“MyButton”)). There are some optional input flags and that’s it.
  • No Widget IDs – I felt they were ugly, so I dispensed with them. Instead, I identify widgets by counting. This works as long as the GUI does not change while a widget is still active, and this has always been the case. The GUI changes in response to triggered widgets, and  widgets are deactivated when they are triggered. There can only be one active widget at a time, therefore there cannot be any active widgets after a trigger.
  • Automatic layout – It would not be practical to hard code the position of every button. I have vertical and horizontal lists and layout flags that can be passed to any widget. The size of the layouts is calculated and used in the following frame.
  • RAII – Layouts are created as objects on the stack. When they go out of scope, they remove themselves from the GUI layout stack.
  • Storing temporary data inside the GUI – For example the text box in the tutorial above will output values to the buffer while they are being edited. This complicates the code. My text box takes an initial string, then holds the temporary string internally while it is edited, returning the final string only when triggered.
  • Multiple trigger conditions –  How can you have a button that can distinguish left and right clicks when it only returns a bool? Place two buttons in the same spot, using a flag to prevent the bottom one being hidden. One is triggered on a left click, the other on a right click.
  • Building complex widgets out of basic ones – I have menus, lists, trees and tables. They are all buttons and labels underneath.
  • Drag boxes and Hot boxes – Drag boxes allow the user to click in a window and move it, triggering when it is dropped. Hot boxes report mouse clicks inside them. They allow painting tiles in the level and moving objects around. These are ridiculously simple to use.

I had some trouble organising the code on the user side until I realised what was going on. It’s not event-driven. The code has to be arranged the same way the interface is. That may sound inflexible – well, maybe – but it breaks down into windows, toolbars, menus and so on. Instead of events, I kept a queue of deferred actions. Want to load a file? Well, you need ask first whether to save the old one. Then request a file name. The actions might happen anywhere; it just depends where in the interface they are handled. Trust the action queue.

It’s very efficient in terms of the amount of code used, and the logic is easy to follow. My entire database interface is under 1000 lines of code, using a total of 120 widgets. And that does a lot of stuff: record editing, tables, a record tree, a schema builder and all the associated menus and dialog boxes. Now, if I could just figure out how to do tool tips…

Since I recently had to figure this out, and every other example on the internet is wrong, here’s how to erase from a vector using “swap and pop” without invalidating iterators or causing undefined behaviour.

if (!vec.empty())
{
  auto it = vec.begin();
  auto end = vec.end();
  while (it != end)
  {
    if (it->ShouldErase())
    {
      end--;
      *it = std::move(*end);
    }
    else
    {
      ++it;
    }
  }
  vec.erase(end, vec.end());
}

Debug builds

Since maintaining build configurations in Visual Studio is such a pain, I only have two. Unfortunately one of them had slowly become useless.

Day to day there are two configurations I need:

  • One for debugging, with maximum checks and optimisations off.
  • One for testing performance, with full optimisation and minimal checks.

I can live with the debug build having half the frame rate of the performance build. Beyond that the game becomes unplayable. It’s no good having all those checks if I can’t actually run the thing.

So I did something slightly unusual, and profiled the debug build. It turns out that having no optimisations was not the main problem; rather it was the debug iterators in the standard library.

This wasn’t entirely surprising. I used to have my own container classes for that reason, among others. But now I use the standard library all the time, because it is incredibly useful and too much work to replicate. It feels wrong to write C++ without it. But why are the debug iterators so slow? It seems that they lock mutexes on every operation, which seems unnecessary. Anyway, I can’t do anything about that, short of switching development environment. And I don’t want to turn it off, because it’s caught quite a few bugs already.

I found the worst cases and replaced iterators with pointers. I was using a vector as a dynamic vertex buffer. I wrapped it in a class and put in my own checks which don’t murder performance.

This got it back up to within a factor of two of the performance build. Hopefully it will stay there for a while.

 

Flamethrower

If you have aliens in a game, you have to have a flamethrower, even though they usually suck.

Re-using the particle system makes it easy. I’d like it to set fire to things though.

flamethrower

Deleting objects

Up to now, I had no way to delete objects from the physics engine. They could be deactivated, but they were still there. This was fine for dead enemies and other objects that existed in limited numbers. But I wanted to have lots of particle-like objects that exist for a short time, and each one couldn’t stay around forever.

Why was there no deletion? Because it’s not that easy. A physics body is allocated from a pool. A list stores all the allocated pointers. Then there is a std::vector of pointers to all active bodies, and a pointer to the body in its current cell, which may be awake, asleep or deactivated. The user code gets a pointer to the body too, and that’s what it will give back to be deleted.

So if something wants to delete a body, I get a pointer to the body to be removed, but it needs to be removed from a number of containers, which erase by iterator. And I can’t get any of those iterators without a linear search through every container. This looks like it might be slow. What I want is constant time deletion.

Right then. I can narrow down the containers that might have pointers to the body by waking it up. Then only the active cells, active list and the main list will have the pointer. There’s a point in the update when the active cells are empty and get rebuilt from the active list. That’s where I’ll put the deletion. Until then, the body is marked as pending delete.

I already have a loop through the active bodies. I’ll hijack that and check whether any bodies need to be deleted. If they do, I swap them to the end of the vector, pop_back and return the pointer to the pool. Still constant time. That just leaves the main list. I can’t delete from that without iterating through it. But it is never used until the physics world is shut down, and I don’t care so much about performance at that point.

So I change that list to a hash set of pointers. Now I can delete from it in constant time.

It may be crazy, but it works.

What’s wrong with rand()?

I’m not complaining that it isn’t random enough (though it often isn’t). And rand() is convenient, so maybe if you are writing a 10 line guess-the-number program it will be the quickest way to get there. But beyond that, I would never use it.

The problem is that rand() is lying. It says, ‘Hey, call this function and you get a random number, no strings attached!’. And that’s just not the case. Instead, rand() generates pseudorandom numbers that depend on some hidden global state, which can be modified by any part of the program at any time.

Suppose one system is happily generating random numbers and another system comes along and calls srand() with some inappropriate parameter. Suddenly the numbers are not random anymore. Or suppose you are relying on getting a predetermined sequence from a fixed seed. Again, a call to srand() from outside will break it.

Or maybe you are happy with the behaviour of rand(), then you compile the program on another platform and it’s not random enough anymore because they used a different algorithm.

As always with program state, the best way is to make it explicit. A random number generator has state, so make it a class. There are many types of generator and they have different properties, so make the algorithm explicit too. If it’s a Mersenne Twister call it that.

Then you know who has access to it, you know it is properly initialised, and you know it isn’t going to change from one platform to another. In short, it does what it says. Which rand() doesn’t.

 

Fast frustum culling

Almost every game needs to do frustum culling. There are many more objects in the game world than are visible at any one time, and the renderer should only be concerned with the ones that can be seen. There are many ways to do it. But the performance of such code is quite counter-intuitive. I decided to investigate.

The test case

I needed something extreme, so I chose a forest of 4 million trees, randomly positioned. In the end I wanted about 20000 visible, inside a 90 degree view cone. This is about how many separate objects a game could possibly render. The culling is done in 2D, but this is a roughly equivalent case to a landscape in 3D. I didn’t use any special optimisations for the culling routine; it’s standard floating point code returning whether the region is outside, partially inside or fully inside the frustum.

Brute force

Brute force culling of 4 million objects took 50 ms. A full frame is 16 ms, so this is clearly unacceptable. But it does show how fast the CPU can do these tests. Culling tests on the 20000 objects that were visible would only have taken 0.25 ms, if we hadn’t had to deal with all the rest.

Quad tree

I would say the standard structure for culling is a quad tree. I made one, optimised the depth, and the fastest I could make it go was 0.7 ms. That’s a huge improvement on brute force, but it only did around 8000 tests. Almost the entire time is overhead from cache misses and function calls.

Grid

Next I made a simple grid. Again, I optimised the cell size and made it as fast as I could. The result was 0.8 ms, only slightly slower than the quad tree. This time it did almost 40000 tests, but the overhead was much less, which accounts for the almost negligible difference compared to the tree.

Spatial indexing

Imagine a quad tree without any nodes except at the bottom level. The nodes are stored in the order you would see them if iterating through the tree depth first. It’s processed recursively, but without any pointers to follow it’s much more cache-friendly than a tree. This was the fastest method I found, at 0.5 ms.

Conclusions

  • However you do it, culling shouldn’t take very long. If the total object count is in the tens of thousands, consider brute force.
  • A tree structure gives almost no benefit compared to a grid.
  • Combining hierarchical culling with good cache behaviour is the fastest approach.

 

Pong, in a modern framework

Since I’ve been doing a lot a physics development and hacking the game to put in test cases, I thought I would write a little physics game. This would also provide a way to test and sanity check the whole framework.

(Oh, it failed the sanity check. But that’s all part of the process).

Pong is a very simple game, and it doesn’t need all the systems I put in place for the main game. But I wanted to use them anyway. In particular, I wanted to do Pong with real physics, simulating the bats and ball rather than following a minimal set of rules.

First problem: the bats shouldn’t move when the ball hits them. For this I had to add constraints, which didn’t exist in the physics engine before. However, I had contacts, and joints are mathematically similar. So I made two joints per bat, which could slide on the vertical axis but were fixed on the horizontal. Two, because the bats also should not rotate.

This has the side effect that hitting the top or bottom of a bat will move it, but I like that. I could have given the bats infinite mass, but I wanted them to be simulated.

There’s a little more to the game than the physics, but not much. I have three actor types:

  • Pong game
  • Bat
  • Ball

The Pong game actor creates the other actors, starts the game, and keeps score. The bat actor responds to input and moves the bat up and down. The ball actor waits for collisions and reports a score to the game.

Then there’s a bit of code to render the physics objects and the score, and that’s it. The game is entirely event driven because the physics runs in the background.

Next: take what I learned from Pong and apply it to the main game.pong

 

Fast sin and cos functions

I was hunting around for fast sin/cos functions and couldn’t find anything suitably packaged and ready to go. The most useful I found was in this discussion:

http://devmaster.net/forums/topic/4648-fast-and-accurate-sinecosine/

Looks good, but it needed a bit of work. I decided to implement it with sse intrinsics and calculate sin and cos together to save even more processor time.

I ended up with this. Accurate to within 0.2% over +2 PI to -2 PI and about 3 times faster than standard sin/cos if you calculate both together.


__m128 Abs(__m128 m)
{
__m128 sign = _mm_castsi128_ps(_mm_set1_epi32(0x80000000));
return _mm_andnot_ps(sign, m);
}
__m128 Sin(__m128 m_x)
{
const float B = 4.f / PI;
const float C = -4.f / (PI * PI);
const float P = 0.225f;
//float y = B * x + C * x * abs(x);
//y = P * (y * abs(y) - y) + y;
__m128 m_pi = _mm_set1_ps(PI);
__m128 m_mpi = _mm_set1_ps(-PI);
__m128 m_2pi = _mm_set1_ps(PI * 2);
__m128 m_B = _mm_set1_ps(B);
__m128 m_C = _mm_set1_ps(C);
__m128 m_P = _mm_set1_ps(P);
__m128 m1 =_mm_cmpnlt_ps(m_x, m_pi);
m1 = _mm_and_ps(m1, m_2pi);
m_x = _mm_sub_ps(m_x, m1);
m1 =_mm_cmpngt_ps(m_x, m_mpi);
m1 = _mm_and_ps(m1, m_2pi);
m_x = _mm_add_ps(m_x, m1);
__m128 m_abs = Abs(m_x);
m1 = _mm_mul_ps(m_abs, m_C);
m1 = _mm_add_ps(m1, m_B);
__m128 m_y = _mm_mul_ps(m1, m_x);
m_abs = Abs(m_y);
m1 = _mm_mul_ps(m_abs, m_y);
m1 = _mm_sub_ps(m1, m_y);
m1 = _mm_mul_ps(m1, m_P);
m_y = _mm_add_ps(m1, m_y);
return m_y;
}
float Sin(float x)
{
__m128 m_x = _mm_set1_ps(x);
__m128 m_sin = Sin(m_x);
return _mm_cvtss_f32(m_sin);
}
float Cos(float x)
{
__m128 m_x = _mm_set1_ps(x + PI / 2.f);
__m128 m_cos = Sin(m_x);
return _mm_cvtss_f32(m_cos);
}
void SinCos(float x, float* s, float* c)
{
__m128 m_both = _mm_set_ps(0.f, 0.f, x + PI / 2.f, x);
__m128 m_sincos = Sin(m_both);
__m128 m_cos = _mm_shuffle_ps(m_sincos, m_sincos, _MM_SHUFFLE(0, 0, 0, 1));
*s = _mm_cvtss_f32(m_sincos);
*c = _mm_cvtss_f32(m_cos);
}

As the next (maybe last) step in developing the physics engine, I wanted to add rotation to the objects. It should have been pretty easy, but ended up as a big overhaul in which I changed almost everything.

So what’s needed for rotation, on top of what I already had?

  • The rigid bodies get some extra state variables for angular velocity and rotation.
  • The integration has to update these variables.
  • Contacts need positions in order to work out the torque.
  • The solver matrix needs to include moment-of-inertia factors.

Not actually that much. The biggest task was adding rotation and position calculations to the collision functions. But then when I put it all in, the stability had gone straight to hell.

It actually wasn’t too bad for circles, or even circles with boxes. It was boxes against boxes that really had problems. I realised then that a single contact can’t keep a box stable, if it’s allowed to rotate. I changed it to two contacts along the contact edges. Much better, but still jittery.

The trouble with box collisions is that even tiny changes from frame to frame can dramatically alter the set of contacts. This reduces the effectiveness of contact caching, and that affects the convergence of the solver, and that means more jitter. A tidy stack is not too bad, but a big pile never quite settles.

I never solved this perfectly, but I did get it good enough with various techniques:

  • The collision functions have to be spot on. There’s no room for error.
  • Solving separately for positions and velocities prevents the intersection penalty feeding back into the next frame.
  • I treat low speed collisions as perfectly inelastic.
  • I don’t wake up objects for inelastic collisions. Instead I leave them asleep with infinite mass. This stops a jittery object waking up the whole stack.

I think there are still improvements to be had, but I have to stop somewhere. Maybe that’s it unless I get some new ideas.Rotating boxes