A technical post

Here’s a screenshot that won’t excite anybody but me.

Can you spot the exciting thing?  It’s not the cylinders (which will hopefully be proper trees by the end of the day, but for now are still just cylinders waiting to be shaped); it’s actually the little green horizontal line at the bottom left corner of the screen.

You’ve seen screenshots with that little green (and sometimes red or blue) line before.  It only shows up in my debugging builds, not on the builds that I release, and it helps me gauge how quickly the game is currently running.

The two vertical blue bars are markers.  If the horizontal line doesn’t reach the first blue marker, then we’re running at 60fps.  If the horizontal line passes the first marker but doesn’t reach the second blue marker, then we’re running at 30fps.  Beyond that, the frame rate is really low.
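To make that concrete: on a 60Hz display a frame has about 16.7 ms. Assuming the game waits for the display refresh, a frame that misses that deadline slips to the next refresh, which gives 30fps if it fits within 33.3 ms. The sketch below, in C++, reads a frame time the same way the bar is read. It assumes the two markers sit at exactly those two budgets. `ClassifyFrame` and `BarFraction` are hypothetical names, not VectorStorm functions.

```cpp
#include <string>

// Frame budgets on a 60Hz display (assumed marker positions).
constexpr double kBudget60 = 1000.0 / 60.0;  // about 16.7 ms per frame
constexpr double kBudget30 = 1000.0 / 30.0;  // about 33.3 ms per frame

// Read a frame time the way the debug bar is read: short of the first
// marker means 60fps, between the markers means 30fps, past both is "really low".
std::string ClassifyFrame(double frameMs)
{
    if (frameMs <= kBudget60) return "60fps";
    if (frameMs <= kBudget30) return "30fps";
    return "below 30fps";
}

// Bar length as a fraction of the distance to the second marker,
// so a value of 0.5 lands exactly on the first marker.
double BarFraction(double frameMs)
{
    return frameMs / kBudget30;
}
```

Under these assumptions, a frame that used to reach the first marker (`BarFraction` of 0.5) and now takes half the time sits at 0.25.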

The exciting thing is that if I had taken this screenshot on this computer yesterday, that horizontal green line would have stretched all the way over to the first blue marker — we’re drawing in half the time that it took before, in this same situation.

See, the VectorStorm library has always been phenomenally slow at drawing bitmapped text, and whenever the context matrix (the box in the top right corner) was open, it slowed things down to an extreme degree.  Unfortunately, this was difficult to solve, because the real problem was the VectorStorm library’s fundamental approach to drawing.

I first started writing the VectorStorm library when I was made a lead programmer at work.  I was panicking at being given that level of responsibility when there were so many areas of programming which I didn’t know about; my areas of expertise were cameras, player control, and networking.  I was, in industry parlance, a “games programmer”, as opposed to a physics programmer or a graphics programmer or an engine programmer, or other “smart-person-required” role.  My knowledge outside of these fields was strictly limited.

So I set myself a goal to write a simple game engine from scratch, with all the bells and whistles of the commercial engines which were written by the engine programmers at my workplace (or “the smart guys”, as I called them in my head), but which wasn’t necessarily hyper-tuned for speed the way that those professional ones were.  After all, I was just a games programmer, right?

The real goal of writing VectorStorm was originally just to quickly learn enough not to embarrass myself at work.

At about this time, I was reading about old coin-op video games, and learned that “Asteroids” was effectively a two-processor machine.  One processor read the control input and assembled a list of vector lines, which it passed to the second processor, and the second processor controlled the vector display, drawing those lines.  I thought that this was a really quirky approach, and so I decided that it’d be interesting (and somewhat silly) to use that same approach in my game engine.  This intriguing set-up was why VectorStorm rendered using vector graphics.  (The “Storm” part of the name is because I started by taking a lot of code from a previous game I’d written, “GemStorm”, which was a Bejeweled rip-off that I wrote for my own amusement and never released publicly.)

Anyhow, this is where the vsDisplayList came from — it’s the list of drawing commands that gets passed from one processor to another.  Inside VectorStorm, that “second processor” is actually one of the “vsRenderer” classes, which convert from the drawing instructions in the vsDisplayList into OpenGL rendering commands.

I thought this was amusing and a bit Rube Goldberg-esque.  After all, the example and tutorial code you see on the web just has people calling straight into OpenGL from game code, or else from within drawing functions exposed by the game engine.  The whole concept of pushing an opcode-based set of drawing instructions into a data buffer, only to have a different piece of code pull those instructions out again and call into OpenGL just seemed somewhat farcical.
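Here is a minimal C++ sketch of that pattern for anyone who hasn’t seen it.  One side appends opcodes and their arguments to a byte buffer.  The other side walks the buffer and turns each opcode back into a draw call.  The opcode set, the `DisplayList` and `Execute` names, and the `DrawCall` record (standing in for real OpenGL calls) are all hypothetical; this is not the actual vsDisplayList format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical opcode set, for illustration only.
enum class Op : uint8_t { SetColor, MoveTo, LineTo };

// Producer side ("first processor"): append opcodes and their float
// arguments to a flat byte buffer.
class DisplayList {
public:
    void SetColor(float r, float g, float b) { PushOp(Op::SetColor); PushFloats({r, g, b}); }
    void MoveTo(float x, float y) { PushOp(Op::MoveTo); PushFloats({x, y}); }
    void LineTo(float x, float y) { PushOp(Op::LineTo); PushFloats({x, y}); }
    const std::vector<uint8_t>& Bytes() const { return m_bytes; }

private:
    void PushOp(Op op) { m_bytes.push_back(static_cast<uint8_t>(op)); }
    void PushFloats(std::initializer_list<float> values)
    {
        for (float v : values) {
            uint8_t raw[sizeof(float)];
            std::memcpy(raw, &v, sizeof(float));
            m_bytes.insert(m_bytes.end(), raw, raw + sizeof(float));
        }
    }
    std::vector<uint8_t> m_bytes;
};

// Stands in for one OpenGL call made by the consumer.
struct DrawCall {
    std::string name;
    std::vector<float> args;
};

// Consumer side ("second processor"): decode the buffer back into calls.
std::vector<DrawCall> Execute(const DisplayList& list)
{
    const std::vector<uint8_t>& bytes = list.Bytes();
    std::vector<DrawCall> calls;
    std::size_t pos = 0;
    // Copy `count` floats out of the buffer, advancing the read position.
    auto readFloats = [&](std::size_t count) {
        std::vector<float> out(count);
        for (float& v : out) {
            std::memcpy(&v, &bytes[pos], sizeof(float));
            pos += sizeof(float);
        }
        return out;
    };
    while (pos < bytes.size()) {
        Op op = static_cast<Op>(bytes[pos++]);
        switch (op) {
        case Op::SetColor: calls.push_back({"SetColor", readFloats(3)}); break;
        case Op::MoveTo:   calls.push_back({"MoveTo", readFloats(2)}); break;
        case Op::LineTo:   calls.push_back({"LineTo", readFloats(2)}); break;
        }
    }
    return calls;
}
```

Every vertex passes through the buffer twice, once when it is written and once when it is read.  That is the cost the rest of this post is about.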

(Side-note:  Only about two months ago, I was talking with a graphics programmer co-worker and learned that three of the four professional game engines I’ve worked with over the past decade have had fundamentally this same architecture; assembling drawing commands into a data buffer which gets passed across to something else which pulls them out again and actually performs the drawing.  Colour me shocked.)

Anyway.  When it came time to extend VectorStorm into 3D and handle larger datasets, it started becoming very expensive to push lots of data into the display list.  When you have thousands of vertices, you really don’t want to be spending computer time writing and reading them into that vsDisplayList.  Instead, you want to be able to simply tell the display list where those vertices can be found.

So I implemented OpenGL VBOs, or vertex buffer objects (this was at about the time I was writing Lair).  VBOs are a way to store data directly on the video card and later refer to it by an ID number.  Once I’d done this, I could just have the display list say “Render the stuff I’ve already put on the card”.  Executing that command is heaps faster than pulling an array of data in and out of the data buffer and setting up OpenGL to render from it.  So all was good, right?

Well, no.  There was a big problem:

vsDisplayLists don’t understand that they’re referring to external data, so when you destroy a display list, it can’t clean up that external data.

To recap:  To get the best performance, vsDisplayLists needed to stop containing the actual data to be drawn, and instead just refer to data that’s stored elsewhere.  Doing this implies that you need a “somewhere else” to store that data, and that “somewhere else” has to be able to clean up after itself.
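Here is a small sketch of that failure mode.  It assumes a stand-in for video memory rather than real OpenGL.  In OpenGL, `glGenBuffers`/`glBufferData` would create a buffer and `glDeleteBuffers` would free it.  The display list holds only a buffer ID.  That is what makes it cheap, and it is also why its destructor has no reason to free anything.  All names here are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for buffers living on the video card.
std::set<uint32_t> g_cardBuffers;
uint32_t g_nextBufferId = 1;

uint32_t UploadToCard(const std::vector<float>& vertices)
{
    (void)vertices;  // a real upload would copy these into video memory
    uint32_t id = g_nextBufferId++;
    g_cardBuffers.insert(id);
    return id;
}

void FreeOnCard(uint32_t id) { g_cardBuffers.erase(id); }

// A display list that only *refers* to card data: small and fast to execute,
// but nothing tells it that the buffers should die when the list does.
struct ReferencingDisplayList {
    std::vector<uint32_t> drawBufferCommands;  // "draw buffer N" entries
    // The implicit destructor frees this vector, never the card buffers.
};

void MakeAndDropList(const std::vector<float>& vertices)
{
    ReferencingDisplayList list;
    list.drawBufferCommands.push_back(UploadToCard(vertices));
}  // `list` is destroyed here; the buffer it referenced is leaked
```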

Unfortunately, there was no standard “somewhere else” for that data to be stored, especially if you were loading display lists raw from disk using vsDisplayList::Load("MyFile").  With nowhere else to put the data, it just got placed directly into the vsDisplayList, which meant that you didn’t get the performance benefits of the VBOs.

Sure, special classes could create their own VBOs and make their own display lists which used them, but that required special code for every class that was going to do it.  (As a result, virtually every single renderable thing in Lair and in MMORPG Tycoon 2 has custom code handling its rendering.)

See, VectorStorm uses the vsDisplayList as the fundamental drawing primitive everywhere;  everything, everything creates and returns and uses vsDisplayLists.  Which means that if something creates something to be rendered, it has nowhere to actually put the data that it generates, except for directly into the vsDisplayList.

While I was making Lair, I created the vsFont class, for creating vsDisplayLists which would render bitmapped fonts.  It’s an awesome example of this problem;  since it’s just returning a vsDisplayList, it couldn’t put data onto the video card and embed references to it in the vsDisplayList, as there’d be no way to clean up that data on the card once the string wasn’t being used any more.  Instead, the vsFont just wrote the rendering data straight into the vsDisplayList.  This meant that the vsDisplayLists for rendering strings used an awful lot of memory, and so also took a long time to render.

Several months back, when I was first implementing MMORPG Tycoon 2’s quest editing UI, I finally got frustrated enough to try to fix the problem;  even short strings were taking hundreds of kilobytes to store for rendering, which was far more than was sensible.

And I found an improvement.  This improvement was to have each vsFont create a set of data on the video card for each glyph in the font.  Then, instead of putting all the vertex data directly into the display list, it only needed to put “Move here”, “Get ready to draw the first glyph”, “Draw it”, “Move there”, “Get ready to draw the second glyph”, “Draw it”, etc. commands.  And the vsFont itself could clean up all that glyph data on the card, when the font was no longer needed.  Suddenly, strings which used to require 300 KB to render now only required 40 KB.  This was a massive memory saving!  And this is the font rendering that was used in the MS1 build.
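The per-glyph approach looks roughly like this sketch.  The font uploads one buffer per glyph when it’s created and frees them when it’s destroyed, and a string becomes three small commands per character.  The `GlyphFont` class, the fixed advance width, and the counter standing in for video memory are hypothetical simplifications, not the real vsFont.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-character commands: "move here", "get ready to draw glyph", "draw it".
enum class GlyphOp : uint8_t { MoveTo, BindGlyph, Draw };

struct GlyphCommand {
    GlyphOp op;
    float x;         // used by MoveTo
    uint32_t glyph;  // card buffer ID, used by BindGlyph
};

// Counts glyph buffers currently held on the (imaginary) video card.
int g_liveGlyphBuffers = 0;

class GlyphFont {
public:
    // Upload one buffer per distinct glyph up front (fixed advance width assumed).
    GlyphFont(const std::string& charset, float advance) : m_advance(advance)
    {
        uint32_t nextId = 1;
        for (char c : charset) {
            if (m_glyphBuffer.emplace(c, nextId).second) {
                ++nextId;
                ++g_liveGlyphBuffers;
            }
        }
    }
    // The font owns the glyph buffers, so it can free them itself.
    ~GlyphFont() { g_liveGlyphBuffers -= static_cast<int>(m_glyphBuffer.size()); }
    GlyphFont(const GlyphFont&) = delete;
    GlyphFont& operator=(const GlyphFont&) = delete;

    // Three small commands per known character instead of its vertex data.
    std::vector<GlyphCommand> BuildString(const std::string& text) const
    {
        std::vector<GlyphCommand> list;
        float x = 0.0f;
        for (char c : text) {
            auto it = m_glyphBuffer.find(c);
            if (it != m_glyphBuffer.end()) {
                list.push_back({GlyphOp::MoveTo, x, 0});
                list.push_back({GlyphOp::BindGlyph, 0.0f, it->second});
                list.push_back({GlyphOp::Draw, 0.0f, 0});
            }
            x += m_advance;
        }
        return list;
    }

private:
    float m_advance;
    std::map<char, uint32_t> m_glyphBuffer;
};
```

In this sketch the command list still grows with the string’s length, three commands per character.  That is part of why this was a workaround rather than a real fix.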

But really, that was just a clever hack around the fundamental problem.  For best performance, vsDisplayList needed to refer to data rather than own it, and yet for general game use, when utilities were creating vsDisplayLists, I needed a generic place to store the data which those vsDisplayLists referred to, and a way to clean up that data once it was no longer needed.

But now I’ve finally addressed the issue for real.

I’ve created a class called the ‘vsFragment’ (I don’t know if that name will stick.  It’s not a good name.  But it’s what I’m using right now).  A ‘vsFragment’ contains a material, some rendering data (for example, references to vertices that are stored on the video card), and a display list.  When it’s destroyed, the vsFragment automatically cleans up all of that data which it had been using.  This basically means that game code can now use vsFragments the way that it used to use vsDisplayLists, and the vsFragments can now safely assume that they own the resources they had been using.
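In C++ terms this is the RAII idiom.  The object that holds references to the card data also owns that data, and its destructor releases it.  The minimal sketch below illustrates the ownership rule only, not vsFragment’s actual interface.  `Fragment`, `Sprite`, and the `CardUpload`/`CardFree` stand-ins for video memory are hypothetical names, and the material is just a string.

```cpp
#include <cstdint>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for video memory: the set of buffer IDs currently allocated.
std::set<uint32_t> g_card;
uint32_t g_nextId = 1;
uint32_t CardUpload() { g_card.insert(g_nextId); return g_nextId++; }
void CardFree(uint32_t id) { g_card.erase(id); }

// Sketch of the vsFragment idea: a material, the card buffers it owns, and a
// display list that only refers to those buffers.
class Fragment {
public:
    Fragment(std::string material, std::vector<uint32_t> buffers)
        : m_material(std::move(material)), m_buffers(std::move(buffers))
    {
        for (uint32_t id : m_buffers)
            m_displayList.push_back(id);  // "draw buffer id"; no vertex data copied
    }
    // Owning the buffers is the whole point: destroying the fragment frees them.
    ~Fragment() { for (uint32_t id : m_buffers) CardFree(id); }

    // Two copies would free the same buffers twice, so copying is disallowed.
    Fragment(const Fragment&) = delete;
    Fragment& operator=(const Fragment&) = delete;

    const std::string& Material() const { return m_material; }
    const std::vector<uint32_t>& DisplayList() const { return m_displayList; }

private:
    std::string m_material;
    std::vector<uint32_t> m_buffers;
    std::vector<uint32_t> m_displayList;
};

// A renderable that can carry any number of fragments (hypothetical name).
struct Sprite {
    std::vector<std::unique_ptr<Fragment>> fragments;
};
```

Holding fragments through `std::unique_ptr` keeps ownership unambiguous.  When the sprite goes away, each fragment is destroyed once, and each fragment frees exactly the buffers it was given.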

I’ve added support for putting any number of vsFragments onto vsSprites or vsModels, in addition to the raw vsDisplayList they already have.  At some point in the future, I expect that raw vsDisplayList is going to go away, and vsFragments will become the only way to render.  I haven’t quite figured out the rendering structure that’ll be in use then, but there’s no real rush;  things work just fine while supporting both the new and the old system simultaneously.

So with this new vsFragment class, I’ve now modified vsFont such that instead of returning a vsDisplayList, it can return a vsFragment.  This means that the vsFont can create storage for the string rendering data on the video card, and assign ownership of that data to the vsFragment.  By doing this, every string, no matter how long it is, can now be rendered by a vsDisplayList that’s just 28 bytes long, and which contains only five drawing instructions in total, since absolutely all of the rendering data has been loaded onto the video card.  I’ve modified most of the UI to now use the vsFragments, and it’s now drawing much faster than it did this morning.
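The display list can stay the same size because the whole string’s geometry can be built into a single array and uploaded once.  This is a sketch of that batching step.  It assumes fixed-width glyphs drawn as quads (four vertices and six indices each) and omits texture coordinates.  `BuildStringGeometry` and its parameters are hypothetical, not the vsFont API.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Position only; texture coordinates omitted for brevity.
struct Vertex { float x, y; };

struct StringGeometry {
    std::vector<Vertex> vertices;         // 4 per visible glyph
    std::vector<unsigned short> indices;  // 6 per visible glyph (two triangles)
};

// Build one vertex/index array for the whole string so it can be uploaded as
// a single buffer. Spaces advance the pen but emit no geometry.
StringGeometry BuildStringGeometry(const std::string& text, float advance, float height)
{
    StringGeometry g;
    float x = 0.0f;
    for (char c : text) {
        if (c != ' ') {
            unsigned short base = static_cast<unsigned short>(g.vertices.size());
            g.vertices.push_back({x, 0.0f});
            g.vertices.push_back({x + advance, 0.0f});
            g.vertices.push_back({x + advance, height});
            g.vertices.push_back({x, height});
            for (unsigned short i : {0, 1, 2, 0, 2, 3})
                g.indices.push_back(static_cast<unsigned short>(base + i));
        }
        x += advance;
    }
    return g;
}
```

However long the text gets, only these two arrays grow.  The handful of commands that reference them once they are on the card stays the same.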

In MMORPG Tycoon 2, interpreting all of those text drawing commands in the display lists would often take as much time as drawing the entire rest of the world.  But now it’s fine;  drawing text no longer seems to affect the frame rate at all.

There’s still a lot of work ahead, of course, converting things over to using vsFragments instead of using vsDisplayLists directly, but this should yield much better performance overall, and also should solve a number of future renderer and performance issues.  And I’m not going to do it all at once;  the conversion should be a very slow, gradual process over the coming months.