The Render Loop Re-Revisited…

Ah, the good ol’ render loop. Everyone’s favorite topic of conversation. As I’m sure everyone is aware, the Managed DirectX samples that shipped with the DirectX 9 SDK, as well as the Summer 2003 update, used the ‘dreaded’ DoEvents() loop I speak so negatively about at times. People have probably also realized my book used the ‘infamous’ Paint/Invalidate method. I never really made any recommendations in the earlier posts about which way was better, and really, I don’t plan on it now. So why am I writing this now?!?!

If you read David’s post about the upcoming 2004 Update, you may have noticed that he mentions the DoEvents() loop the samples used to employ is gone. In reality, with the new sample framework, the samples themselves never use the Windows Forms classes anymore either. The actual render window and render loop are all run through P/Invoke calls into Win32, and I figured I’d take a quick minute to explain the reasoning behind it.

Obviously the main use of DirectX is game development. Sure, there are plenty of non-game scenarios that DirectX is great for (data visualization, medical imaging, etc.), but what drives our API is the game developers. If you know any game developers (or are one yourself), you’re probably well aware that while the game is running (and rendering), things need to happen quickly and predictably. With all the benefits of managed code, one thing that can be hard to achieve is that ‘predictability’, particularly when you’re dealing with the garbage collector.

So let’s say you decided to use Windows Forms for your rendering window, and you wanted to watch what the mouse was doing, so you hook the MouseMove event. Aside from the ‘cost’ of the Invoke call into your handler, a managed object (the mouse event arguments) is created. *Every* time. Now, the garbage collector is quite efficient, and very speedy, so this alone could be easily handled. The problem arises when your own ‘short lived’ objects get promoted to a new generation due to the extra collections these allocations trigger. Generation 0 collections won’t have any effect on your game; generation 2 collections, on the other hand, will.
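
To make that concrete, here’s a minimal sketch of the pattern in question (the handler and field names are mine, purely for illustration). Every time the mouse moves, Windows Forms constructs a fresh MouseEventArgs instance before your handler even runs:

public class RenderForm : System.Windows.Forms.Form
{
    private int lastX, lastY;

    public RenderForm()
    {
        // Convenient, but every MouseMove that fires allocates a new
        // MouseEventArgs object on the managed heap.
        this.MouseMove += new MouseEventHandler(OnMouseMoved);
    }

    private void OnMouseMoved(object sender, MouseEventArgs e)
    {
        // 'e' is a brand new allocation for this single event. At hundreds
        // of events per second, these add up and force extra collections.
        lastX = e.X;
        lastY = e.Y;
    }
}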

Thus the new sample framework doesn’t rely on these constructs at all. It is probably one of the most efficient rendering loops available in the managed space currently, but the code doesn’t necessarily follow many of the conventions you see in the managed world. So, when deciding on the method you want to use to drive your rendering, you need to ask yourself what’s more important: performance, or conformance? In the case of the updated sample framework, we’ve chosen performance. Your situation may be different.

So you say that startup time is slow?

One of the first things people may notice when running managed applications is that the startup time is slower than that of a ‘native’ application. Without delving into details, the major cause of this slowdown is the compilation that needs to happen (called JIT, or just-in-time compilation). Since managed code has to be compiled into native code before it’s executed, this is an expected delay when starting the application. Since Managed DirectX is built on top of this system, code you write using the Managed DirectX runtime will have this behavior as well.

Since the JIT compiler can perform a lot of optimizations that just can’t be done at compile time (taking advantage of the actual machine the code is running on, rather than the machine it was compiled on, etc.), the behavior here is desired; the side effect (the slowdown) is not. It would be great if there were a way to have this cost removed. Luckily for us, the .NET Framework includes a utility called NGen (Native Image Generator) which does exactly this.

This utility will natively compile an assembly and put the output into what is called the ‘Native Assembly Cache’, which resides within the ‘Global Assembly Cache’. When the .NET Framework attempts to load an assembly, it will check to see if a native version of the assembly exists, and if so, load that instead of doing the JIT compilation at startup, potentially decreasing the startup time of the application dramatically. The downsides of using this utility are two-fold. One, there’s no guarantee that the startup or execution time will be faster (although in most cases it will be; test to find out). Two, the native image is very ‘fragile’: a number of factors (such as a new runtime installation, or a change in security settings) can cause it to become invalid. Once the native image is invalid, it will still exist in the ‘Native Assembly Cache’, but will never be used. To regain the benefits, you’ll need to ngen the assemblies once more, and unless you’re watching closely, you may not even notice that the original native assemblies are now invalid.

If you’ve decided you would still like to ngen your Managed DirectX assemblies, here are the steps you would take:

  • Open a Visual Studio .NET 2003 Command Prompt window.
    • If you do not wish to open that command window, you could simply open a normal command prompt window and ensure the framework binary folder is in your path.  The framework binary folder should be located at %windir%\Microsoft.NET\Framework\v1.1.4322, where %windir% is your Windows folder.
  • Change the directory to %windir%\Microsoft.NET\Managed DirectX, where %windir% is your Windows folder.
  • From here, go into the folder for the version of Managed DirectX you wish to ngen (the higher the version number, the more recent the assembly).
  • Run the following command line for each of the assemblies in the folder:
    • ngen microsoft.directx.dll (etc)
    • If your command prompt supports it, you may also use this command line instead:
      • for /R %i in (*.dll) do ngen %i

If you later decide you do not want the assemblies compiled natively, you can use the ngen /delete command (passing the same assembly names) to remove these now-compiled assemblies.

  • Note that not all methods or types will be natively compiled by ngen; these will still need to be JIT’d.  Any types or methods that fall into this category will be listed in the output while the ngen executable runs.

Managed DirectX – Have you used it?

So I asked before what types of features you would like to see in Managed DirectX (and the feedback was awesome; I’m still interested in this topic).  What I didn’t ask about back then, though, is what types of things people are using it for currently.

Are you using it to write some tools?  Game engines?  Playing around on the weekends?  What experiences have you had working with the API?

Post show ramblings after the Game Developers Conference…

All in all, I think it was a really good show this year.  I met lots of interesting people doing lots of interesting projects.  I also heard rumblings that the show next year might be in San Francisco rather than San Jose.  It was awfully crowded, so maybe a change of venue is in order, but really I don’t know if it’s going to happen or not.

The expo itself was decently sized this year with lots of good booths.  Renderware had a large booth once again, but it seemed a little more enclosed this time, so I didn’t actually go in.  Nokia’s N-Gage booth (which was huge and popular last year) was smaller and much less popular this year.  Part of that probably has to do with its sub-prime location compared to last year, but only part.  ATI’s and nVidia’s booths both had interesting presentations happening throughout the day, and the AMD64 booth was quite popular as well.  The Intel booth was huge as always, and they once again had the ‘contests’ where six people would play an online game for five minutes and the winner would get the game (this year it was ‘Call of Duty’).  I played once and came in third place (I sucked), but did get a stuffed Intel bunny-man doll.

My talk seemed to be received very well, too.  I covered most of the basic areas for managed code in gaming, showed some demos, failed in showing other demos (doh!), and got some good questions.  One demo in particular really stood out for the crowd, and I was asked many questions about it after the talk and throughout the show.  It’ll be released in an upcoming DirectX SDK update.

I loved the award shows Wednesday night, we announced XNA, and I think it was an all-around great show.

To shader or not to shader, that is the question…

So I’m finishing up my second book (an introduction to 3D game development), which is intended to be a ‘beginners’ book, and I find myself continually arguing with myself about whether or not I should use shaders in the last ‘sample game’.  Couple this with the fact that my ‘advanced’ book, which will be out a few short months after this beginner book, is virtually entirely shader driven, with next to nothing using the fixed function pipeline.

The argument I’m having with myself is that the shader code in the beginners book might be too difficult to be classified as ‘beginner’, while at the same time I don’t want to simply ignore shaders, because they can be quite powerful.  Right now I’m leaning towards some basic shaders for the last game, just as a small ‘introduction’ that hopefully won’t catch anyone off guard.

I’d rather have someone complain about too much (or too difficult) information than not enough.

The speed of Managed DirectX

It seems that at least once a week I’m answering questions directly regarding the performance of managed code, and Managed DirectX in particular.  One of the more common questions I hear is some paraphrase of “Is it as fast as unmanaged code?”

Obviously, in a general sense, it isn’t.  Regardless of the quality of the Managed DirectX API, the fact remains that it still has to run through the same DirectX API that the unmanaged code does.  There is naturally going to be a slight overhead for this, but does it have a large negative impact on the majority of applications?  Of course not.  No one is suggesting that one of the top-of-the-line polygon-pushing games coming out today (say, Half-Life 2 or Doom 3) should be written in Managed DirectX, but that doesn’t mean there isn’t a whole slew of games that could be.  I’ll get to that later.

I’m also asked quite a bit things along the lines of “Why is it so slow?”  Sometimes the person hasn’t even run a managed application; they just assume it has to be slow.  Other times, they may have run numerous ‘scenarios’ comparing against the unmanaged code (including running the SDK samples) and found that in some instances there are large differences.

Like I’ve mentioned earlier in this blog, all of the samples in the SDK use the dreaded ‘DoEvents’ loop, which can artificially slow down the application due to allocations and the subsequent large number of collections.  The fact that most of the samples run with frame rates similar to the unmanaged API is a testament to the speed of the API to begin with.

The reality is that many of the developers out there today simply don’t know how to write well-performing managed code.  This isn’t through any shortcoming of the developer, but rather the newness of the API, combined with not enough documentation on performance and how to get the best out of the CLR.  Luckily, this is changing; for example, see Rico Mariani’s blog (or his old blog).  For the most part, we are all newbies in this area, but things will only get better.

It’s not at all dissimilar to the change from assembler to C++ for games.  It all comes down to a simple question: do the benefits outweigh the negatives?  Are you willing to sacrifice a small bit of performance for the easier development of managed code?  The quicker time to market?  The greater security?  The easier debugging?

Like I said earlier, there are certain games today that aren’t good fits for having the main engine written in managed code, but there are plenty of titles that are.  The top 10 selling PC games a few weeks ago included two versions of The Sims, Zoo Tycoon (plus its expansion), Age of Mythology, Backyard Basketball 2004, and Uru: Ages Beyond Myst, any of which could have been written in managed code.

Anyone who’s taken the time to write some code in one of the managed languages normally realizes the benefits pretty quickly.

The Render Loop Revisited

Wow. I wouldn’t have thought that my blog on the render loop and DoEvents would spark as much discussion as it did. Invariably, everyone wanted to know what I thought the ‘best’ way to do this was.

Actually, the answer is (naturally) ‘it depends’. It wasn’t an oversight on my part to leave out a recommendation at the end of the post; it was done intentionally. I had hoped to spark people’s interest in learning the cost of the methods they were calling, while pointing out a common scenario where a method had side effects many people weren’t aware of.

However, since I’ve been asked quite a few times about alternatives, I feel obligated to provide some. =)

Here are some alternatives, in no particular order.

  • Set your form to have all drawing occur in WmPaint, and do your rendering there. Before the end of the OnPaint method, make sure you call this.Invalidate(). This will cause the OnPaint method to be fired again immediately.
  • P/Invoke into the Win32 API and call PeekMessage/TranslateMessage/DispatchMessage. (DoEvents actually does something similar, but you can do this without the extra allocations; see the sketch below.)
  • Write your own forms class that is a small wrapper around CreateWindowEx, and give yourself complete control over the message loop.
  • Decide that the DoEvents method works fine for you and stick with it.

Each of these obviously has benefits and disadvantages over the others. Pick the one that best suits your needs.
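
As a rough illustration of the second option, here is a minimal sketch (it assumes the usual System.Runtime.InteropServices using, and FullRender is a placeholder for your own rendering method). The idea is to render whenever the message queue is empty and only fall back to normal message processing when a message is actually waiting, so the per-frame DoEvents cost disappears:

[StructLayout(LayoutKind.Sequential)]
public struct NativeMessage
{
    public IntPtr hWnd;
    public uint msg;
    public IntPtr wParam;
    public IntPtr lParam;
    public uint time;
    public int x;
    public int y;
}

[DllImport("user32.dll")]
public static extern int PeekMessage(out NativeMessage msg, IntPtr hWnd,
    uint filterMin, uint filterMax, uint flags);

public void RunLoop()
{
    while (this.Created)
    {
        // PM_NOREMOVE (0) just checks the queue; nothing is allocated here.
        NativeMessage msg;
        if (PeekMessage(out msg, IntPtr.Zero, 0, 0, 0) != 0)
        {
            // A real message is waiting; let the normal pump handle it.
            Application.DoEvents();
        }
        else
        {
            // The queue is empty; render a frame.
            FullRender();
        }
    }
}

This still uses DoEvents for the rare occasions a message arrives, but the allocation now happens only when there is actual work to do, not once per frame.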

The downside of using the events

Managed DirectX has plenty of events that it captures and fires in a normal application. Every single managed graphics object will hook certain events on the device to ensure it can behave correctly.

For example, when a device is reset (due to a window resize, or a switch to or from full screen), any object stored in the default memory pool will need to be disposed. Many objects will also have work to do after the device has been reset. In the default case, each object that needs to be disposed before a reset will hook the DeviceLost event, while the items that also have post-reset work will hook the DeviceReset event.
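
For illustration, this is roughly what that hooking amounts to if you were to write it yourself (a simplified sketch; the handler names are placeholders):

// Each wrapper object subscribes to the device so it can clean up
// around a reset: DeviceLost fires before the reset, DeviceReset after.
device.DeviceLost += new EventHandler(OnDeviceLost);   // dispose default-pool data here
device.DeviceReset += new EventHandler(OnDeviceReset); // recreate it here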

This doesn’t even consider the fact that each object hooks the device’s Disposing event. In short, if events are being hooked, every D3D object will have a hook on the device. So why is this a problem?

Take this seemingly ‘safe’ code as an example (assume swapChain is a valid swap chain, and device is a valid Direct3D device):

device.SetRenderTarget(0, swapChain.GetBackBuffer(0, BackBufferType.Mono));
device.DrawPrimitives(PrimitiveType.TriangleList, 0, 500);

Looks simple enough; we’re just using the back buffer of the swap chain as the render target. However, if the events are being hooked, there is a lot more going on here. The GetBackBuffer call will return a new surface representing the current back buffer of the swap chain. This object will hook the device’s lost and disposing events, which means at least three allocations (the actual surface, along with the two event handlers).

Worse than that, though, this object (which is only used for a brief period of time) will never be collected as long as the device is still alive, since it has hooked events on the device, so this memory (and these objects) will never be reclaimed. They will eventually get promoted to generation 2, and the memory usage of your application will just steadily rise. Imagine a game running at 60 frames per second, calling this code every frame.

To think, we haven’t hit the end of the problems yet either! Imagine that game running at 60 frames per second has been going for two minutes. Now the device’s Disposing event has 7,200 objects hooked, and the event has just been raised because the application is shutting down. It takes a significant amount of time to propagate this event to that many objects, and it will appear your application has locked up (when in reality it is simply notifying every object that the device is now gone).

A much more efficient way of writing this code would be something like:

using (Surface buffer = swapChain.GetBackBuffer(0, BackBufferType.Mono))
{
    device.SetRenderTarget(0, buffer);
    device.DrawPrimitives(PrimitiveType.TriangleList, 0, 500);
}

In this scenario you get rid of the objects immediately. Yet you still have the underlying ‘problem’ of the event hooking.

An even better solution is to turn off the event hooking within D3D completely. There is a static property on the device class you can use to do this:

Device.IsUsingEventHandlers = false;

If you set this before you create your device, it will completely turn off the internal event handling for D3D. Beware of doing this, though, since you will need to manage the object lifetimes yourself.
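
What ‘managing the lifetimes yourself’ means in practice is roughly this (a hypothetical sketch; the resource names and presentParams are placeholders): with the hooks off, nothing will dispose your default-pool objects for you before a reset, so you do it explicitly.

// With event hooking disabled, dispose default-pool resources yourself
// before resetting the device, then recreate them afterwards.
void HandleDeviceLost()
{
    renderTarget.Dispose();       // default-pool objects must go before Reset
    dynamicVertexBuffer.Dispose();

    device.Reset(presentParams);  // reset the device ourselves

    CreateDefaultPoolResources(); // then rebuild what we just released
}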

The default behavior of the device hooks is extremely convenient, but if you want top performance, you may want to avoid the default event code. At the very least, understand when and how the events are hooked, and structure your code to be as fast as possible (such as preferring the second code snippet over the first).

The Render Loop

If you’ve seen the SDK samples for C# or VB.NET and have looked much at the graphics samples, you may have noticed the render loop, which looks something like this:

while (Created)
{
    FullRender();
    Application.DoEvents();
}

Obviously the point of this method is to loop as long as the window is created (which normally implies the application is still running). It looks harmless enough and is relatively easy to understand, which is why we left it in the SDK, but in reality this isn’t the best way to drive your render loop.

Aside from the ‘ugliness’ factor of the loop, the real problem here is the Application.DoEvents method. It’s surprising how many people don’t actually realize what’s going on in this method, and I’d bet you would find even more people shocked that there’s actually an allocation happening in it.

Let’s look at this simple application:

public class Form1 : System.Windows.Forms.Form
{
    public Form1()
    {
        this.Show();
        for (int i = 0; i < 50000; i++)
        {
            Application.DoEvents();
            //System.Threading.Thread.Sleep(0);
        }

        this.Close();
    }

    static void Main()
    {
        try { Application.Run(new Form1()); }
        catch { }
    }
}

All this does is show a form, call DoEvents 50,000 times, and exit. Nothing fancy at all. Running this through the CLR Profiler on my machine shows that during the lifetime of this application, 6,207,082 bytes of data were allocated. Comment out the DoEvents call and uncomment the Sleep call, and the application only allocates 406,822 bytes on my machine. Doing the math, that averages out to ~116 bytes allocated per call to DoEvents.

If you’re running a high-frequency render loop at, say, 1,000 frames per second, that’s over 100KB of data you’re allocating per second. This will cause quicker gen0 collections, which can needlessly promote your own short-lived objects to gen1, or worse.

Moral of the story: just because it looks easy doesn’t mean it’s the right idea.

Managed DirectX Graphics And Game Programming Kickstart

In a bit of a shameless plug, I’d like to point out that my book on Managed DirectX is available now.

As I’m sure everyone is already aware, the documentation for the Managed DirectX API left a little to be desired. My book covers every component of the Managed DirectX API, including DirectInput, DirectSound, DirectPlay, and DirectDraw, but focuses the majority of the content on Direct3D.

While the other components are discussed briefly, the Direct3D API is covered in much more detail. The book takes you step by step from drawing a simple triangle on the screen all the way through character animation, and even the High-Level Shader Language.

There is no greater source of information on Managed DirectX. With that, I’ll end my shameless plug. =)