Why are these textures taking up so much memory..

Someone on the newsgroups was recently asking why loading his simple 800×600 texture was taking up so much memory (more than 5 megs).  He was using code something like the following:

Texture myTexture = TextureLoader.FromFile(myDevice, somePath);

The question seemed interesting enough for me to post my response here as well as in the newsgroup, since it reinforces my belief that you should always know what the methods you’re calling ‘cost’.  It wouldn’t surprise me to find out that most people don’t realize everything that goes on behind that simple line above.

So here was my response:

First, if your texture really is 800×600, using this overload (TextureLoader.FromFile) will ‘upscale’ the texture so that each dimension is a power of two; in this case it becomes 1024×1024.  Assuming 32-bit color for each pixel, the top mip level would be:
1024x1024x4 = 4 megs of pixel data.
Now, this overload would also create a mipmap chain down to 1×1, so you would also have:
512x512x4 = 1 meg of pixel data
256x256x4 = 256k of pixel data
128x128x4 = 64k of pixel data
64x64x4 = 16k of pixel data
32x32x4 = 4k of pixel data
16x16x4 = 1k of pixel data
8x8x4 = 256 bytes of pixel data
4x4x4 = 64 bytes of pixel data
2x2x4 = 16 bytes of pixel data
1x1x4 = 4 bytes of pixel data
So, add it all up, and each texture you load this way takes roughly 5.3 megs of pixel data in total..  So the numbers make sense to me..
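If all you actually need is the image at its original size with no mipmap chain, one of the extended FromFile overloads (which mirrors D3DXCreateTextureFromFileEx) lets you specify the size, mip levels, and format yourself.  Treat the sketch below as just that, a sketch: the parameter order is from memory, so check the SDK documentation, and keep in mind the driver may still round the size up to a power of two if it doesn’t support arbitrary texture sizes.

// Sketch only: parameter order follows D3DXCreateTextureFromFileEx;
// verify against the SDK docs for your Managed DirectX version.
Texture myTexture = TextureLoader.FromFile(
    myDevice,
    somePath,
    800, 600,          // requested width and height
    1,                 // mip levels: 1 means no mipmap chain
    Usage.None,
    Format.X8R8G8B8,   // 32-bit color, no alpha
    Pool.Managed,
    Filter.None,       // image filter
    Filter.None,       // mip filter
    0);                // no color key

With a single 800×600 surface at 32 bits per pixel, that’s a little under 2 megs of pixel data instead of the roughly 5.3 above.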

Now that the Whidbey Beta is out, what about Managed DirectX?

With the release of the Whidbey (err, I mean Visual Studio.NET 2005) beta a few weeks ago, my thoughts have drifted towards how Managed DirectX may work in this new release.  For those of you who haven’t seen VS2005, or the new CLR, the changes there are magnificent.  Across the board, improvements have been made, and the thing is looking wonderful.

So my thoughts roll over to MDX, and I think to myself, “Self: wouldn’t it be awesome if we took advantage of some of those features?”  Who wouldn’t want to declare the vertex data they’re about to use like:

VertexBuffer<CustomVertex.PositionColored> vb = null;
IndexBuffer<short> ib = null;

Of course, that just scratches the surface of the possibilities that would lie in a VS2005-specific version of Managed DirectX.  Which raises the question: what do you think?

The Render Loop Re-Revisited…

Ah, the good ol’ render loop.  Everyone’s favorite topic of conversation.  As I’m sure everyone is aware, the Managed DirectX samples that shipped with the DirectX 9 SDK, as well as the Summer 2003 update, used the ‘dreaded’ DoEvents() loop I speak so negatively about at times.  People have also probably realized that my book used the ‘infamous’ Paint/Invalidate method.  I never really made any recommendations in the earlier posts about which way was better, and really, I don’t plan on it now.  So why am I writing this now?!?!

If you read David’s post about the upcoming 2004 Update, you may have noticed that he mentions the DoEvents() loop the samples used to employ is gone.  In reality, with the new sample framework, the samples never use the Windows Forms classes anymore either.  The actual render window and render loop are all run through P/Invoke calls into Win32, and I figured I’d take a quick minute to explain the reasoning behind it.

Obviously, the main reason for using DirectX is game development.  Sure, there are plenty of non-game scenarios that DirectX is great for (data visualization, medical imaging, etc.), but what drives our API is game developers.  If you know any game developers (or are one yourself), you’re probably well aware that while the game is running (and rendering), things need to happen quickly, and predictably.  With all the benefits of managed code, one thing that can be hard to achieve is that ‘predictability’, particularly when you’re dealing with the garbage collector.

So let’s say you decided to use Windows Forms for your rendering window, and you wanted to watch what the mouse was doing, so you hook the MouseMove event.  Aside from the ‘cost’ of the Invoke call into your handler, a managed object (the mouse event arguments) is created.  *Every* time.  Now, the garbage collector is quite efficient, and very speedy, so this alone could easily be handled.  The problem arises when your own ‘short-lived’ objects get promoted to a higher generation because of the extra collections these allocations cause.  Generation 0 collections won’t have any effect on your game; generation 2 collections, on the other hand, will.

Thus the new sample framework doesn’t rely on these constructs at all.  This is probably one of the most efficient rendering loops available in the managed space currently, but the code doesn’t necessarily follow many of the conventions you see in the managed world.  So, when deciding on the method you want to use to drive your rendering, you need to ask yourself what’s more important: performance, or conformance?  In the case of the updated sample framework, we’ve chosen performance.  Your situation may be different.
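To give you an idea of the shape of that kind of loop, here is a rough sketch (this is not the sample framework’s actual code; the struct is a simplified stand-in for the Win32 MSG structure, and stillRunning/RenderFrame are placeholders for your own flag and rendering code):

// Assumes: using System; using System.Runtime.InteropServices; using System.Drawing;
[StructLayout(LayoutKind.Sequential)]
public struct Message
{
    public IntPtr hWnd;
    public uint msg;
    public IntPtr wParam;
    public IntPtr lParam;
    public uint time;
    public Point p;
}

[DllImport("user32.dll", CharSet = CharSet.Auto)]
public static extern bool PeekMessage(out Message msg, IntPtr hWnd,
    uint messageFilterMin, uint messageFilterMax, uint flags);

[DllImport("user32.dll", CharSet = CharSet.Auto)]
public static extern bool TranslateMessage(ref Message msg);

[DllImport("user32.dll", CharSet = CharSet.Auto)]
public static extern IntPtr DispatchMessage(ref Message msg);

// The loop itself: pump any waiting messages, render when the queue is empty.
// Note there are no managed event arguments being allocated here.
while (stillRunning)
{
    Message msg;
    if (PeekMessage(out msg, IntPtr.Zero, 0, 0, 1 /* PM_REMOVE */))
    {
        TranslateMessage(ref msg);
        DispatchMessage(ref msg);
    }
    else
    {
        RenderFrame(); // your rendering code
    }
}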

Direct3D and the FPU..

I had an email this morning about Managed Direct3D ‘breaking’ the math functions in the CLR.  The person who wrote in had discovered that this method:

public void AssertMath()
{
    double dMin = 0.54797677334988781;
    double dMax = 4.61816551621179;
    double dScale = 1 / (dMax - dMin);
    double dNewMax = 1 / dScale + dMin;
    System.Diagnostics.Debug.Assert(dMax == dNewMax);
}

behaved differently depending on whether or not a Direct3D device had been created.  It worked before the device was created, and failed afterwards.  Naturally, he assumed this was a bug, and was concerned.  Since I’ve had to answer questions like this multiple times now, it pretty much needs its own blog entry.

The short of it is that this is caused by the floating point unit (FPU).  When a Direct3D device is created, the runtime changes the FPU to suit its needs: by default it switches to single precision, while the default for the CLR is double precision.  This is done because single precision has better performance than double precision (naturally).

Now, the code above works before the device is created because the CLR is running in double precision.  Then you create a Direct3D device, the FPU is switched to single precision, and there are no longer enough digits of precision to calculate the code above accurately.  Thus the ‘failure’.

Luckily, you can avoid all of this by simply telling Direct3D not to touch the FPU at all.  When creating the device, use the CreateFlags.FpuPreserve flag to keep the CLR’s double precision and have your code function as you expect.
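For example (the variable names here are just placeholders for whatever you already pass to the Device constructor):

// OR FpuPreserve into whatever creation flags you already use,
// so Direct3D leaves the FPU in the CLR's default double precision.
Device device = new Device(0, DeviceType.Hardware, renderWindow,
    CreateFlags.SoftwareVertexProcessing | CreateFlags.FpuPreserve,
    presentParams);

Keep in mind that preserving double precision gives up the small performance win that is the whole reason Direct3D switches the FPU in the first place, which is why it isn’t the default.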

So you say that startup time is slow?

One of the first things people may notice when running managed applications is that the startup time is slower than for a ‘native’ application.  Without delving into details, the major cause of this slowdown is the compilation that needs to happen at runtime (called JIT, just-in-time compilation).  Since managed code has to be compiled into native code before it is executed, this is an expected delay when starting the application.  Since Managed DirectX is built on top of this system, code you write using the Managed DirectX runtime will have this behavior as well.

Since the JIT compiler can do a lot of optimizations that just can’t be done at compile time (taking advantage of the actual machine the code is running on, rather than the machine it was compiled on, etc.), the behavior here is desirable; the side effect (the slowdown) is not.  It would be great if there were a way to remove this cost.  Luckily for us, the .NET Framework includes a utility called NGen (Native Image Generator) which does exactly this.

This utility will natively compile an assembly and put the output into what is called the ‘Native Assembly Cache’, which resides within the ‘Global Assembly Cache’.  When the .NET Framework attempts to load an assembly, it checks to see if a native version of the assembly exists, and if so, loads that instead of doing the JIT compilation at startup, potentially decreasing the startup time of the application dramatically.

The downsides of using this utility are two-fold.  One, there’s no guarantee that startup or execution time will actually be faster (although in most cases it will be; test to find out).  Two, the native image is very ‘fragile’: a number of things can invalidate it (such as installing a new runtime version, or changing security settings).  Once the native image is invalid, it will still exist in the ‘Native Assembly Cache’, but it will never be used.  If you want to regain the benefits, you’ll need to ngen the assemblies once more, and unless you’re watching closely, you may not even notice that the original native images have become invalid.

If you’ve decided you would still like to ngen your Managed DirectX assemblies, here are the steps you would take:

  • Open up a Visual Studio.NET 2003 Command Prompt Window
    • If you do not wish to open that command window, you could simply open up a normal command prompt window and ensure the framework binary folder is in your path.  The framework binary folder should be located at %windir%\Microsoft.NET\Framework\v1.1.4322, where %windir% is your Windows folder.
  • Change the directory to %windir%\Microsoft.NET\Managed DirectX, where %windir% is your Windows folder.
  • Go into the folder for the version of Managed DirectX you wish to ngen from here (the later the version, the more recent the assembly).
  • Run the following command line for each of the assemblies in the folder:
    • ngen microsoft.directx.dll (etc)
    • If your command prompt supports it, you may also use this command line instead:
      • for /R %i in (*.dll) do ngen %i

If you later decide you did not want the assemblies compiled natively, you can use the ngen /delete command to remove the compiled native images.
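For example (the assembly name is just illustrative; repeat for each assembly you ngen’d, and check ngen /? for the exact syntax on your runtime version):

ngen /delete microsoft.directx.dll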

  • Note that not all methods or types will be natively compiled by ngen, and these will still need to be JIT’d.  Any types or methods that fall into this category will be output during the running of the ngen executable.

Managed DirectX – Have you used it?

So I asked before what types of features you would like to see in Managed DirectX (and the feedback was awesome – I’m still interested in this topic)..  What I didn’t ask about back then, though, and what I’m also interested in, is what types of things people are using it for currently.

Are you using it to write some tools?  Game engines?  Playing around on the weekends?  What experiences have you had working with the API?

Wow, the feedback has been awesome..

I love it when people are excited and giving feedback, and the feedback that’s been coming in has been great.  A consistent theme among the feedback has been the docs (isn’t it always?).

Rest assured this is an area we are working on.  You probably noticed an improvement in the Summer 2003 SDK Update that was released last year, and those improvements are continuing today.  The next release will have even more and better docs.  Another common point brought up was the samples that ship with the DirectX SDK, which is also something that is being addressed.

The idea of ‘community’ intrigues me.  There is a small ‘community’ site on GotDotNet, but I’m not sure how much traffic it gets.

I’ve noticed a few bug reports interspersed within the comments as well.  If you have a bug, please email it to directx@microsoft.com so we can make sure it gets addressed in a timely fashion.

If there are any other specific items that are either stopping your adoption of Managed DirectX or enhancing your use of it, I would love to hear about them.  Keep the feedback coming, and thanks!

What would stop you from using Managed DirectX?

This is a question that is interesting in more ways than one.  One of the more common answers I hear is naturally centered around performance, even though many times the person with the ‘fear’ of the performance hasn’t actually tried it to see what kind of performance they could get.  I would love to hear about specific areas where people have found the performance to be lacking, and the goals they’re trying to accomplish when hitting these ‘barriers’.

But above and beyond that, what other reasons would you have for not using Managed DirectX?  Do you think the working set is too high?  Do you not like the API design?  Do you just wish that feature ‘XYZ’ was supported, or supported in a different way?

At the same time, what about those of you who are using Managed DirectX currently?  What do you like, and why?

You can consider this my highly unscientific survey on the current state of the Managed DirectX runtime. =)

The speed of Managed DirectX

It seems that at least once a week I’m answering questions directly regarding the performance of managed code, and Managed DirectX in particular.  One of the more common questions I hear is some paraphrase of “Is it as fast as unmanaged code?”.

Obviously, in a general sense it isn’t.  Regardless of the quality of the Managed DirectX API, the fact remains that it still has to run through the same DirectX API that the unmanaged code does.  There is naturally going to be a slight overhead for this, but does it have a large negative impact on the majority of applications?  Of course not.  No one is suggesting that one of the top-of-the-line polygon-pushing games coming out today (say, Half-Life 2 or Doom 3) should be written in Managed DirectX, but that doesn’t mean there isn’t a whole slew of games that could be.  I’ll get to that more later.

I’m also asked quite a bit things along the lines of “Why is it so slow?”  Sometimes the person hasn’t even run a managed application; they just assume it has to be slow.  Other times, they may have run various ‘scenarios’ comparing against the unmanaged code (including running the SDK samples) and found that in some instances there are large differences.

Like I’ve mentioned earlier in this blog, all of the samples in the SDK use the dreaded ‘DoEvents’ loop, which can artificially slow down the application due to allocations and the subsequent large amounts of collections.  The fact that most of the samples run at frame rates similar to the unmanaged API is a testament to the speed of the API to begin with.

The reality is that many of the developers out there today simply don’t know how to write well-performing managed code.  This isn’t through any shortcoming of the developer, but rather the newness of the API, combined with not enough documentation on performance and on how to get the best out of the CLR.  Luckily, this is changing; for example, see Rico Mariani’s blog (or his old blog).  For the most part, we are all newbies in this area, but things will only get better.

It’s not at all dissimilar to the change from assembler to C++ code for games.  It all comes down to a simple question.  Do the benefits outweigh the negatives?  Are you willing to sacrifice a small bit of performance for the easier development of managed code?  The quicker time to market?  The greater security?  The easier debugging?

Like I said earlier, there are certain games today that aren’t good fits for having the main engine written in managed code, but there are plenty of titles that are.  The top 10 selling PC games a few weeks ago included two versions of The Sims, Zoo Tycoon (plus expansion), Age of Mythology, Backyard Basketball 2004, and Uru: Ages Beyond Myst, any of which could have been written in managed code.

Anyone who’s taken the time to write some code in one of the managed languages normally realizes the benefits pretty quickly.

The Render Loop Revisited

Wow.  I wouldn’t have thought that my blog post on the render loop and DoEvents would spark as much discussion as it did.  Invariably, everyone wanted to know what I thought the ‘best’ way to do this was.

Actually, the answer is (naturally) ‘It depends’.  It wasn’t an oversight on my part to leave out a recommendation at the end of the post; it was done intentionally.  I had hoped to spark people’s interest in learning the cost of the methods they were calling, and to point out a common scenario where a method had side effects that many people weren’t aware of.

However, since I’ve been asked quite a few times on alternatives, I feel obligated to provide some. =)

Here are some alternatives, in no particular order.

  • Set your form to have all drawing occur in OnPaint (the WM_PAINT handler), and do your rendering there. Before the end of the OnPaint method, make sure you call this.Invalidate(); this will cause the OnPaint method to be fired again immediately. (There’s a minimal sketch of this after the list.)
  • P/Invoke into the Win32 API and call PeekMessage/TranslateMessage/DispatchMessage. (DoEvents actually does something similar, but you can do this without the extra allocations.)
  • Write your own forms class that is a small wrapper around CreateWindowEx, and give yourself complete control over the message loop.
  • Decide that the DoEvents method works fine for you and stick with it.

Each of these obviously has benefits and disadvantages relative to the others. Pick the one that best suits your needs.
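For completeness, here’s a minimal sketch of the first option, the Paint/Invalidate approach (RenderForm and RenderFrame are placeholder names standing in for your own form and rendering code):

public class RenderForm : System.Windows.Forms.Form
{
    protected override void OnPaint(System.Windows.Forms.PaintEventArgs e)
    {
        RenderFrame();      // your rendering code
        this.Invalidate();  // queue another paint message immediately
    }

    private void RenderFrame()
    {
        // device.Clear / BeginScene / draw calls / EndScene / Present go here
    }
}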