Thursday, September 11, 2008

Let's just blame Microsoft!

This is a good one. Some guy named Steven J. Vaughan-Nichols is blaming the Sept. 10th London Stock Exchange crash on .Net. Wow, informative! It crashed. It runs on .Net, so that must be the reason. .Net isn't suited for real-time systems! Right? Not so fast, dude.

Full disclosure

Before I start, let me just say that I do work for Microsoft and I work on the .Net framework. Does this make me biased? Probably, but I'm going to attempt to focus on other things besides "Microsoft good, .Net good" here and draw a logical conclusion.

What happens

So, what's the scenario? Well, apparently (according to Steven) the LSE runs some software called TradElect, which is a C# application. It also runs on Windows 2003 with Sql Server 2000. Clearly, the weak point is .Net here, nothing else it could possibly be. Right?

You are full of fail


So Steven probably wrote all those "conclusions" down on a mat, which he then placed on the floor, so that he can "jump" to them. He clearly has. Something broke, so it's Microsoft's fault, because .Net just sucks for real-time applications. So does Sql Server 2000 and Windows Server 2003. There's nothing else that could have gone wrong, right?


There's no way it could be human error. No way at all

What he doesn't say is that this could well be programmer error. There are thousands of ways a programmer could mess this up and just write crappy code. For network connections, the asynchronous programming model is not trivial, and you need a reasonably deep understanding of it before it really works well for you. I see a lot of people mess this up, and unfortunately it's usually their fault and their problem. The performance you get from asynchronous programming comes at the price of complexity and multiple threads, which is something a lot of people just don't understand.

Additionally, we don't know how they're doing their DB access here. Maybe they have some sort of transaction hell that's locking the shit out of their DB. Maybe they don't use stored procs (BIG performance issue in Sql2k, fixed in Sql2k5 so not a big deal there). Maybe they don't know how to create an index. My point is that we don't know, so we can't say for sure. Still, the database is a likely suspect.

Finally, the .Net framework itself has some interesting quirks if you don't really understand the CLR well. I don't usually recommend books on specific software technologies, but go out and get a copy of CLR via C# by Jeffrey Richter; I learned more about the CLR from that book in a month than I did in two years of using .Net every day. Granted, garbage collection takes away a lot of the complexity of memory management, but as a developer you STILL need to understand what the CLR is doing, because it can be a big performance issue. Things like boxing and unboxing take time, misusing value types and reference types eats performance, and even how you allocate objects can affect performance. For example, if you're using buffers for network traffic and you allocate a new buffer each time, you may trigger garbage collections that hurt performance at seemingly random moments and are difficult to track down. If instead you allocate one massive pool of buffers up front and just reuse those, the pool is big enough (85,000 bytes or more) to live on the large object heap, where it doesn't get moved around, and since you're no longer allocating per request your buffers stop triggering garbage collection, so your app will be more consistent.
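To make the buffer pool idea concrete, here's a minimal sketch. BufferPool is a made-up helper class, not something that ships in the framework, and the sizes in the usage note are just assumptions for illustration. It allocates one big byte array up front and hands out fixed-size slices of it, so nothing new is allocated on the heap per receive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the .Net framework.
// One large array is allocated once, then handed out in fixed-size slices.
public sealed class BufferPool
{
    private readonly byte[] _slab;             // the single up-front allocation
    private readonly Stack<int> _freeOffsets;  // start offsets of unused slices
    private readonly int _bufferSize;
    private readonly object _gate = new object();

    public BufferPool(int bufferCount, int bufferSize)
    {
        if (bufferCount <= 0) throw new ArgumentOutOfRangeException("bufferCount");
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize");

        _bufferSize = bufferSize;
        _slab = new byte[checked(bufferCount * bufferSize)];
        _freeOffsets = new Stack<int>(bufferCount);

        // Push in reverse so the first Rent() returns offset 0.
        for (int i = bufferCount - 1; i >= 0; i--)
        {
            _freeOffsets.Push(i * bufferSize);
        }
    }

    // Number of slices currently available to rent.
    public int Available
    {
        get { lock (_gate) { return _freeOffsets.Count; } }
    }

    // ArraySegment<byte> is a struct, so renting allocates nothing on the heap.
    public ArraySegment<byte> Rent()
    {
        lock (_gate)
        {
            if (_freeOffsets.Count == 0)
            {
                throw new InvalidOperationException("Buffer pool exhausted.");
            }
            return new ArraySegment<byte>(_slab, _freeOffsets.Pop(), _bufferSize);
        }
    }

    // Note: this sketch does not detect the same slice being returned twice.
    public void Return(ArraySegment<byte> buffer)
    {
        if (buffer.Array != _slab || buffer.Count != _bufferSize)
        {
            throw new ArgumentException("Buffer does not belong to this pool.", "buffer");
        }
        lock (_gate)
        {
            _freeOffsets.Push(buffer.Offset);
        }
    }
}
```

You'd create something like new BufferPool(1000, 8192) once at startup, Rent() a slice before each receive, pass its Array, Offset and Count to the socket call, and Return() it when you're done with the data. Throwing when the pool runs dry is just one choice; blocking or growing the pool are others, and which one is right depends on your app.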

Blame Canada . . .um. . . er. . . .Net?

So do we blame .Net? With this much information, we really can't. It's far more likely that Sql 2000 is to blame (if anything), although I've seen shit databases created in open source just as often as MS Sql so it's entirely possible that it was just designed stupidly. It's also equally likely that the people who wrote this just screwed up, either in writing the code or improperly testing it. Again, these things would happen if the same programmers used open source software.

Wow, what a useful solution!!!

What does Steven suggest? Use Linux. Wow, that will fix everything! I'll just go install it right now, with KDE and everything!!! Wait, no.

Next, he suggests Oracle. I've used Oracle and in some ways I love it way more than MS Sql Server but in other ways I hate it a lot. Oracle is better than Sql2k but I have yet to see proof that it's better than Sql2k5, however I won't pass judgement on that yet. Maybe Oracle would be a better db choice. Not that Oracle's open source or anything. It also works with .Net. I've used it.

Next, he recommends Java. Java, with the worst threading model in the history of the world (more on that later), is his recommendation for a fix! I have yet to see a case where a Java application works significantly better than a .Net application doing the same thing. A lot of the tools are similar. The languages are similar.

In conclusion, Steven is jumping to conclusions that Open Source software (+Oracle) is better for performance. He has no evidence other than "it was running .net and it crashed" to base this on. He is therefore wrong. I have an idea. So you take this mat, and you write various "conclusions" on it, and put it on the floor, so you can "jump" to them. I'll send him one!

And I KNOW it wasn't a .Net networking issue because

I am on the NCL team at Microsoft. We own the System.Net namespace, which is what handles networking in the .Net framework. It was my turn to handle the issues that came in that week. If it had been a .Net issue with networking, I would have heard about it. I heard nothing.
