Agilology: 2009

Friday, October 9, 2009

Don't comment it out - Delete that code!

You have a big, ugly, huge block of code that you want to improve. The first thing you do? Comment it all out. Then, try to re-implement in a better way. Finally, you just leave the commented out code in the file in case you need to reference it in the future. I've been guilty of this a lot in the past but I've recently come to the conclusion that instead of commenting it out, you should delete it immediately. Here's why:

It's bad code

You've already identified this code as bad. You don't want it to exist. If you keep it "commented out" so that you can reference it, you're referencing bad code. Why would you want to do that? If the code were good, you wouldn't have commented it out, right? Since it's bad, why would you possibly want to use it. If you aren't going to use it, why keep it around?

Copy and pasting of commented out code

I also notice that I sometimes will copy and paste parts of that commented out code in my new implementation. My realization is that if I'm actually using the code that I commented out, I should either:

Stop using it. I commented it out for a reason.
Don't comment it out. If it actually is good code, even if it's a small part, why did I comment it out? I should have left it in.

Leaving it in for posterity

A lot of code that gets left in "so you can see how it used to work." If you want to see how it used to work, you have version control, right? Just look at the history of the file and diff the current file. Problem solved.

Big block comments detailing how something works or what it does

These are also bad. If you find yourself writing one, it means that this part of your code is probably doing too much; split it up into understandable chunks. If you need an explanation of what the code is accomplishing, your unit tests are not descriptive enough. The only exception is if the code implements something intrinsicly complicated such as an an RFC, where even someone with domain knowledge may have difficulty understanding it (example: in RFC 2822, WTF is an addr-spec and what characters are allowed in a domain literal? I don't remember and I implemented it). In this case, two things are useful:

Reference the RFC (or other standards documentation) by section and subsection. If I'm implementing an addr-spec reader, I can write "This implements the addr-spec in RFC 2822, section 3.4.1 (page 16)" and now someone reading the code can go read the RFC if they need to know more about what the code is doing.
Write a functional spec. I know, in Agile we favor working software over comprehensive documentation. That's fine. Just because we favor working software doesn't mean we never document. A functional spec of complex logic is hardly "comprehensive documentation;" it's a useful vehicle for clarifying how something works or how different components interact. If that the most effective solution, there's nothing non-Agile about doing it.

In conclusion

If you find yourself commenting out code, ask yourself if you really need to keep any of it and if so, then don't comment that part out. Delete everything else immediately. I promise you that you won't miss them.

Tuesday, May 5, 2009

I know it's been a while

It's been a long time since my last post but I have a few things that I'm working on so I will have something again soon. I just finished my first semester of teaching networking at Digipen and it was a lot of fun! I will be returning there in the fall to teach the same class again (CS260) as well as an advanced networking class (CS 261). I'll have more about this experience in another post.

I've also got a few open source projects that I'm working on. I haven't had any spare time to work on any of them until class ended, but now I can devote at least one of them a bit of time so hopefully I'll have something to post about here within the next month or so.

Finally, for those of you who have been asking, I am indeed still at the 'soft and did not get hit by the most recent round of layoffs, although unfortunately the recruiter that I worked with to get the job did get let go and I wish him the best of luck as he was awesome. Lots of stuff going on at the 'soft now with Win7 at RC (I'm using the RC build, it's great and I predict that Win7 is going to rule) as well as the .Net 4.0 Beta coming up in the not too distant future (don't know the exact date yet) as well as some Silverlight stuff that I'm only minimally involved with as of this point. I've been away from the Alt.net lists lately also so I hope to be a bit more active in the community in the not too distant future as well.

Stay tuned!

Monday, March 23, 2009

Don't speed up your code, make it do less instead

I just read an extremely insightful blog post here. It's about speeding up your code, and the author makes an interesting point that he learned long ago from a mentor of his:

So there is no way, really, to make code go faster, because there is no way to
make instructions execute faster. There is only such a thing as making the
machine do less.

This is fscking brilliant advice. I never thought about code this way, but upon reflection I think he's absolutely correct. I can think of 1000 ways to accomplish this in both managed and unmanaged code, but ultimately it comes from knowing how your code is being built and what instructions are being created from it. As I think about some of the general principles of OO design (i.e. SOLID and such) I really am starting to see how efficient, extensible code improves this. Maybe the right thing to do in managed or unmanaged code when you start analyzing your code for performance is to actually look at the assembly or MSIL that is being generated instead of staring at it and trying to think of ways to optimize. Read the article and learn.

Monday, March 2, 2009

Alt.Net Seattle reflections

I always have a hard time with my alt.net reflection blog post. Many great things happened, many great conversations were had, and a lot of great ideas were shared.

Organization

First off, this was my first time as an organizer for anything, so that was a completely new experience for me. One of my first thoughts when we started planning this thing was that for a conference that is self organizing, we sure were doing a lot of work! Ultimately, it really is quite a challenge to pulling this thing off but I think that everyone did a great job. I would especially like to recognize Glenn Block for his outstanding leadership in getting this whole thing going. Finally, I would like to thank Digipen for graciously hosting the event again and for giving us access to their wireless network, as well as Joel and Wesley who were outstandingly helpful as well in making sure that rooms were set up and that everything with the facilities was running smoothly (sorry about the fire alarm).

Everyone still has something to contribute

I said this last year and it definitely proved to be true this year again. There was a lot of diversity in the background, experience, and knowledge of the attendees and when all of them get together to discuss something, I always find that everyone has something valuable to contribute. It is important to remember that we are all leaders

Lots of n00bs!

It was also great to see a lot of new people to Alt .Net this year. There were lots of new faces and I think it's important that we continually encourage people to join the community. If we don't, I fear that we would just become an echo chamber which would not be good. I am glad that we were able to have so many new people attend this year and that we opened registration up a bit more to give as many people as possible a chance to attend. We also had a great conversation at the Alt .Net pedagogy discussion and I hope that we're able to get going on some of the ideas that we talked about, although I think that's going to have to wait for a future blog post for me to really do it justice.

And good to see you all again

It was great to meet all of you whom I met for the first time and great to see everyone that I met last year again. We had a lot of good hallway conversations everywhere and it was always hard for me to go back home for the night as I felt like there was always something else to be said, someone else to talk to, or something new to learn (I had a great conversation at 13 coins at 3:00am on Sunday). I'm looking forward to the next conference or to wherever I see you. Feel free to tweet me (http://twitter.com/jeff_tucker), email me, or otherwise contact me any time about anything interesting (most of you have my phone number now and don't realize it as it was posted on the door of Digipen for two days).

Videos and stuff

There's a lot of video out there from the conference. Several people were recording and streaming. I think Scott Hanselman has the most complete list of the videos that I've yet seen here. You can also check out the wiki for the event here.

Finally, I'd like to join the staff of the Redmond Town Center Red Robin in wishing Justin Angel a very happy birthday!

Monday, January 19, 2009

Why dispose is necessary and other insights on managed code

Jeff Atwood has once again jumped the shark, this time about managed code. Then there's this little gem from twitter:

I believe .Dispose() is a form of optimization, which is necessary *sometimes*
but not *always*. Anyone got links with evidence otherwise?

So for my next trick, I am going to show Jeff (and everyone else) how managed code works and when it fails.

First, I always have a saying when introducing managed code:

Managed code isn't.

The reason that I say this is that most people think that because your code is managed, you don't have to think about memory management at all; you can just allocate whatever you feel like and the garbage collector will just take care of it for you. According to Jeff, this is a good thing. It is indeed a good thing, although unfortunately it doesn't work that way, and if you don't know precisely what the CLR is doing to your code and how garbage collection works, you're going to have problems and when they occur you will have no idea where to look for solutions. The goal of this post is to try to educate you as to what is going on. I will not be able to go into a lot of depth in one post, however this should get you started and give you some resources to do further exploration when you run into these problems in your code. Let's start with looking at dispose().

For those of you who are unsure, dispose() is an implementation of the disposal pattern. The idea behind this is that you have a way to mark a class as having an explicit cleanup that needs to occur prior to that object being deallocated. .Net accomplishes this by an object implementing the IDisposable interface with the Dispose() method, which you should call when you are finished using an object so that it can free its resources. This method should invoke the same code as when Finalize() is called (more on finalization below), so if someone forgets to call Dispose(), it is still guaranteed to be cleaned up prior to garbage collection occurring. Therefore, it is a good place for cleaning up unmanaged resources, like database connections or open streams (e.g. files and network connections). That way you are 100% guaranteed that this cleanup will occur, even if the programmer simply stops using a particular object and lets it fall out of scope.

Using(disposal pattern)

This brings us to our second useful piece of code, which is the using() {} block. This block allows you to put something that implements IDisposable (henceforth "disposable") inside the using() keyword, and then upon leaving the basic block defined by the using statement (i.e. going outside of the {} by any means), the dispose() method WILL be run prior to the jump that occurs. This means that inside my using block I can return, break, exit, throw, or otherwise jump to another block of code and before that code exists, dispose is called on the object in the using() statement. These can be nested, in which case objects are disposed in FILO order (first in last out). These also work with finally blocks so that if I have a try..finally with a nested using block, the dispose is called prior to the finally, even in cases where code inside the using block throws an exception dispose is still called first. You can test this quite easily by writing a simple app with nested using blocks and then throwing from inside one of them and catching outside of it.

So why does using() exist?

When I interviewed for Microsoft, one of the questions I got asked was "Is the using() statement syntactical sugar?" My answer was that yes, it is, but it is important. Here's why: I define syntactical sugar as any language keywords that perform a function that you could still perform without using that specific keyword(s) but the implementation would be uglier, more difficult, or less likely for a programmer to use properly. For example, I could still close database connections with a try . . .finally block and explicitly call dispose(). In fact, if you look at the MSIL code generated for a using() block you'll see that it generates a try . . .finally block. I could even not have IDisposable and just have a cleanup method somewhere else or even explicitly define it. The nifty thing about using() is that I can just wrap an object around it and be guaranteed that the object will be disposed of properly in a deterministic fashion (with finally, it's impossible to know when dispose will be called because you don't know if an exception will be thrown until runtime and exception handling runs before finally does).

In addition, what if I forget to call dispose in the finally block? What if it is far away from where I use the disposable object so I don't see it right away? What if I need to ensure the order that dispose is called in for multiple objects and I mess that up? What if exception handling is significant to how I handle disposal? There are a lot of what-if's here and the using() statement basically give me a convenient shortcut while making my code a bit more clean.

So how does this tie into garbage collection now and why call dispose() at all?
The garbage collector mostly runs whenever it feels like it and what it does is entirely up to the garbage collector. Let's look at Jeff's SqlConnection cleanup example and see what's going on here:

sqlConnection.Close();
sqlConnection.Dispose();
sqlConnection = null;

So first, we call Close(). This has nothing to do with garbage collection, presumably it just closes the connection itself. Then we call Dispose(). Presumably, Dispose() would clean up an open connection, probably by calling Close(), but it's entirely possible that it does other things as well. According to MSDN, Close() and Dispose() are redundant, so it looks like Jeff is violating DRY here, although I think that having Close() and Dispose() do the same thing is ugly and misleading code. Finally, he's setting the object = null. Remember kids, this isn't actually setting the value of the object, he's just setting his current reference to it to null, which tells the garbage collector that he's not using it here anymore. If there are no other references still in scope in the rest of the application, it means that the garbage collector is free to clean it up (in unmanaged code, it would mean you have a memory leak).

Let's look at a non-real, contrived example; if I grab a connection to a db and I'm using winsock to connect, I'll get a handle to the socket that I'm using from winsock, which will be wrapped in some sort of SafeHandle object to ensure that it gets cleaned up. If Close() only closes the socket, it may not clean up the handles. If Dispose() calls Close() and then cleans up the handles, then it's entirely possible that the Close() and Dispose() calls are redundant, but that they do different things. Maybe if you call Close() you can re-use the connection object. That might be useful, particularly if that object takes a long time to create or is resource-heavy even when not connected (like if it allocates buffers for example). Ultimately, what you do depends on how you implement your object, so be sure that you have a good design and don't encourage bad code with redundant calls.

One other problem can exist: when an object is Disposing, what if somewhere you have another reference to that object and that reference is in use that you don't know about? If you let the garbage collector just run Dispose for you then obviously this won't be a problem, but if an eager developer calls it, you definitely have a problem. I think that if you have this problem, probably your code is not well designed, but there's a cheap way to manage it if you think it's necessary: create a private "bool isdisposed" property and when you call dispose, set this property immediately at the start of the Dispose() method (and make this thread-safe through correct use of some sort of locking strategy). Then, on every other method call, you can check the value of this property and then react accordingly, although again I would label this as code smell.

And then there's the object = null
Jeff mentions the confusing choice of calling dispose and setting null but in reality, if you know what these actually do, there's no confusion at all. If you call Dispose(), as I have mentioned at least six times now, it will run the cleanup of the object when you call it. If you set the object to null, then all you're doing is removing a reference to it and the garbage collector will then see that there are no more references to this object so it's ok to clean it up. If it wants to. When it does do cleanup, it will call Dispose for you, however it is completely non-deterministic as to when this occurs, so leaving it to the garbage collector for things like database connections or files or network connections is a bad idea since it would be easy for you to have a bunch of open connections this way, which can cause all kinds of bad problems.

So now, how is disposal "more of an optimization than anything else?"

It's not. Jeff is wrong because Jeff is right (different Jeff though. . .) and Jeff (me) says that calling Dispose is absolutely necessary so that you can have deterministic behavior in your application in terms of resource deallocation. There are lots of resources that you're using in the .Net framework, even if you are not aware of the fact (things like threads, files, even graphics tend to open handles into unmanaged code). If you are not careful about when you dispose things, it's easy to start leaking these, particularly in high-performance applications. It really isn't an "optimization" to call Dispose when you want to explicitly free up resources, particularly unmanaged ones.

Also, if you have some significant amount of work that has to be done to free up those resources, then you really should do that work at a time of your choosing. If you don't, then potentially any time you create a new reference type, the garbage collector may run and may collect something that you aren't using anymore, at which point all of that work that needs to be done to free your resources will just happen, which may affect the performance of your app in various ways, none of which will make it go faster. This isn't optimization though, it's simply good coding style.

So what should I do in my code?

First, if you allocate any resources in an object, particularly unmanaged ones, you should clean them up in your Dispose() method (while ensuring that your object implements IDisposable). I find it unusual that your objects will need to allocate external resources, but it's likely that you will consume some without realizing it, which means that you should either dispose of them explicitly inside the object when you're through with them, or that you should call their Dispose() methods from inside your object's own Dispose() method.

Next, if you have another method like "Close()" or "Shutdown()" or "DieYouGravySuckingPigDog()" then you should ensure that these methods don't do any disposal for you. These types of methods aren't meant to clean up an object so that it can't be used anymore (that's what Dispose() is for). If you have an object, then at any point in the object's lifecycle it should be usable in some way prior to calling Dispose(), and not-usable after calling Dispose(). Make sure that a disposed object won't try to do anything bad if you accidently call it after you've disposed it also, generally by writing good code where you don't attempt do this, but some checking in your object won't hurt. If you want to reuse objects a lot, consider using an object pool (a buffer pool for handling Socket reads would be a good idea as an example).

Finally, remember that just because you set an object to null doesn't mean that it's going to be disposed at any point in the near future, it is just being marked as no longer in use (unless you have another reference to it somewhere else). This is not a strategy for garbage collection

So what about Finalize()?
It turns out there is another method called Finalize() that is used by the garbage collector for cleaning up unmanaged resources. It's purpose in life, according to MSDN, is to clean up unmanaged resources in the event that Dispose was never called, but if you implement Dispose you should NOT have a Finalize and you should use GC.SurpressFinalize(this) to stop the finalize method from being called. This is to ensure that you do not do duplicate work that is not needed (generally the code called in Dispose and Finalize should be the same). There is also an object destructor, but for managed objects you should not use it. Again, this is all according to MSDN, and I would trust MSDN on garbage collection more than a person who used to code and stopped doing that to write about it.

Wait, wait, I didn't get that. What should I do? Can you sum this all up?

If you are using something that is IDisposable, you should call Dispose() when you are finished with it and when it is a good time in your app to do so performance-wise. Just letting the garbage collector do it for you is not a strategy for anything except crappy coding (if crappy code is your goal, then never call dispose and you'll be about 80% there)
If you are implementing an object that uses unmanaged resources, you should make that object IDisposable and free those unmanaged resources and then call GC.SurpressFinalize(this) in your Dispose() method. Ensure that calling Dispose() multiple times does not throw an exception or duplicate work (setting a private property is a useful idea here)
Setting an object to null is not a strategy for garbage collection. It is a strategy for crap.
If you have an object that is re-usable after calling some sort of closing method (like a connection, file, etc.) then consider pooling that object, particularly if that object is expensive to create or destroy. Remember, every time you allocate a reference type you may trigger garbage collection and you'll never know it.
"Managed code" isn't managed all that well; you need to be absolutely aware of what the CLR is doing to your code and how garbage collection works. It's like walking on a tightrope- managed code gives you a safety net but that doesn't mean you should just jump into it whenever and then climb back up and keep going. That is not a strategy.
Every time codinghorror posts something that jumps the shark, check out http://agilology.blogspot.com/ for clarification and amusement.

Agilology