Sunday, July 17, 2011

Putting the "science" back in computer science

Having transitioned my career into academia and game programming, I'm finding that my original intent with blogging no longer fits my interests. I still love Agile and finding better ways to solve problems and write code, but I'm far more interested in algorithms and data structures in general, as well as problems in game programming. I'm taking on topics in networking, graphics, AI, and even *gasp* math.

As a result, I'm going to start blogging more about these things starting now (and also make an effort to blog more regularly). As always, I hate blogging about specific technologies and how to use them, preferring a more generalist approach so that my posts are useful to anyone writing code to solve problems, not just users of a particular technology or platform. In fact, I honestly don't care about platforms, libraries, and frameworks anymore, other than looking at better ways to write such things.

We've forgotten the "science" part. . .

I feel like a lot of developers these days have strayed from fundamentals like algorithms and data structures. When was the last time you implemented a hash table? A linked list? Quicksort? Never, because we have all kinds of nifty stuff that handles this for us. That said, it's still important to really know how these things work, and I think it's good practice to implement them yourself, if for no other reason than self-improvement.
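Since the point is practice, here's what one of those exercises can look like: a minimal hash table with separate chaining, sketched in Python for illustration (obviously not a replacement for the built-in dictionary):

```python
# A minimal hash table using separate chaining: each bucket is a list of
# (key, value) pairs. A real implementation would also resize as it fills.
class HashTable:
    def __init__(self, bucket_count=16):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:          # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

Twenty lines, and writing them forces you to think about hashing, collisions, and lookup cost, which is exactly the point.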

Justin Bozonier had an interesting blog post recently about code katas. A kata is supposed to be narrow and focused, and some of the TDD katas are overly complex; they also involve problem solving, which makes it harder to tell whether you've done things "right." I feel like algorithms at their core are katas: there is one algorithm that is quicksort, but it has many applications and many flavors. The core of every flavor is still quicksort, so it's good practice to be able to code it; it's easy to tell if you got it right, and a deeper understanding will help you see different ways to apply it to your particular problem. This isn't to say you should be rigid and declare "we're using quicksort to solve this problem, so you implement it this way, because that's what you do with quicksort." That's the same problem I have with people "doing" Agile or "implementing" design patterns. Design patterns, Agile, and algorithms are guidelines for solving particular problems, not rules, and that's exactly why it's important to know them. They are the fundamentals of what we do as software engineers and computer scientists.
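To make the kata idea concrete, here is the core of quicksort as a short Python sketch; it trades the usual in-place partitioning for clarity, which is fine for a kata:

```python
# A minimal quicksort sketch. This builds new lists for clarity; a production
# version would partition in place to avoid the extra allocations.
def quicksort(items):
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

The "flavor" choices — pivot selection, in-place vs. copying, how duplicates are handled — are exactly the knobs a deeper understanding lets you turn for your particular problem.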

Stand back, I'm going to try science!

Going forward, I'm going to focus each blog post on a particular algorithm, data structure, or even mathematical concept. I'll explain the problem it solves, how the algorithm works, and show an implementation in some programming language (I'll try to vary this, but expect a lot of C/C++/C# for a while, with maybe some Python thrown in here and there).

My hope is that by reading these, you'll remember your computer science classes and be able to arm yourself with all sorts of nifty ways to solve problems that you can stash in your toolbox just in case they ever come up. After all, knowledge is power.

Monday, September 27, 2010

Enabling SQL Server on 800 computers at once

Now that I'm teaching a class in databases, it's useful to actually have a database that my students can use for their projects. Unfortunately, SQL Server Express (and other editions of SQL Server) does not have TCP/IP turned on by default, and even if it did, the default is to use dynamic ports, which of course is no good. The reason for this security through obscurity was the SQL Slammer worm that hit several years ago, so for "security" (read: so Mort can be stupid without dire consequences) it disables all connectivity with the outside world. It's fairly simple to change the settings manually, but I can't do that on 800 computers. In addition, students (who have no admin access) need to be able to use the SQL Server instance to create new databases, so that has to be set up as well. In this post, I'll show exactly how I did this, in case anyone else ever needs to do something similar on their own network.

The humble .bat file

For this exercise, I chose a .bat file as my delivery vehicle. Batch files are easy to write and they run on every computer regardless of what's installed, which is convenient since I can't assume more powerful tools are available on every machine. Besides, I don't need anything more powerful to run a few commands.

The requirements are pretty simple: IT is pretty slammed right now so it needs to be a script that can be run completely unattended, it can't affect anyone working on the remote computer, it shouldn't break anything if it's run multiple times, and if it fails, the fix should be to run it again.

First, enable SQL Server


The first step is to enable SQL Server to listen on a specific port. There's no command-line option to just enable TCP/IP, and obviously clicking through the configuration GUI on every machine is out of the question. However, the Surface Area Configuration tool is an option, and it turns out that tool can clone settings from another instance of SQL Server. All I have to do is enable TCP/IP on my instance, then run SAC.exe to create a file containing my settings. The SAC.exe tool lives in the C:\Program Files (x86)\Microsoft SQL Server\90\Shared directory. The command to generate the settings file is "sac.exe out <filename>"; there are other options, but the default (export everything) is fine in this case.

To run this on an arbitrary computer, however, hard-coding "C:\Program Files (x86)" is no good, since that directory could be anywhere. Fortunately, Windows gives us an environment variable, %programfiles%, that is set to whatever directory programs are installed in by default. Finally, the settings file could be anywhere (since IT is running this), so we pass its path in as a parameter to the .bat file. The first line of the file is therefore this:

"%programfiles%\Microsoft SQL Server\90\Shared\sac.exe" in %1


Setting the default port

The default port SQL Server is supposed to listen on is 1433; if it's not on this port, you have to pass the port as a parameter to sqlcmd (or whatever client you use), and I don't want to do that. Also, by default the port is dynamic (set to 0), so it could change every time the service restarts, which is no good. The only way to change the port outside of the configuration tools is by editing the registry, and in a .bat file that's easy: the command is "REG ADD" followed by a bunch of parameters. The registry key where the port lives for this SQL Server instance is here:
HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp\IPAll


We need to create a value here called "TcpPort" and set it to 1433. The command to do this is:

REG ADD "HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp\IPAll" /v TcpPort /t REG_SZ /d 1433 /f


The /f is necessary so that it doesn't prompt if the value already exists (it might; remember, this has to run multiple times without error). However, the SQL Server service needs to be stopped and restarted for the change to take effect, which I can do with "net stop" and "net start". The next three lines of the .bat file are therefore:

net stop "SQL Server (SQLEXPRESS)"
REG ADD "HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp\IPAll" /v TcpPort /t REG_SZ /d 1433 /f
net start "SQL Server (SQLEXPRESS)"



Giving students permissions

Now that SQL Server is configured in a useful way, I need to grant some roles to the students. These go in a .sql script, so all I have to do is run it (we'll examine the script in a minute). The way to run scripts is with sqlcmd, which should be on the system %PATH%, so I don't need the full path; and now that the server is on the default port, a plain "sqlcmd" should connect. I'll be passing in the location of a .sql file anyway, so let's make that a command-line parameter as well. The final line of the .bat file is therefore this:

sqlcmd -E -S localhost -i %2


This tells sqlcmd to use a trusted connection and connect to the instance on localhost and read and execute the contents of the supplied file. Therefore, the full .bat file looks like this:

"%programfiles%\Microsoft SQL Server\90\Shared\sac.exe" in %1

net stop "SQL Server (SQLEXPRESS)"
REG ADD "HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp\IPAll" /v TcpPort /t REG_SZ /d 1433 /f
net start "SQL Server (SQLEXPRESS)"

sqlcmd -E -S localhost -i %2


So what's in the sql script?

In the .sql file, I need to set up some roles. Since all students are domain users, I felt it would be simplest to just add that group to the appropriate roles. You probably don't want to do this on your network; it would be better to create a "DBUSERS" or "DEVELOPERS" group. It's not horribly important here, though. Let's look at the script.

First, we need to grant database access to the group. That's pretty easy: the sp_grantdbaccess stored procedure takes two arguments, the Windows group and the name of a database user to create for it. We'll need that user name to assign the correct role. The command is therefore this:

sp_grantdbaccess "digipen.edu\domain users", "domainusers"


Next, I want the students to have db_owner permissions so that they can do things without being an admin (though honestly, in an academic setting, I don't care much what level of access the students have). I grant db_owner to the user I just created with sp_addrolemember, so the next line is this:

sp_addrolemember "db_owner", "domainusers"


Finally, they need to be able to create databases. The role for this is "dbcreator", but it's a server-level role rather than a database role, so it takes the login (the domain group) and has to be added via sp_addsrvrolemember. The command is this:

sp_addsrvrolemember "digipen.edu\domain users", "dbcreator"


Don't forget the "go" after each statement so that it executes. The full .sql file is this:

sp_grantdbaccess "digipen.edu\domain users", "domainusers"
go
sp_addrolemember "db_owner", "domainusers"
go
sp_addsrvrolemember "digipen.edu\domain users", "dbcreator"
go

Now, everything is good and I can run this automatically on every computer with this command:

dbsetup.bat settings.txt addroles.sql


I've run this on hundreds of computers and it works great. Obviously you can put whatever settings you want in this thing and grant whatever roles you need for your environment. Hopefully this will help someone with automating tasks; I sure could have used a post like this a week ago.

Tuesday, June 8, 2010

How to fix Office plugin dependencies

I've been doing some development lately on an Outlook plugin, and I've run into a ridiculous number of issues with references, particularly since I've been using three different versions of Outlook and continuously uninstalling and reinstalling them. As a result, I've had to figure out a lot about how these interop assemblies and COM objects work. In this post, I'll share what I've learned over the last week or so, in hopes that it helps others searching for answers.

How to make a plugin

Developing a plugin isn't horribly difficult. The dependencies you need are as follows:

Office Interop- you need the interop assemblies for the plugin to function. These are also referred to as the Primary Interop Assemblies (PIAs). Often these are installed with Office, but you can download them manually if you need to. They are referenced as a COM object; more on that in a minute.

Visual Studio tools for Office- you need these also. You can get one version of them here: http://www.microsoft.com/downloads/details.aspx?familyid=5e86cab3-6fd6-4955-b979-e1676db6b3cb&displaylang=en The dll's here will live in the GAC.

For both of these, you'll need to reference the appropriate component for the Office program you're targeting. For example, I'm working with Outlook, so I need the Outlook PIAs referenced. The installer you download should include PIAs for all the Office products, so you shouldn't have to worry much about that.

Office.Tools.Common- these are the common tools for Office that you need for any add-in you build (at least as far as I can tell). They're also installed with VSTO, so you should have them.

Which version of these you need depends on what version of Office you're building for, although in theory they're backwards compatible: if you build a plugin for Office 2003, it should (in theory) install and run on later versions of Office. I've had some success with this so far; the only issues I've had have been with building.


I can't find the reference- the dreaded yellow exclamation point!

I've noticed that I tend to get a lot of build errors about these dependencies not being found. In 90% of cases, this comes from having installed multiple versions of Office or the PIAs, so in general all you have to do is fix the references.

Another error you may see is something like "HRESULT: 0x80029C4A (TYPE_E_CANTLOADLIBRARY)", which means that Visual Studio can't figure out where the .dll you're trying to reference is, most likely because the file doesn't exist.

Unfortunately, fixing this involves messing with the registry, which is never fun. I'm going to walk you through it; it's not that hard if you know what you're looking for.

FIRST: BACK UP YOUR REGISTRY!!!! Don't "pull an Atwood"; be sure you have a good backup.

Fixing the COM stuff:

If you bring up the properties for the "Outlook" (or other product) interop assembly, you'll see that its identity is set to a GUID that looks a lot like this: {00062FFF-0000-0000-C000-000000000046}\9.4\0\primary

The GUID identifies the registry key under HKEY_CLASSES_ROOT (HKCR)\TypeLib. If this is messed up, it's usually because multiple versions of the thing are installed, so:

  1. Open regedit, navigate to HKCR\TypeLib\{00062FFF-0000-0000-C000-000000000046}, and look at the subkeys. You'll most likely see 1.0 and 2.0.
  2. Expand 2.0 and you should see 0, FLAGS, and HELPDIR. Expand 0. You may see both win32 and win64. If you see both, delete the one you don't need (probably win64), because having both means Visual Studio gets confused about which .tlb to load.
  3. The default value under win32 (or win64) points to a .tlb file somewhere. Make sure that file exists. If it doesn't, point the value at the right file OR nuke the key and reinstall the PIAs.
  4. You may see that multiple COM objects are being used. Their GUIDs will be under the same TypeLib key, so check those for multiple versions too, and nuke any keys that point to files that aren't installed.

Next, check the stdole COM object the same way. Its key is HKEY_CLASSES_ROOT\TypeLib\{00020430-0000-0000-C000-000000000046}; make sure the file it points to exists and that there's only one entry.

Fixing the VSTO stuff

Next on the list are the VSTO DLLs, which actually live in the GAC. There are a few things that can happen, but first navigate to HKEY_CLASSES_ROOT\TypeLib\{00062FFF-0000-0000-C000-000000000046} and check for versions. You'll see entries such as 9.4 (Office 2010) or 9.2 (Office 2007) and so on. I've found that previous Office versions tend to stick around, and depending on which versions are listed, which you actually reference, and what's actually installed, you may find that Visual Studio is using a key that points to something that doesn't exist. To fix this, delete all keys associated with versions of Office you don't have installed. I had both 9.2 and 9.4 listed, and when I nuked 9.2 (leaving 9.4, which was actually installed), it started working again. The same win32/win64 problem can occur here as well, so delete what you don't need. Finally, check the default value under win32 and make sure the file it points to exists and is the correct version for what you're building against.

This last one is a bit more annoying, because even if you delete the reference in Visual Studio and re-add it, it may still point to the wrong location as long as the registry points to an invalid file.

Once you've gone through all of these, you should be able to build and run your addin and everything will be happy and good.

One more thing of note: if your add-in can't be loaded, Office will change another registry key to prevent it from being loaded the next time Office starts. That key is located at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\<application>\Addins\<add-in ID> and contains a DWORD called "LoadBehavior" that controls how Office loads the plugin. If the value is 3, Office automatically loads the plugin when that product starts; if it's 2, the plugin is not loaded. To force your plugin to load, change this value appropriately. Office will reset it as soon as it tries and fails to load the plugin, so be aware of this and make sure it's correct, or you'll spend extra time trying to figure out why your plugin isn't working when the cause is just a setting.


I hope that this has been useful to someone; it took me a few days to figure all of this out.


Friday, May 28, 2010

Why I won't be adopting Visual Studio 2010 just yet

I'm probably the only person in the twitterverse who isn't excited about VS2010. I've given a few reasons, but in general I can't adequately express my concerns in 140 characters, so in this post I will explain the reasoning behind my choice, hopefully in a way that makes sense.

.NET 4.0

I want to be perfectly clear; I'm limiting this post to Visual Studio 2010, NOT the .NET 4.0 framework. I actually really like .NET 4.0 and I think that it has some great new stuff and some good fixes for old stuff. I know that WCF has done some cool things also and I'm still excited about where the whole System.Web.MVC thing is going. The problem I have is really limited to the Visual Studio IDE. I'm also not going to mention my raging hatred for TFS here.

Past versions

I've been really impressed with Visual Studio since VS97. In general it's been fairly stable, it has some good features, and the debugger has saved me many times. I know intellisense isn't always as intelligent as I'd like, but overall it's been pretty great and has gotten better with time. I've also always adopted fairly early; I got VS2008 at some sort of Visual Studio release event at Microsoft (I forget what it was exactly). I've had a few complaints over the years (Add Reference took FOREVER in previous versions, though it was vastly improved in VS2008; and don't forget the dozens of unsorted New Item templates, all of which were fancy names for a text file), but often these were with how people were using Visual Studio (see my post on why Visual Studio is not a build tool), not with the product itself. I could leave it running for days and it would usually remain stable.

VS 2010- my history

First of all, I'm not judging this just by the quality of the betas and the RC build. Remember, I worked at Microsoft, so I'd been using VS2010 internally for a long time; I believe I started sometime toward the end of 2008 and used it all through 2009, since I was working on .NET 4.0. My first impression of the 4.0 framework was that the installer was pretty lousy (it didn't fully support Win7 yet, though), but otherwise it was pretty nice. My first impression of Visual Studio, however, was that it was slow and it crashed a lot. No big deal, I thought; it's still very early in the product's lifecycle. I'll struggle with it for a while and see how things go.

So months go by and the product is not any faster. If anything, it crashes more frequently, and this is while I'm trying to do simple things like open a file or type, not anything difficult, although I had problems with the debugger as well. Beta 1 came out, and there was a lot of feedback that it was slow and crashed a lot. That's because it was, and it did. That's OK, still early in the product's lifecycle, right? Surely they must be addressing the performance issues.

Nope, still adding features. Next beta: product still slow and still crashing a lot. In fact, I think it was actually getting less stable toward the end of the year. In November and December it would crash constantly when I did simple, normal things like "find in files" or "right-click" or "type something," and I filed bugs. My personal favorite was a crash whenever a file was called "GlobalSuppressions.cs" (it didn't matter whether VS created the file or I did; I think it was just the name). It ended up being something to do with corrupted line endings. Oops. The beta didn't have that bug in it.

Of course, towards the end, there seemed to be a massive push to make the thing go faster and crash less, which of course resulted in the release date slipping and a public RC. Surely this is a good thing, right?

So why the hate?

Over the course of this product's life, I've watched it get less and less stable, up through the end of last year. Features were added, and some of them were good, but overall the damn thing just didn't run, and it cost me weeks of productivity fighting with it. This tells me several things:

  1. This is a re-write of Visual Studio's IDE. In fact, it's written in WPF, which is not something that I would have described as "fast" or "performance oriented" in general, so I (and some others that I knew) had some doubts from the beginning. That on its own would be a pretty dumb reason to hate on a product but that brings us to my second point:
  2. This thing has, from the very beginning, been unstable and slow. This does not inspire confidence in the underlying design of the software, particularly since it was a total rewrite and has gotten worse over time. Then, at the end, it magically speeds way up somehow. How did that happen?

Well, if I were working on a product for over a year and it was slow the whole time, and then in the last couple of months I had to make it faster, I would definitely not have time to go back and look at the whole design and refactor or re-work fundamental parts of the system. I would profile it, find a bottleneck, and then hack it into submission, rinse, and repeat. I don't think I would be able to change the design to something fundamentally more stable at that point, and knowing what I know about how software is developed at Microsoft, there's no way I would be allowed to alter parts of the system like that. Just looking back at my own experience, I feel like what best fits the evidence and my observations is that people started hacking away at it trying to fix the bugs, leaving it in a state where it works, but the underlying design and code is unstable and brittle. If I still had more hacking to do, I would try to buy time by pushing the release date back a bit.

I have also heard complaints and speculation that the use of WPF has caused problems. It is definitely true that WPF is somewhat untested technology, particularly within Microsoft, where VS 2010 (I believe) is the first major product to be built using that technology. Think of it as public dogfooding. However, I think trying to dogfood WPF by including it in a major product was a huge mistake, and other people (who will remain nameless but work at MS in divisions that aren't devdiv) have expressed similar concerns, and some of these people were in a better position than I to make those statements. My personal opinion is that WPF is a contributing factor but not the underlying cause of the performance problems.

Now, I have not looked at the code base and I can't say for certain that a bunch of people just started hacking away at something with a fundamentally unstable design, however, in my mind at least (and in the minds of others that I know), this is the scenario that best fits both the evidence available as well as what I know about how Microsoft works internally (at least in some orgs). I know that some of you will probably flame me and say that I'm being completely unreasonable and acting on something for which I have no evidence.

To that I would say: that's my point. I don't know exactly what happened with this product or why it was so slow and then suddenly got so much faster. I can't say for sure that it was a bunch of people just hacking away at the app at the last minute and I can't say with certainty that this made it brittle, unstable, or just poorly designed. What I can say is that, given what I have observed, I just don't trust it yet. I don't trust that it will increase my productivity. I don't trust that it will be stable and not corrupt my data. I don't trust that I will be able to use it without pain. I don't trust that I will be able to cleanly uninstall it if it ends up hurting more than it helps (heard plenty of uninstall complaints on Twitter and I know for a fact that the uninstall scenario is less of a priority at MS or at least it was when I was there).

I haven't lost hope for it; I certainly think there are some great features included in VS2010, but I think I'll wait for the service pack.

Friday, October 9, 2009

Don't comment it out - Delete that code!

You have a big, ugly block of code that you want to improve. The first thing you do? Comment it all out. Then you try to re-implement it in a better way. Finally, you leave the commented-out code in the file in case you need to reference it in the future. I've been guilty of this a lot in the past, but I've recently come to the conclusion that instead of commenting code out, you should delete it immediately. Here's why:

It's bad code

You've already identified this code as bad. You don't want it to exist. If you keep it "commented out" so that you can reference it, you're referencing bad code. Why would you want to do that? If the code were good, you wouldn't have commented it out, right? Since it's bad, why would you possibly want to use it? And if you aren't going to use it, why keep it around?

Copy and pasting of commented out code

I also notice that I sometimes copy and paste parts of that commented-out code into my new implementation. My realization is that if I'm actually using code that I commented out, I should either:
  1. Stop using it. I commented it out for a reason.
  2. Don't comment it out. If it actually is good code, even if it's a small part, why did I comment it out? I should have left it in.

Leaving it in for posterity

A lot of code gets left in "so you can see how it used to work." If you want to see how it used to work, you have version control, right? Just look at the file's history and diff it against the current version. Problem solved.
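For example, assuming your project is in Git, recovering the old version takes a single command. The script below builds a throwaway repository with two versions of a file (the file name and commit messages are made up) purely to demonstrate:

```shell
set -e
# Build a throwaway repo with two versions of a file, just for demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
echo "old implementation" > widget.c
git add widget.c
git commit -qm "original widget"
echo "new implementation" > widget.c
git commit -qam "rewrite widget"
# See how it used to work -- no commented-out code required:
git show HEAD~1:widget.c
# Or diff then vs. now:
git diff HEAD~1 HEAD -- widget.c
```

"git show HEAD~1:widget.c" prints the previous version and "git diff" shows exactly what changed; neither requires a graveyard of commented-out code in the current file.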

Big block comments detailing how something works or what it does

These are also bad. If you find yourself writing one, it means that part of your code is probably doing too much; split it into understandable chunks. If you need an explanation of what the code is accomplishing, your unit tests are not descriptive enough. The only exception is code that implements something intrinsically complicated such as an RFC, where even someone with domain knowledge may have difficulty understanding it (example: in RFC 2822, WTF is an addr-spec and what characters are allowed in a domain literal? I don't remember, and I implemented it). In that case, two things are useful:

  1. Reference the RFC (or other standards documentation) by section and subsection. If I'm implementing an addr-spec reader, I can write "This implements the addr-spec in RFC 2822, section 3.4.1 (page 16)" and now someone reading the code can go read the RFC if they need to know more about what the code is doing.
  2. Write a functional spec. I know, in Agile we favor working software over comprehensive documentation. That's fine. Just because we favor working software doesn't mean we never document. A functional spec of complex logic is hardly "comprehensive documentation"; it's a useful vehicle for clarifying how something works or how different components interact. If that's the most effective solution, there's nothing non-Agile about doing it.
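As a tiny sketch of the first point, a comment can point at the spec by section instead of re-explaining it. The regex below is a deliberately loose, hypothetical approximation of an addr-spec, not the real RFC 2822 grammar:

```python
import re

# Implements a simplified check of the addr-spec form from RFC 2822,
# section 3.4.1 (local-part "@" domain). This pattern is a loose
# approximation for illustration only -- it handles none of the quoted
# strings, domain literals, or comments the RFC allows; see the RFC
# for the real grammar.
ADDR_SPEC = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_addr_spec(text):
    return bool(ADDR_SPEC.match(text))
```

The comment tells a reader exactly where to look for the full rules, rather than trying (and failing) to reproduce the RFC inline.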

In conclusion

If you find yourself commenting out code, ask yourself whether you really need to keep any of it; if so, leave that part in, uncommented. Delete everything else immediately. I promise you won't miss it.

Tuesday, May 5, 2009

I know it's been a while

It's been a long time since my last post, but I have a few things in the works, so I'll have something again soon. I just finished my first semester teaching networking at DigiPen and it was a lot of fun! I'll be returning in the fall to teach the same class again (CS 260) as well as an advanced networking class (CS 261). I'll have more about this experience in another post.

I've also got a few open source projects that I'm working on. I haven't had any spare time for them until class ended, but now I can devote a bit of time to at least one of them, so hopefully I'll have something to post about within the next month or so.

Finally, for those of you who have been asking: I am indeed still at the 'soft and did not get hit by the most recent round of layoffs, although unfortunately the recruiter who got me the job was let go, and I wish him the best of luck, because he was awesome. Lots of stuff is going on at the 'soft now, with Win7 at RC (I'm using the RC build; it's great, and I predict Win7 is going to rule), the .NET 4.0 beta coming in the not-too-distant future (I don't know the exact date yet), and some Silverlight work that I'm only minimally involved with at this point. I've also been away from the Alt.NET lists lately, so I hope to be more active in the community again soon as well.

Stay tuned!

Monday, March 23, 2009

Don't speed up your code, make it do less instead

I just read an extremely insightful blog post here. It's about speeding up your code, and the author makes an interesting point that he learned long ago from a mentor of his:

So there is no way, really, to make code go faster, because there is no way to make instructions execute faster. There is only such a thing as making the machine do less.

This is fscking brilliant advice. I'd never thought about code this way, but on reflection I think he's absolutely correct. I can think of a thousand ways to apply this in both managed and unmanaged code, but ultimately it comes from knowing how your code is built and what instructions are generated from it. As I think about general principles of OO design (SOLID and such), I'm starting to see how efficient, extensible code helps here too. Maybe the right thing to do when you start analyzing managed or unmanaged code for performance is to look at the assembly or MSIL being generated, instead of staring at the source and trying to think of ways to optimize. Read the article and learn.
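Here's a toy Python illustration of "making the machine do less" (the function names and data are made up): both versions return the same total, but the second hoists an invariant dictionary lookup out of the loop.

```python
# Both functions compute the same total, but the second hoists the invariant
# dictionary lookup out of the loop, so the machine does less work per
# iteration. (The names and data shape are made up for illustration.)
def total_slow(amounts, rates):
    total = 0.0
    for amount in amounts:
        total += amount * rates["standard"]   # looked up every iteration
    return total

def total_fast(amounts, rates):
    rate = rates["standard"]                  # looked up exactly once
    return sum(amount * rate for amount in amounts)
```

Nothing got "faster" in the sense of faster instructions; the second version simply executes fewer of them, which is exactly the point of the quote.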