Tuesday, July 29, 2008

Teaching Java in school is just as controversial as an interview with Justice Gray

I read a great article today here that was a follow-up to an interview here with Computer Science professor Robert Dewar of New York University. I'll stop for a moment while you go read the articles. Both of them. Like I did. All done? Good, now we can have an intelligent conversation about them, and I have to say that I absolutely agree with Prof. Dewar's main points about today's graduates. (Disclosure: I have a degree in Computing and Software Systems from the University of Washington. While Java was used for some things, the majority of my degree was in c++.)

One of the main arguments that he makes is that Java has a lot of libraries. A lot. To quote Prof. Dewar as quoted from the article (not sure how to cite a quote from a quote correctly so just pretend that I did):

“If you go into a store and buy a Java book, it’s 1,200 pages; 300 pages are the language and 900 pages are miscellaneous libraries. And it is true that you can sort of cobble things together in Java very easily…so you can sort of throw things together with minimal knowledge,” he says. “But to me, that’s not software engineering, that’s some kind of consuming-level programming.”


Now this is absolutely true, and it's one of the strengths of Java: there are a lot of libraries for everything. The same can be said of php, .Net, and perl. In fact, .Net has a CLR implementation of a functional language as well (F#). There are also thousands of applications out there written in Java, .Net, php, and perl. True, none of them keep airplanes in flight or help launch the space shuttle. However, they do help trade stocks, manage companies, guard private health data, run military equipment, and create useless social-networking sites. So, what's the problem then? Is it that Java isn't as popular anymore in the business world? (Hint: no.) Is it that Java is the wrong language to teach students how to program? (Hint: no.) Is it that we're not teaching computer science properly? (Hint: warmer.) Well, if you want my opinion, and since you're reading this blog I'm going to assume that you do, then my Answer-with-a-capital-A is:

WE'RE NOT TEACHING THE WRITE THINGS WITH THE RIGHT TOOLS!!!11!1!!11!one!1

Allow me to explain a bit.

Computer Science is too hard so only nerds can take that class

CS is intimidating for a lot of people. Computers are scary. Computers are complicated. Computers are for nerds who stare at the command prompt all day and never see the sun. These are all things that I've heard from non-CS majors.

Except that people in Math, Chemistry, and Physics need to take that class too

When I was in school, there were also a number of degrees that required the introductory computer programming classes (that's CSE 142 and the follow-up CSE 143 at the UW). I took the CSE 142 equivalent class at community college when I was in high school through the Running Start program. The class was conducted in c (not c++), and I got credit for CSE 142 by taking it, as well as a year of science credits for my high school, so that I could free up an extra period for a year to do nothing. The class was challenging but not overly so (in my opinion), and it helped that the class had about 20-25 people in it, so there was a lot of opportunity for students to get individual help from the professor.

I took CSE 143 at the UW my freshman year (way back in 1999, so now you all know how old I am) and it was in c++ at that time. I already knew c++, but that class was still challenging, even for me. I recall that on the first quiz we had, I got something like 44 out of 100 and was still two full standard deviations above the mean. A lot of people dropped the class. I remember a project that was our first big exposure to objects; it was a dll-hell type situation and almost no one (including me) could get the code to actually build and link. The TAs couldn't get it to work. My friend Kevin, who was a graduating senior in CS, couldn't get it to work. The professor finally said that we should all just turn in what we had, and if it didn't build or link, he'd grade more easily on this assignment. This almost made me hate computers. It did make a large number of students in that class say "fuck it, this sucks" and drop.

These are the problems that Java is trying to solve

Java doesn't have dll hell. It has well-supported IDEs. It's free. It runs on all platforms easily enough and doesn't require special changes to the code to get it to build on different platforms. The syntax is reasonably friendly. It does have a lot of libraries and stuff, but it's still able to implement most data structures and algorithms that you'd commonly find in such classes, such as linked lists, b-trees, heaps, hashtables, etc., as well as common searching/sorting algorithms. Now students only have to worry about their code and making it work. There is no (or minimal) frustration with things like getting the damn compiler to work or worrying about the environment. If a student likes solving problems in code, they may now choose to pursue that as a degree instead of getting overly frustrated with their build tool or IDE or whatever. The universities are clearly trying to make CS look less intimidating at first, and I think that Java solves this problem about as well as it can be solved.

How Java creates at least 10 new problems (that's 10 in like base 50)

The first problem is how students are taught using Java. Just because Java has 100000 different libraries doesn't mean that you have to use them. Ultimately, most of those libraries are written in Java, right? So that's the first problem: when you start teaching data structures and algorithms, you must actually teach them; you can't just let people use the libraries to build applications. A good example for teaching hashtables would be to show the Java Hashtable class and write a working application that stores and retrieves values from one, but write that application against the Map interface (which Hashtable implements). Now, write your own hashtable class that implements Map, and the output from your implementation and the Java Hashtable class should be the same. Repeat for other data structures and algorithms. Wow, problem solved.
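
To make the hashtable exercise concrete, here's a minimal sketch of the "write your own" half of the assignment. For brevity this toy class doesn't implement the full java.util.Map interface like a real assignment would, and all the names here are my own invention:

```java
import java.util.LinkedList;

// A toy hash table built from scratch with separate chaining.
// A real course assignment would implement java.util.Map so its
// output could be compared directly against java.util.Hashtable.
public class ToyHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ToyHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Map a key to a bucket index; the remainder's magnitude is
    // always smaller than the bucket count.
    private int indexFor(K key) {
        return Math.abs(key.hashCode() % buckets.length);
    }

    public void put(K key, V value) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite
        }
        buckets[indexFor(key)].add(new Entry<>(key, value));
    }

    public V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // missing keys return null, like java.util.Hashtable
    }

    public static void main(String[] args) {
        ToyHashTable<String, Integer> t = new ToyHashTable<>(16);
        t.put("apples", 3);
        t.put("apples", 4); // overwrite
        System.out.println(t.get("apples")); // prints 4
        System.out.println(t.get("kiwis"));  // prints null
    }
}
```

The point of the exercise is that the application code on top doesn't change: swap this class for java.util.Hashtable and the outputs should match.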

So now that the students learned Java, we can just do everything in Java, right?

Wrong. You just can't teach everything in Java. You can teach some things in Java. I had an operating systems class that was taught about 50% in Java (the other 50% was c++ on a linux kernel). It was taught in Java because illustrating concepts with multi-threading is much more complex in c++. That doesn't mean I don't know how to multithread in c++, but it was much easier to debug this crap in Java, which let me focus on the concepts of an OS. The same is true of file I/O: we had to implement a virtual file system, which Java made easy by handling the actual file I/O for me. However, we only got one really big file, and inside that file we had to have our inodes and our data, and we had to create a "file" implementation that would simulate reading and writing to our big "file." Again, this helped me understand the concept without having to worry about formatting an actual hard drive and interfacing with it. Here's the point:

Java allows you to focus more on the concept that you're trying to study without having to spend a lot of time working on the tools and environment. If focusing on the concept is not dependent on the tools or environment, then Java is an acceptable choice.
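
The virtual file system assignment translates into a pretty small amount of Java. Here's a hedged sketch of the "one big file as a disk" idea; the class name, block size, and method names are all hypothetical, not what my course actually used:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// One big file pretends to be a disk: it's divided into fixed-size
// blocks, and a file system layer on top would keep its inodes and
// data inside those blocks.
public class VirtualDisk {
    static final int BLOCK_SIZE = 512;
    private final RandomAccessFile disk;

    public VirtualDisk(String path, int numBlocks) throws IOException {
        disk = new RandomAccessFile(path, "rw");
        disk.setLength((long) numBlocks * BLOCK_SIZE); // "format" the disk
    }

    // Read one block, like a driver reading a sector.
    public byte[] readBlock(int blockNum) throws IOException {
        byte[] buf = new byte[BLOCK_SIZE];
        disk.seek((long) blockNum * BLOCK_SIZE);
        disk.readFully(buf);
        return buf;
    }

    // Write one block; Java handles the actual file I/O underneath.
    public void writeBlock(int blockNum, byte[] buf) throws IOException {
        disk.seek((long) blockNum * BLOCK_SIZE);
        disk.write(buf, 0, BLOCK_SIZE);
    }
}
```

Everything above the block layer (inodes, directories, the simulated "file" objects) is just Java code reading and writing these blocks, which is exactly why the language got out of the way and let the OS concepts come through.
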

Learning about memory management in a language that manages memory for you is hard

This is probably the biggest point on which Java breaks down as a teaching tool. To be a developer, you have to have a solid understanding of how the machine handles things like memory and what implications that has for your program. You don't have to worry about memory at all in Java, which makes it an inadequate tool for the job. A language like C is really the best teaching tool here, since you have to do everything manually. In fact, I think the best way to really understand how memory works is in assembly, where you can actually look at the addressing modes and see the difference between them. This helped me understand pointers more than anything else, which brings us to:

You can't learn about machine language without an actual machine

Java is a virtual machine, but we want to know about actual machines. Having a strong working knowledge of the principles behind how computers work is critical, especially when something goes terribly wrong. You probably won't ever use assembly again in your career after college. You will probably never write a driver or a compiler. However, if you are using these tools (yes, I consider a compiler to be a tool), it's important that you know generally how they work, because if they ever don't work, you'll never be able to figure out why without that knowledge. You need to learn this by actually looking at hardware and how it's built. You should be able to design a relatively simple logic-based circuit. You should know how these circuits are used to make up a computer. These things aren't that hard if taught well (and I was taught well, so thanks Arnie if you're reading this). Assembly language is how the hardware and software interface, so it's pretty important that you learn it as well. I learned Motorola 68000 assembly, which I think is much simpler than x86 assembly but still illustrates the points well. I now know that the difference between
int a = 5;    // store an immediate value
int *b = &a;  // store the address of a
int c = *b;   // load the value at that address
is really just in which assembly instructions are emitted (hint: it's largely the same move instruction, but the addressing mode changes). This helps me understand how memory works in programming, and that helps me ensure that I write programs that don't leak memory (or references, in the case of managed code, since it's a similar concept).

And now, the things that are missing from the education system.

What I (and many other people) think is missing is a good foundation in object-oriented programming. Most people get a week or two and a homework assignment on polymorphism. That's cool, now you understand inheritance. Except not really. What they don't teach is WHY to use inheritance and how to use it correctly. There is nothing on patterns, or refactoring, or just generally how to program with objects. There needs to be, as I think this skill is critical and most college grads don't have it because they were never taught it (I know I wasn't). I think that Java would be a good tool for teaching this (although obviously not the only one).
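
To be concrete about the WHY: the payoff of inheritance is that code written against the base type never has to change when new subtypes show up. A quick Java sketch (all the names here are mine, not from any course):

```java
// Code written against the abstract type doesn't change when new
// subtypes are added: that's the WHY of inheritance, not just the how.
abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    double area() { return Math.PI * radius * radius; }
}

class Square extends Shape {
    private final double side;
    Square(double side) { this.side = side; }
    double area() { return side * side; }
}

public class ShapeDemo {
    // This method never needs editing when a new Shape subclass
    // appears; it only knows about the abstraction.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area(); // dynamic dispatch picks the right area()
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalArea(new Shape[] { new Circle(1), new Square(2) }));
    }
}
```

The week-or-two polymorphism homework teaches the syntax above; what's missing is a class that teaches when a design like this beats a pile of if-statements, and when it doesn't.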

And then there was testing

No one teaches how to test code or even how to make code testable. I'm not talking about running your app and checking inputs and outputs. I'm talking about unit testing, integration testing, and the automation of those things. It's not enough to just know that you have to test or that unit tests are important. You need to understand things like test doubles, test automation, and how to write code that can be tested in isolation. Java would probably be a good language for this.
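
For instance, "testable in isolation" mostly means depending on interfaces instead of concrete things, so a test can substitute a hand-written test double. A small Java sketch with made-up names:

```java
// The class under test depends on an interface, so a test can swap
// in a fake (a test double) for the real implementation: no network,
// no market data feed, just a canned answer.
interface StockQuoteService {
    double latestPrice(String symbol);
}

class PortfolioValuer {
    private final StockQuoteService quotes;

    PortfolioValuer(StockQuoteService quotes) { this.quotes = quotes; }

    double value(String symbol, int shares) {
        return quotes.latestPrice(symbol) * shares;
    }
}

public class PortfolioValuerTest {
    public static void main(String[] args) {
        // The test double returns a fixed price.
        StockQuoteService fake = symbol -> 10.0;
        PortfolioValuer valuer = new PortfolioValuer(fake);
        System.out.println(valuer.value("XYZ", 3)); // prints 30.0
    }
}
```

In a real class you'd run this under something like JUnit and automate it in the build; the important part is that the design (an injected interface) is what made the isolated test possible.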

And then there was all the "other stuff"

So how do you handle building big projects? How do you automate a build? How do you manage source code? What is a branch? These are all things that most developers who have experience with them take for granted, but we all had to learn somewhere. Probably we either figured it out by experiencing the problems that these things solve, or someone at our jobs showed us. This needs to start in schools. Don't focus on a specific technology for any of these; again, teach the concepts and why they're important. The code isn't all that important in this type of class, so I could argue that you could use any language you want. HOWEVER, remember the purpose of this class is managing code, not writing it, so don't force students to do anything complex in the code. Use some existing code and make trivial changes to it that force the students to use version control and to change the build process to take the more complex stuff into account.

The final thought

A lot of universities stick with Java because the students already know it and it's the lowest common denominator. That's fine if you want your students to come out being the lowest common denominators in the world of developers. One critical skill that developers have is the ability to learn new languages, particularly since new languages are developed all the time. This helps developers stay competitive in the workforce as technology changes. If you teach the whole curriculum in Java, students never get the opportunity to figure out a new language rapidly, and that's a problem.

So my solution?

  1. Teach every class in the most appropriate language for the subject. Intro classes should be taught in something that has a minimum of extra crap to make the programs compile and run. Java is really ideal for this but I would be OK with c# also. The point of this class is an intro to programming, not an intro into fucking with the compiler.
  2. At a minimum, each student should be required to work in at least four programming languages while in school, one of which should be assembly and one of which should be object-oriented. HTML is not a programming language.
  3. Teach how to write good code. Comments != good code. This should be enforced in every class, but there needs to be a specific class on how to do this, and it needs to happen early in a student's career. The class should cover things like patterns, principles of OO design, unit testing, etc.
  4. Require version control to be used by every student for every class past the intro classes. Universities should provide access to a university-run vcs for each student. This isn't as hard to do as it sounds.
  5. Compiler, Hardware, and Operating Systems classes should be mandatory (sometimes some of these are not). I wrote a disassembler in assembly language as a final project in hardware. It was hard but not impossible and everyone in the class got at least something that sorta worked. Mine could disassemble itself accurately.
  6. Students should be forced to collaborate with each other in every class. Collaboration might include working together, but could also include code reviews or paired programming.
  7. Don't ever force a student to have their code reviewed in front of the class unless the student is ok with it, but anonymous code review or review by the professor in a private setting is fine. I realize that the business world will not conform to this but this is school and we don't want to alienate students. I think this is a compromise that will still teach a code review's value and how to conduct one without making people want to drop out of the program (or worse).
  8. Every class should involve writing at least some code.
  9. Professors should provide at least one well-written piece of code that demonstrates something that the class is teaching. It's helpful for students to read good code. It's equally helpful for students to read bad code and know why it's bad.
Finally, if you're a professor, college administrator, or anything similar and you want to talk to me or anyone else in more detail about this, I'd be happy to chat with you any time. I only rant about this because I passionately believe that it's important, and I will do everything in my power to try to make Computer Science education better. If you're reading this, I challenge you to make this a priority as well. Go talk to your local college. Email your professors. Go offer to talk to classes at your local schools, particularly at the high school and community college levels. Encourage people to be CS students. You never know what kind of influence you'll have on someone; the only way to guarantee you'll have none is to do nothing.

Monday, July 21, 2008

Tag Soup sucks: Hey Jeff, here's a better way

Jeff Atwood of Coding Horror posted about "tag soup" in web development. I absolutely agree with him on this one: every web development framework currently in existence renders crap HTML code. Remember my HTML wall of shame? Yes, that's a good example of crap HTML being rendered by frameworks. Jeff (Atwood) asks if there's a better solution. Luckily, Jeff (me) has one: it's called writing good HTML and separation of concerns in rendering. Wow, that's a long phrase. Let's try again: Don't use frameworks because you don't know HTML; the people who wrote the framework don't know HTML either. No, still not good. Let's stick with the old favorite:

It's called HTML and it's not hard

That's right, HTML is not a complex thing and writing clean HTML isn't particularly difficult. In fact, you can leverage a framework and still write good HTML and I'm going to show you how. It's really as simple as using separation of concerns. Let's analyze the various parts of a web page.

HTML

What is the HTML for? It's really a place to store content. Your text, your menu bars, your stupid scrolling marquees, your <blink> tags, etc. All of this goes here. The way I think about it is that you're using HTML tags to create containers for content. A <p> tag is a container for some text. A <table> contains some related data in rows and columns. A <span> tag is going to hold some special inline content. A <div> is going to contain some special stuff inside of it. Notice that I haven't said a thing about formatting or style yet. The reason would be that it DOESN'T BELONG IN YOUR STUPID HTML!!!!! One more thing I'd like to bring up here is the hell of nested tables. This occurs when someone wants to do some sort of complex formatting and doesn't know how to use the div tag with CSS. Nested tables are an anti-pattern called "Nested fucking tables" and should be avoided. They won't make your formatting better (Firefox and IE sometimes render table elements differently, so this often actually makes things worse). This brings us to:


Formatting and Style


So wouldn't it be nice if there were some sort of "style" thing you could use to store all your styles so you can keep them in one place (DRY, right?). Maybe some sort of "sheet" where your "styles" could go, and then they would "cascade" throughout the whole site for every page that referenced them. Maybe some sort of "cascading style sheet?" Oh wait, that already exists. Let's use it! Now the HTML can focus on being only containers for content, and your CSS defines how that content is presented and styled. Separation of concerns, right? Now you have only containers, and maybe some information in the containers to identify them to your style sheets (ID and Class are the attributes you're looking for). This is good separation of concerns.


So what about behavior?


This is where the client side stuff comes in. Things get a little trickier here, but not that tricky. Ok, I lied, it's not tricky at all if you actually know JavaScript and treat it as actual application code and not some bastardized client-side tag-hiding-style-manipulating crap. JavaScript is a language. It is subject to the same rules as all programming: separation of concerns, DRY, IoC, etc. It should also have its own unit tests. Finally, like CSS, it should be extracted into its own file so that every page can consume it.

So now you have containers that can be identified, styles that can be applied to them, and scripts that can determine their behavior all in separate places. The ID's and classes of your containers help your styles and scripts know what to apply themselves to. There is a minimum of code that exists in your HTML that helps bind these things together, and in those pages that really, really are one-offs, you have inline styles and inline javascript (this should REALLY be the exception though).


But the server blah blah blah . . . .

This is where the example that Jeff shows really breaks down. I'm not going to post it up here, but go look at his post and check out the example code. You'll see something really stupidly obvious: YOU'RE DOING LOGIC IN THE DAMN MARKUP!!!! You're concatenating links, you're looping through stuff, you're doing all kinds of crap. Hell, as long as you're at it, why don't you query the database there too just so all your crap is at least in one file?

There is a simple solution to this problem: You already have containers defined by your html. Use them. Expose them to the server side code and let that code render stuff inside them. For example, in ASP .Net one of my favorite tricks is to have a table on my page and actually use the <asp:table> object so that my codebehind can expose it to my controller (You're using MVC, right?) and my controller can populate it with data. Wait, controllers shouldn't populate data, so wtf am I doing? Am I breaking my own rules? No, I don't directly populate tables from the controller; typically I use an intermediary object to do that for me (more about this in a future post, I promise). This way, the controller is able to provide the model to the view via some other object that is responsible for doing complex formatting. I can reuse my formatting objects where appropriate. I can also change the formatting without changing the model or the view itself. I can change the view even if I want to without caring how the formatting is created (as long as the contract between the view and my formatting object is fulfilled, i.e. if the view is expecting a table then the formatter had best be rendering one).


Here you go, Jeff-

A nice, happy, clean solution looks something like this:

1. The HTML provides containers for content and possibly some content as well.
2. The CSS provides style information and formatting for the containers.
3. JavaScript manipulates the containers client-side to create a client-side view when necessary.
4. The server side code populates the HTML containers with content.
5. The server side uses helper objects to populate content that requires more complex rendering (tables with grouping levels I think are a good example here).

The only thing you need in order to pull this off is to know all of these different technologies. This isn't that hard, and as a web developer you really should know all of this stuff anyway. I think Microsoft started a horrible trend with ASP .Net by allowing application developers to write web apps without knowing anything about web technologies. This attitude has brought us the viewstate, page events, chatty controls, and a bunch of other crap that makes your html look like tag soup. Rails and MVC haven't helped this problem at all.