Tuesday, July 29, 2008

Teaching Java in school is just as controversial as an interview with Justice Gray

I read a great article today here that was a follow-up to an interview here with Computer Science professor Robert Dewar of New York University. I'll stop for a moment while you go read the articles. Both of them. Like I did. All done? Good, now we can have an intelligent conversation about them, and I have to say that I absolutely agree with Prof. Dewar's main points about today's graduates. (Disclosure: I have a degree in Computing and Software Systems from the University of Washington. While Java was used for some things, the majority of my degree was in C++.)

One of the main arguments that he makes is that Java has a lot of libraries. A lot. To quote Prof. Dewar as quoted in the article (not sure how to cite a quote from a quote correctly, so just pretend that I did):

“If you go into a store and buy a Java book, it’s 1,200 pages; 300 pages are the language and 900 pages are miscellaneous libraries. And it is true that you can sort of cobble things together in Java very easily…so you can sort of throw things together with minimal knowledge,” he says. “But to me, that’s not software engineering, that’s some kind of consuming-level programming.”


Now this is absolutely true, and it's one of Java's strengths: there are libraries for everything. The same can be said of PHP, .NET, and Perl. In fact, .NET has a CLR implementation of a functional language as well (F#). There are also thousands of applications out there written in Java, .NET, PHP, and Perl. True, none of them keep airplanes in flight or help launch the space shuttle. However, they do help trade stocks, manage companies, guard private health data, run military equipment, and create useless social-networking sites. So, what's the problem then? Is it that Java isn't as popular anymore in the business world? (Hint: no.) Is it that Java is the wrong language to teach students how to program? (Hint: no.) Is it that we're not teaching computer science properly? (Hint: warmer.) Well, if you want my opinion, and since you're reading this blog I'm going to assume that you do, then my Answer-with-a-capital-A is:

WE'RE NOT TEACHING THE RIGHT THINGS WITH THE RIGHT TOOLS!!!11!1!!11!one!1

Allow me to explain a bit.

Computer Science is too hard so only nerds can take that class

CS is intimidating for a lot of people. Computers are scary. Computers are complicated. Computers are for nerds who stare at the command prompt all day and never see the sun. These are all things that I've heard from non-CS majors.

Except that people in Math, Chemistry, and Physics need to take that class too

When I was in school, there were also a number of degrees that required the introductory computer programming classes (that's CSE 142 and the follow-up CSE 143 at the UW). I took the CSE 142-equivalent class at a community college when I was in high school through the Running Start program. The class was conducted in C (not C++), and by taking it I got credit for CSE 142 as well as a year of science credits for my high school, which freed up an extra period for a year to do nothing. The class was challenging but not overly so (in my opinion), and it helped that the class had about 20-25 people in it, so there was a lot of opportunity for students to get individual help from the professor.

I took CSE 143 at the UW my freshman year (way back in 1999, so now you all know how old I am) and it was in C++ at that time. I already knew C++, but that class was still challenging, even for me. On the first quiz I got something like 44 out of 100 and was still two full standard deviations above the mean. A lot of people dropped the class. I remember a project that was our first big exposure to objects; it was a DLL-hell type situation, and almost no one (including me) could get the code to actually build and link. The TAs couldn't get it to work. My friend Kevin, who was a graduating senior in CS, couldn't get it to work. The professor finally said that we should all just turn in what we had, and if it didn't build or link, he'd grade more easily on this assignment. This almost made me hate computers. It did make a large number of students in that class say "fuck it, this sucks" and drop.

These are the problems that Java is trying to solve

Java doesn't have DLL hell. It has IDEs that are well supported. It's free. It runs on all platforms easily enough and doesn't require special changes to the code to get it to build on different platforms. The syntax is reasonably friendly. It does have a lot of libraries and stuff, but it can still implement most data structures and algorithms that you'd commonly find in such classes, such as linked lists, B-trees, heaps, and hash tables, as well as common searching and sorting algorithms. Now students only have to worry about their code and making it work. There is no (or minimal) frustration with things like getting the damn compiler to work or worrying about the environment. If a student likes solving problems in code, they may now choose to pursue that as a degree instead of getting overly frustrated dealing with their build tool or IDE or whatever. The universities are clearly trying to make CS look less intimidating at first, and I think that Java solves this problem about as well as it can be solved.
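For example, here's a bare-bones generic linked list written in plain Java with no libraries at all. This is just a sketch I'm throwing together for illustration, not code from any actual course:

// A minimal generic singly-linked list, to show that the classic
// data structures can still be built by hand in plain Java.
public class SimpleLinkedList<T> {
    // Each node holds one value and a reference to the next node.
    private static class Node<T> {
        T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head;
    private int size;

    // Add a value to the front of the list in O(1).
    public void addFirst(T value) {
        Node<T> node = new Node<T>(value);
        node.next = head;
        head = node;
        size++;
    }

    // Walk the list to find the value at the given index, O(n).
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Node<T> current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.value;
    }

    public int size() { return size; }
}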

How Java creates at least 10 new problems (that's 10 in like base 50)

The first problem is how students are taught using Java. Just because Java has 100,000 different libraries doesn't mean that you have to use them. Ultimately, most of those libraries are written in Java, right? So that's the first problem: when you start teaching data structures and algorithms, you must actually teach them; you can't just let people use the libraries to build applications. A good example for teaching hash tables would be to show the Java Hashtable class and write a working application that stores and retrieves values from one, but write that application against the Map interface (which Hashtable implements). Now write your own hash table class that implements Map, and the output from your implementation and the Java Hashtable class should be the same. Repeat for other data structures and algorithms. Wow, problem solved.
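Here's a rough sketch of what that exercise could look like. The WordCount class is something I made up for illustration, and the commented-out StudentHashtable is the hypothetical class a student would write (probably by extending java.util.AbstractMap), not anything from a real course:

import java.util.Hashtable;
import java.util.Map;

// The application only talks to the Map interface, so the library
// Hashtable and a student-written hash table can be swapped in and out
// without changing this code.
public class WordCount {

    // Count how often each word appears, using whatever Map we're handed.
    static Map<String, Integer> countWords(Map<String, Integer> counts, String[] words) {
        for (String word : words) {
            Integer current = counts.get(word);
            counts.put(word, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = { "heap", "tree", "heap", "list", "heap" };

        // Run 1: the library implementation.
        System.out.println(countWords(new Hashtable<String, Integer>(), words));

        // Run 2: the student's implementation should produce the same counts.
        // System.out.println(countWords(new StudentHashtable<String, Integer>(), words));
    }
}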

So now that the students learned Java, we can just do everything in Java, right?

Wrong. You just can't teach everything in Java. You can teach some things in Java. I had an operating systems class that was taught about 50% in Java (the other 50% was C++ on a Linux kernel). It was taught in Java because illustrating concepts with multi-threading is much more complex in C++. That doesn't mean that I don't know how to multithread in C++, but it was much easier to debug this crap in Java, which let me focus on the concepts of an OS. The same is true of file I/O: we had to implement a virtual file system, and Java made that easy by handling the actual file I/O for me. We only got one really big file, and inside that file we had to have our inodes and our data, and we had to create a "file" implementation that would simulate reading and writing to our big "file." Again, this helped me understand the concept without having to worry about formatting an actual hard drive and interfacing with it. Here's the point:

Java allows you to focus more on the concept you're trying to study without having to spend a lot of time fighting the tools and environment. If the concept you're studying doesn't depend on the tools or environment, then Java is an acceptable choice.
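To give you an idea, here's roughly what the "file inside a file" trick looks like. This is just a sketch of the concept using RandomAccessFile; the BlockDevice name and the 512-byte block size are things I made up, not the actual assignment:

import java.io.IOException;
import java.io.RandomAccessFile;

// Java's RandomAccessFile does the real disk I/O; this class just carves
// one big file into fixed-size blocks the way a toy file system might.
public class BlockDevice {
    private static final int BLOCK_SIZE = 512;
    private final RandomAccessFile disk;

    public BlockDevice(String path, int blockCount) throws IOException {
        disk = new RandomAccessFile(path, "rw");
        disk.setLength((long) blockCount * BLOCK_SIZE);  // pre-size the "disk"
    }

    // Write one block's worth of bytes at the given block number.
    public void writeBlock(int blockNumber, byte[] data) throws IOException {
        disk.seek((long) blockNumber * BLOCK_SIZE);
        disk.write(data, 0, Math.min(data.length, BLOCK_SIZE));
    }

    // Read one full block back.
    public byte[] readBlock(int blockNumber) throws IOException {
        byte[] buffer = new byte[BLOCK_SIZE];
        disk.seek((long) blockNumber * BLOCK_SIZE);
        disk.readFully(buffer);
        return buffer;
    }

    public void close() throws IOException { disk.close(); }
}
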
Learning about memory management in a language that manages memory for you is hard

This is probably the biggest place where Java breaks down as a teaching tool. To be a developer, you have to have a solid understanding of how the machine handles things like memory and what implications that has for your program. You don't have to worry about memory at all in Java, which makes it an inadequate tool for this job. A language like C is really the best teaching tool here, since you have to do everything manually. In fact, I think the best way to really understand how memory works is in assembly, where you can actually look at the addressing modes and see the difference between them. This helped me understand pointers more than anything else, which brings us to:

You can't learn about machine language without an actual machine

Java is a virtual machine, but we want to know about actual machines. Having a strong working knowledge of the principles behind how computers work is critical, especially when something goes terribly wrong. You probably won't ever use assembly again in your career after college. You will probably never write a driver or a compiler. However, if you are using these tools (yes, I consider a compiler to be a tool), it's important that you know generally how they work, because if they ever don't work, you'll never be able to figure out why. You need to learn this by actually looking at hardware and how it's built. You should be able to design a relatively simple logic-based circuit. You should know how these circuits are used to make up a computer. These things aren't that hard if taught well (and I was taught well, so thanks Arnie if you're reading this). Assembly language is how the hardware and software interface, so it's pretty important that you learn it as well. I learned Motorola 68000 assembly, which I think is much simpler than x86 assembly but still illustrates the points well. I now know that the difference between
int a = 5;
int *b = &a;
*b = 5;
is really just in which assembly instructions are emitted (hint: it's essentially the same move instruction, but the addressing mode changes). This helps me understand how memory works in programming, and that helps me make sure I write programs that don't leak memory (or leak references, in the case of managed code, which is a similar concept).
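For what it's worth, managed code has its own version of the problem. Here's a contrived Java example (the class and method names are made up) where the garbage collector can't save you because the program never lets go of its references:

import java.util.ArrayList;
import java.util.List;

// Anything that is still reachable can't be collected. This cache keeps
// every report forever, so memory use grows for the life of the program.
public class ReportCache {
    private static final List<byte[]> CACHE = new ArrayList<byte[]>();

    public static byte[] renderReport(int id) {
        byte[] report = new byte[1024 * 1024];  // pretend this is real work
        CACHE.add(report);  // the "leak": nothing ever removes old reports
        return report;
    }
}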

And now, the things that are missing from the education system.

What I (and many other people) think is missing is a good foundation in object-oriented programming. Most people get a week or two and a homework assignment on polymorphism. That's cool; now you understand inheritance. Except not really. What they don't teach is WHY to use inheritance and how to use it correctly. There is nothing on patterns, or refactoring, or just generally how to program with objects. There needs to be, as I think this skill is critical and most college grads don't have it because they were never taught it (I know I wasn't). I think that Java would be a good tool for teaching this (although obviously not the only one).
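To make that concrete, here's a tiny made-up example (Notifier, OrderProcessor, and friends are invented names) of programming against an interface, which is basically the strategy pattern, instead of reaching for inheritance by default:

// OrderProcessor depends only on the Notifier interface, so adding a new
// kind of notification never requires editing or subclassing it.
interface Notifier {
    void send(String user, String message);
}

class EmailNotifier implements Notifier {
    public void send(String user, String message) {
        System.out.println("emailing " + user + ": " + message);
    }
}

class SmsNotifier implements Notifier {
    public void send(String user, String message) {
        System.out.println("texting " + user + ": " + message);
    }
}

class OrderProcessor {
    private final Notifier notifier;

    OrderProcessor(Notifier notifier) {
        this.notifier = notifier;
    }

    void completeOrder(String user) {
        // ... order logic would go here ...
        notifier.send(user, "your order shipped");
    }
}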

And then there was testing

No one teaches how to test code or even how to make code testable. I'm not talking about running your app and checking inputs and outputs. I'm talking about unit testing, integration testing, and the automation of those things. It's not enough to just know that you have to test or that unit tests are important. You need to understand things like test doubles, test automation, and how to write code that can be tested in isolation. Java would probably be a good language for this.
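Here's a small sketch of what I mean by a test double, using JUnit. The Checkout and PriceService classes are made up for the example; the point is that the test supplies a fake dependency, so the code under test runs in isolation:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CheckoutTest {

    // The "real" dependency we don't want the test to touch.
    interface PriceService {
        int priceInCents(String sku);
    }

    // The code under test: it only knows about the interface.
    static class Checkout {
        private final PriceService prices;
        Checkout(PriceService prices) { this.prices = prices; }

        int total(String[] skus) {
            int sum = 0;
            for (String sku : skus) {
                sum += prices.priceInCents(sku);
            }
            return sum;
        }
    }

    @Test
    public void totalAddsUpEachItem() {
        // Test double: a fake PriceService with canned answers, so the
        // test never hits a database or network.
        PriceService fakePrices = new PriceService() {
            public int priceInCents(String sku) {
                return "apple".equals(sku) ? 100 : 250;
            }
        };

        Checkout checkout = new Checkout(fakePrices);
        assertEquals(450, checkout.total(new String[] { "apple", "banana", "apple" }));
    }
}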

And then there was all the "other stuff"

So how do you handle building big projects? How do you automate a build? How do you manage source code? What is a branch? These are all things that most developers with experience take for granted, but we all had to learn somewhere. We probably either figured it out by experiencing the problems these things solve, or someone at our jobs showed us. This needs to start in schools. Don't focus on a specific technology for any of these, but again, teach the concepts and why they're important. The code isn't all that important in this type of class, so I could argue that you could use any language you want. HOWEVER, remember that the purpose of this class is managing code, not writing it, so don't force students to do anything complex in the code. Use some existing code and make trivial changes to it that force the students to use version control and to change the build process to take the more complex stuff into account.

The final thought

A lot of universities stick with Java because the students already know it and it's the lowest common denominator. That's fine if you want your students to come out being the lowest common denominators in the world of developers. One critical skill for developers is the ability to learn new languages, particularly since new languages are developed all the time; it helps them stay competitive in the workforce as technology changes. If you teach the whole curriculum in Java, students never get the opportunity to practice figuring out a new language rapidly, and that's a problem.

So my solution?

  1. Teach every class in the most appropriate language for the subject. Intro classes should be taught in something that has a minimum of extra crap required to make the programs compile and run. Java is really ideal for this, but I would be OK with C# also. The point of this class is an intro to programming, not an intro to fucking with the compiler.
  2. At a minimum, each student should be required to work in at least four programming languages while in school, one of which should be assembly and one of which should be object-oriented. HTML is not a programming language.
  3. Teach how to write good code. Comments != good code. This should be enforced in every class, but there needs to be a specific class on how to do this, and it needs to happen early in a student's career. The class should cover things like patterns, principles of OO design, unit testing, etc.
  4. Require version control to be used by every student for every class past the intro classes. Universities should provide access to a university-run VCS for each student. This isn't as hard to do as it sounds.
  5. Compiler, Hardware, and Operating Systems classes should be mandatory (sometimes some of these are not). I wrote a disassembler in assembly language as a final project in hardware. It was hard but not impossible and everyone in the class got at least something that sorta worked. Mine could disassemble itself accurately.
  6. Students should be forced to collaborate with each other in every class. Collaboration might include working together, but could also include code reviews or pair programming.
  7. Don't ever force a student to have their code reviewed in front of the class unless the student is OK with it, but anonymous code review or review by the professor in a private setting is fine. I realize that the business world will not conform to this, but this is school and we don't want to alienate students. I think this is a compromise that will still teach the value of a code review and how to conduct one without making people want to drop out of the program (or worse).
  8. Every class should involve writing at least some code.
  9. Professors should provide at least one well-written piece of code that demonstrates something that the class is teaching. It's helpful for students to read good code. It's equally helpful for students to read bad code and know why it's bad.

Finally, if you're a professor, college administrator, or anything similar and you want to talk to me or anyone else in more detail about this, I'd be happy to chat with you any time. I only rant about this because I passionately believe that it's important, and I will do everything in my power to try to make Computer Science education better. If you're reading this, I challenge you to make this a priority as well. Go talk to your local college. Email your professors. Offer to talk to classes at your local schools, particularly at the high school and community college levels. Encourage people to be CS students. You never know what kind of influence you'll have on someone, but you definitely won't have any if you do nothing.

2 comments:

Helephant said...

I think the thing to realise is that when you finish university, graduate and become a working developer you're not finished. It's really only the start of a lifetime of learning and mastering the art of software development.

I kind of think it's like getting a driver's license. During the lessons you learn to use the car and learn the sort of habits that you need to become a good driver but you don't really learn to drive by yourself until you're out on the road on your own. I think there comes a point where more theory and practise just doesn't help any more until you've got enough practical experience to really understand what you've been learning.

I actually quite like the idea of apprenticeships where you can learn the practical things like source control from actually building real things at the same time as learning all the important theory stuff in a class room setting. I did a year's industry placement in the middle of my degree and I know it changed the last year of my degree because I really understood why the things that I was learning were so important.

Anonymous said...

Study some history. The old guard always complains. Yet somehow we all survive. Disclosure - I am a 49 yo C & assembly level programmer.