Monday, July 21, 2008

Tag Soup sucks: Hey Jeff, here's a better way

Jeff Atwood of Coding Horror posted about "tag soup" in web development. I absolutely agree with him on this one: every web development framework currently in existence renders crap HTML code. Remember my HTML wall of shame? Yes, that's a good example of crap HTML being rendered by frameworks. Jeff (Atwood) asks if there's a better solution. Luckily, Jeff (me) has one: it's called writing good HTML and separation of concerns in rendering. Wow, that's a long phrase. Let's try again: Don't use frameworks because you don't know HTML; the people who wrote the framework don't know HTML either. No, still not good. Let's stick with the old favorite:

It's called HTML and it's not hard

That's right, HTML is not a complex thing and writing clean HTML isn't particularly difficult. In fact, you can leverage a framework and still write good HTML and I'm going to show you how. It's really as simple as using separation of concerns. Let's analyze the various parts of a web page.


What is the HTML for? It's really a place to store content. Your text, your menu bars, your stupid scrolling marquees, your <blink> tags, etc. All of this goes here. The way I think about it is that you're using HTML tags to create containers for content. A <p> tag is a container for some text. A <table> contains some related data in rows and columns. A <span> tag is going to hold some special line of content. A <div> is going to contain some special stuff inside of it. Notice that I haven't said a thing about formatting, style, or actual content yet. The reason would be that it DOESN'T BELONG IN YOUR STUPID HTML!!!!! One more thing I'd like to bring up here is the hell of nested tables. This occurs when someone wants to do some sort of complex formatting and doesn't know how to use the div tag with CSS. Nested tables are an anti-pattern called "Nested fucking tables" and should be avoided. It won't make your formatting better (Firefox and IE sometimes render different table elements differently so often this actually makes things worse). This brings us to:

Formatting and Style

So wouldn't it be nice if there were some sort of "style" thing you could use to store all your styles so you can keep them in one place (DRY, right?)? Maybe some sort of "sheet" where your "styles" could go, and then they would "cascade" throughout the whole site for every page that referenced them. Maybe some sort of "cascading style sheet"? Oh wait, that already exists. Let's use it! Now you can let the HTML be nothing but containers for content and let your CSS define how that content is presented and styled. Separation of concerns, right? Now you have only containers and maybe some information in the containers to identify them to your style sheets (id and class are the attributes you're looking for). This is good separation of concerns.

So what about behavior?

This is where the client side stuff comes in. Things get a little trickier here but not that tricky. Ok, I lied, it's not tricky at all if you actually know JavaScript and treat it as actual application code and not some bastardized client side tag-hiding-style-manipulating crap. JavaScript is a language. It is subject to the same rules as all programming: separation of concerns, DRY, IoC, etc. It should also have its own unit tests. Finally, like CSS, it should be extracted into its own file so that every page can consume it.
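To make that concrete, here is a minimal sketch of behavior code written as application code. Everything in it is made up for illustration: the file name `table-sort.js`, the functions `sortRows`, `nextDirection`, and `attachSorting`, and the `data-key` attribute on header cells. The sorting logic never touches the DOM, so it can be unit tested without a browser, and the document is handed in as a parameter instead of being grabbed globally.

```javascript
// table-sort.js: a hypothetical shared script, referenced by every page that needs it.

// Pure logic: no DOM access, so it can be unit tested on its own.
function sortRows(rows, key, direction) {
  const factor = direction === "desc" ? -1 : 1;
  // Copy first so the caller's array is not mutated.
  return [...rows].sort((a, b) => {
    if (a[key] < b[key]) return -1 * factor;
    if (a[key] > b[key]) return 1 * factor;
    return 0;
  });
}

// Flips the sort direction; anything other than "asc" becomes "asc".
function nextDirection(current) {
  return current === "asc" ? "desc" : "asc";
}

// DOM wiring: the document is passed in (IoC) rather than reached for globally.
// The markup only needs a table with an id and header cells with data-key attributes.
// `render` is a caller-supplied function that redraws the table body.
function attachSorting(doc, tableId, rows, render) {
  const table = doc.getElementById(tableId);
  let direction = "desc"; // so the first click sorts ascending
  table.querySelectorAll("th[data-key]").forEach((th) => {
    th.addEventListener("click", () => {
      direction = nextDirection(direction);
      render(table, sortRows(rows, th.dataset.key, direction));
    });
  });
}
```

A page would call something like `attachSorting(document, "orders", rows, renderBody)` once at load time, and the HTML itself stays free of `onclick` attributes.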

So now you have containers that can be identified, styles that can be applied to them, and scripts that can determine their behavior, all in separate places. The IDs and classes of your containers help your styles and scripts know what to apply themselves to. There is a minimum of code that exists in your HTML that helps bind these things together, and in those pages that really, really are one-offs, you have inline styles and inline JavaScript (this should REALLY be the exception though).

But the server blah blah blah . . . .

This is where the example that Jeff shows really breaks down. I'm not going to post it up here, but go look at his post and check out the example code. You'll see something really stupidly obvious: YOU'RE DOING LOGIC IN THE DAMN MARKUP!!!! You're concatenating links, you're looping through stuff, you're doing all kinds of crap. Hell, as long as you're at it, why don't you query the database there too just so all your crap is at least in one file?

There is a simple solution to this problem: You already have containers defined by your HTML. Use them. Expose them to the server side code and let that code render stuff inside them. For example, in ASP.NET one of my favorite tricks is to have a table on my page and actually use the <asp:Table> object so that my codebehind can expose it to my controller (you're using MVC, right?) and my controller can populate it with data. Wait, controllers shouldn't populate data, so wtf am I doing? Am I breaking my own rules? No, I don't directly populate tables from the controller; typically I use an intermediary object to do that for me (more about this in a future post, I promise). This way, the controller is able to provide the model to the view via some other object that is responsible for doing complex formatting. I can reuse my formatting objects where appropriate. I can also change the formatting without changing the model or the view itself. I can even change the view if I want to without caring how the formatting is created (as long as the contract between the view and my formatting object is fulfilled, i.e. if the view is expecting a table then the formatter had best be rendering one).
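The shape of that arrangement can be sketched in a few lines. This is not ASP.NET code and not the actual implementation described above; it's a language-neutral illustration written in JavaScript, and `escapeHtml`, `TableRowFormatter`, `OrdersView`, and `showOrders` are all hypothetical names. The view owns the markup and exposes a named container, the formatter turns model data into rows, and the controller only connects the two.

```javascript
// Escape text so model data cannot inject markup into the container.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical formatter. Its contract with the view: it returns <tr> markup only.
class TableRowFormatter {
  constructor(columns) {
    this.columns = columns;
  }

  format(items) {
    return items
      .map((item) => {
        const cells = this.columns.map((c) => `<td>${escapeHtml(item[c])}</td>`);
        return `<tr>${cells.join("")}</tr>`;
      })
      .join("\n");
  }
}

// Hypothetical view: the markup lives here, with one named container to fill.
class OrdersView {
  constructor() {
    this.containers = { orderRows: "" };
  }

  fill(name, markup) {
    if (!(name in this.containers)) throw new Error(`Unknown container: ${name}`);
    this.containers[name] = markup;
  }

  render() {
    return `<table id="orders">\n<tbody>\n${this.containers.orderRows}\n</tbody>\n</table>`;
  }
}

// Controller: hands the model to the formatter and the result to the view.
// It contains no markup and no loops over markup.
function showOrders(model, view, formatter) {
  view.fill("orderRows", formatter.format(model));
}

// Usage with made-up data.
const ordersView = new OrdersView();
showOrders([{ id: 1, customer: "Acme" }], ordersView, new TableRowFormatter(["id", "customer"]));
console.log(ordersView.render());
```

Swapping in a different formatter changes the presentation without touching the model or the view, which is the reuse argument made above.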

Here you go, Jeff-

A nice, happy, clean solution looks something like this:

1. The HTML provides containers for content and possibly some content as well.
2. The CSS provides style information and formatting for the containers.
3. JavaScript manipulates the containers client-side to create a client-side view when necessary.
4. The server side code populates the HTML containers with content.
5. The server side uses helper objects to populate content that requires more complex rendering (tables with grouping levels I think are a good example here).
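
For item 5, a grouping helper might look roughly like the sketch below. It's written in JavaScript purely for illustration, `buildGroupedRows` is a hypothetical name, and it handles only a single grouping level. The helper flattens model items into row descriptors, so whatever renders the table never needs to know how grouping works.

```javascript
// Hypothetical grouping helper: turns model items into a flat list of row
// descriptors (one "group" header row, then that group's "data" rows).
function buildGroupedRows(items, groupKey, columns) {
  const groups = new Map(); // a Map keeps groups in first-seen order
  for (const item of items) {
    const label = item[groupKey];
    if (!groups.has(label)) groups.set(label, []);
    groups.get(label).push(item);
  }

  const rows = [];
  for (const [label, members] of groups) {
    // One header row per group, spanning every column.
    rows.push({ type: "group", label, colspan: columns.length, count: members.length });
    for (const member of members) {
      rows.push({ type: "data", cells: columns.map((c) => member[c]) });
    }
  }
  return rows;
}
```

A table formatter can then walk the list and emit one `<tr>` per descriptor, using `colspan` for the group headers.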

The only thing you need in order to pull this off is to know all of these different technologies. This isn't that hard, and as a web developer you really should know all of this stuff anyway. I think Microsoft started a horrible trend with ASP.NET that allowed application developers to write web apps without knowing anything about web technologies. This attitude has brought us the viewstate, page events, chatty controls, and a bunch of other crap that makes your HTML look like tag soup. Rails and MVC haven't helped this problem at all.


NotMyself said...

Wow... I'll take misdirected hostility for a thousand, Alex!

Jeff Tucker said...

misdirected hostility would make a good band name.

Seriously though, I'm not directing this at Jeff Atwood (although I think I make the argument that he should have thought of this solution) but instead at everyone who uses frameworks as a crutch because they don't know HTML. I realize that this is a huge group of people.

Jani Hartikainen said...

While I think you are on a correct track here for most part, I don't see how frameworks are a culprit and how ASP controls are infinitely better.

Frameworks don't necessarily generate HTML for you. If you look at Jeff's example (which is horrible, yes), you don't see anything generating any markup. Maybe you should do some research before blaming things.

ASP.NET controls are known for the fact that they (at least used to) generate shitty markup. While they solve the problem of mixing code and HTML, they have a set of their own. One being the fact that it's not as easy to modify the markup the control outputs, compared to editing a template.

Jeff Tucker said...

The ASP controls are just one way to expose containers to your server-side controls. I feel that when used properly, they are a lot better than either using a bunch of "if/else" statements to send only parts of your markup (e.g. php, classic ASP, etc) or just using Response.Write or echo or printf inline in your markup.

I like the Table object because I find that in recent versions of .NET the Table and its associated TableRow and TableCell objects tend to emit exactly the HTML code that you would expect them to based on the properties that you set. In cases where this doesn't work (which is anywhere a table is not appropriate) I tend to use the PlaceHolder control. The PlaceHolder allows me to either add controls inside of it that will render (labels and textboxes are good candidates here) or I can use a LiteralControl which will emit whatever is in its Text property; this allows me to have any HTML in there that I feel like having. I could Response.Write my own HTML too if I wanted to but that's often not very clean in the server-side code (IMHO).

I have never used a datagrid, repeater, or calendar control before due to the fact that they emit crap HTML, don't work very well outside of IE, and are very difficult to customize (try doing grouping in a datagrid; it's possible but extremely difficult).

I think I made a mistake in linking framework code generation specifically to Jeff's example, but my point was that frameworks which DO generate HTML for you tend to generate crap HTML, either dynamically at runtime (ASP.NET), at design time through code generation (Rails Scaffolding, ASP.NET), or through templates (Microsoft Dynamic Data, unless they've changed it since the last demo I saw). People use this generation as a crutch because they don't understand HTML, and instead of learning it (like they should) they rely on crap code generation and try to work with that, which is the road to failure and tag soup.

Anonymous said...

Can you provide a practical example here? I'd love to see sample code to demonstrate exactly what you mean. I'm currently writing a site (on top of my own MVC framework; mostly the site is an excuse to practice writing a framework from scratch) and I've got my CSS files, my JS files, my views, my models, my controllers... what do you mean precisely by having the server-side code write content into the HTML containers?

Anonymous said...

Jeff-you-bastard, email me.
-- J@JustinAngel.Net

Jeff Tucker said...

I actually had a great example from one of the last things I did at my old job. I'll come up with something contrived soon and post a blog (or maybe a series) where I go over it. I'm thinking about writing some crap tag soup and then refactoring it towards happiness, would that be useful or would you prefer a good example first that became more complex over time (or both)?

Terrence Brannon said...

Tag soup can only occur in pull-style templating. The solution is push-style templating -