2008-06-25

XML: No, it isn't.

XML, well, just isn't. It's a raging misnomer. XML is, in theory, the eXtensible Markup Language. I have a couple of problems with that idea.

First, it's not extensible. It just isn't. You can't extend it. I can't extend it. No one can extend it. You know how I know? Because there aren't any extensions. Not a single one. Go ahead. Go find an extension to XML. I'd love to hear about it.

But it's just as well - there's no reason to extend it. XML defines very little; it's a syntax definition, nothing more. DTDs and Schemas are what make XML useful. They aren't extensions to XML, they're applications of XML. What's more, the DTDs and Schemas can be combined in a single document, but even they can't be extended.

Second, while it can be used as a markup language, it almost never is. XHTML is a markup language based on XML. There are a few others that are (debatably) markup languages, like DocBook, but even the likes of DocBook are more on the side of data structure definition than markup. A database file isn't markup. A Java properties file isn't markup. They're data structures. Per Wikipedia:
"A markup language is an artificial language using a set of annotations to text that describe how text is to be structured, laid out, or formatted."
Does that sound like most of the XML formats you've encountered? How many XML config files have you had to deal with? Do they fit that description, even a little bit? Of course not. You don't care about the structure, layout, or formatting of the text in a config file. All you care about is getting at the particular block of text you want. So, what is XML then? Something of a generic hierarchical data file format - though I suppose GHDFF just isn't as catchy as XML.
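
To make that concrete, here's a rough sketch in Python. The config contents and the `lookup` helper are invented for illustration; they don't come from any real application. It pulls one value out of a typical XML config file, then pulls the same value out of a plain nested dictionary. Nothing in the XML annotates text. The tags are just keys in a tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical config, made up for this example.
CONFIG_XML = """<config>
  <database>
    <host>db.example.com</host>
    <port>5432</port>
  </database>
</config>"""


def lookup(xml_text, path):
    """Return the text at a slash-separated element path, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(path)  # path is relative to the root element
    return node.text if node is not None else None


# The same data as a nested dictionary: the same hierarchy, no markup.
CONFIG_DICT = {"database": {"host": "db.example.com", "port": "5432"}}

host_from_xml = lookup(CONFIG_XML, "database/host")
host_from_dict = CONFIG_DICT["database"]["host"]
```

Both lookups get at the same block of text. The only question the program ever asks of the file is "what's the value at this path?", which is a data structure question, not a markup question.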

Now, besides being egregiously misnamed, it's also a wretched tool for nearly every purpose to which it is applied. It's a language that aims for the middle ground between human-readable and machine-readable, and while it achieves both, it does so very poorly. XML is annoying to read, tedious to write, and resource-intensive to process.

I'm not suggesting dumping XML entirely, not at all. The angle-bracket tax is a fee worth paying for actual markup - you need syntax to separate the markup from the text. XML is a flexible and effective format for marking up text. What it isn't is an effective format for storing arbitrary data. It's usable, but nowhere near optimal. What's the solution? Something else.

Programmers have a tendency to cling to standards, to try to apply them as much as possible. "Don't reinvent the wheel," we say. And that's a perfectly reasonable mantra - but that doesn't mean all wheels are created equal. When's the last time you saw a bicycle wheel on a car? Would the world be a better place if every wheel were the same? Sure, they'd be interchangeable - but they wouldn't be anywhere near as effective.

We need to step back sometimes, and think about whether there is, or could be, a better wheel for any given situation.

More on this to come.