There are no exceptions to Bray’s Law
I’m totally with Tim Bray on this. If you can’t be bothered to make a syndication feed that’s well-formed XML, then you are an incompetent fool. From now on this shall be known as ‘Bray’s Law’. There are no exceptions to Bray’s Law.
Mark Pilgrim thinks otherwise. According to him (and others), there are no exceptions to Postel’s Law.
Just as an example of what can go wrong when you try to parse XML that’s not well-formed, let’s have a look at Mark Pilgrim’s ultra-liberal feed parser. Up until version 2.5.3, this parser couldn’t make sense of the RSS image and textInput elements.
For example, let’s take the following valid RSS feed (for clarity, I’ve removed some elements from the listing below):
<rss>
<channel>
<description>Tweakers.net is Nederlands grootste informatie en community site voor tweakers.</description>
<link>http://www.tweakers.net/</link>
<title>Tweakers.net</title>
<item>
<title>Zonnet verhoogt snelheid Breedband Plus-abonnement</title>
<link>http://www.tweakers.net/nieuws/26619</link>
<description>Zonnet verhoogt snelheid Breedband Plus-abonnement - 22-04-2003 00:51 door Kevin Levie</description>
</item>
<textInput>
<title>Zoeken</title>
<description>Enter your search terms</description>
<name>Query</name>
<link>http://www.tweakers.net/search?DB=Nieuws</link>
</textInput>
</channel>
</rss>
In its ultra-liberalness, the parser takes the last title, description and link elements that are not inside an item element as the feed’s title, description and link. In this case it returned ‘Zoeken’ as the title, ‘Enter your search terms’ as the description and ‘http://www.tweakers.net/search?DB=Nieuws’ as the link. This is clearly wrong. With proper XML parsing and processing tools (like DOM and XPath) this bug would probably never have been there.
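To make that concrete, here is a minimal sketch of how a tree-based approach sidesteps the problem. It uses Python’s standard xml.etree.ElementTree module, which is my choice for illustration and not what the ultra-liberal parser uses; the names FEED and channel_info are hypothetical. Because the lookups only match direct children of channel, the title inside textInput can never be mistaken for the feed’s title.

```python
import xml.etree.ElementTree as ET

# The feed from the listing above, reproduced as a string (illustrative constant).
FEED = """<rss>
<channel>
<description>Tweakers.net is Nederlands grootste informatie en community site voor tweakers.</description>
<link>http://www.tweakers.net/</link>
<title>Tweakers.net</title>
<item>
<title>Zonnet verhoogt snelheid Breedband Plus-abonnement</title>
<link>http://www.tweakers.net/nieuws/26619</link>
<description>Zonnet verhoogt snelheid Breedband Plus-abonnement - 22-04-2003 00:51 door Kevin Levie</description>
</item>
<textInput>
<title>Zoeken</title>
<description>Enter your search terms</description>
<name>Query</name>
<link>http://www.tweakers.net/search?DB=Nieuws</link>
</textInput>
</channel>
</rss>"""


def channel_info(feed_xml):
    """Return the feed-level title, description and link (hypothetical helper)."""
    root = ET.fromstring(feed_xml)   # raises ParseError if not well-formed
    channel = root.find("channel")   # direct child of <rss>
    # A bare tag name matches direct children only, so the <title> elements
    # nested inside <item> and <textInput> are never considered here.
    return {
        "title": channel.findtext("title"),
        "description": channel.findtext("description"),
        "link": channel.findtext("link"),
        # The search box title is reachable through its own explicit path.
        "search_title": channel.findtext("textInput/title"),
    }


info = channel_info(FEED)
print(info["title"], info["link"])
```

The point is not this particular library. Any tool that respects the document’s tree structure lets you say ‘the title that is a child of channel’ instead of ‘the last title I happened to see’.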
Another problem with the ultra-liberal parser is that it just returns the bytes it finds inside each element, without any regard for the character encoding of the source document. Up until version 2.5.3 you couldn’t even get the character encoding of the source document, so there was no way to know if the values the parser returned were encoded as plain ASCII, ISO-8859-1 or UTF-8.
With well-formed XML, you don’t ever have to think about the character encoding of the source document. The XML parser handles most of the gory details for you.
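As a small sketch of what that means in practice, again assuming Python’s standard xml.etree.ElementTree (the sample title and the helper names feed_bytes and channel_title are made up for this example): the same feed is serialized once as ISO-8859-1 and once as UTF-8, each with a matching XML declaration. The bytes on the wire differ, but the parser reads the declaration and hands back the same decoded string either way.

```python
import xml.etree.ElementTree as ET

# Illustrative title containing a non-ASCII character.
TITLE = "Café nieuws"


def feed_bytes(encoding):
    """Serialize a tiny feed in the given encoding, declared in the XML prolog."""
    doc = (
        '<?xml version="1.0" encoding="%s"?>'
        "<rss><channel><title>%s</title></channel></rss>" % (encoding, TITLE)
    )
    return doc.encode(encoding)


def channel_title(raw):
    """Parse raw bytes; the parser decodes them using the declared encoding."""
    return ET.fromstring(raw).findtext("channel/title")


latin1_raw = feed_bytes("iso-8859-1")
utf8_raw = feed_bytes("utf-8")

# Different byte sequences on the wire, identical text after parsing.
print(latin1_raw != utf8_raw)
print(channel_title(latin1_raw) == channel_title(utf8_raw))
```

A parser that hands back raw bytes leaves that decoding step, and the guessing that comes with it, to every single caller.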
The ultra-liberal feed parser is an impressive piece of software that handles the problem of parsing invalid feeds really well. Now keep in mind that the first ‘S’ in RSS stands for ‘Simple’ and have a look at the source code.
XML has been created to make it easier to exchange data. This only works if you at least stick to the few simple rules for creating well-formed XML. If you can’t be bothered to do that, you are a fool. We have better things to do with our time than to write software that rewards fools.