I’ve used WordPress for this blog for over 3 years and have recommended it to countless people. Its interface is quite usable, and its third-party infrastructure is such that I’ve never really had to extend it beyond tweaking a template here and there. In short, I was quite happy with it.

But in the last month, that charm has faded. A lot. What happened? I had to make it jump through a hoop. In theory it should have been a nice, pliable hula-hoop, but instead it turned out to be an electric hoop, with spikes, and on fire.

WordPress can import posts from a number of different blogging platforms, one of them being another WordPress installation. I needed to take content from an internally developed CMS-ish platform, store it in the WordPress format, and then import it into a real WordPress installation. This should have been as easy as:

  1. Figure out how to extract information from the current system
  2. Look up the WordPress format in the Codex (the WordPress documentation)
  3. Import

What it ended up being was a lot of red herrings and gnashing of teeth.

And a reinforcement of some ideas I’ve been carrying around…

  • Document, document, document – Documenting how to use code is a lot less fun than creating the code in the first place. The WordPress eXtended RSS (WXR) format appears to be a giant black hole of information. What little there is exists solely in the blogs of people who have had to do an exercise similar to the one I have just done. I don’t care if it is generated by WordPress and so ‘no one will need to write it by hand’. If it is publicly accessible, document it. Accurately.
  • XML describes your data – The big idea to wrap your head around when dealing with XML is that it describes your data, whereas HTML formats your data. What this means is that your XML can be presented as one giant blob or nicely structured across many lines, and the application will consider it the same thing. This magic is possible because you parse an XML file; you do not regex each line as an individual line. But that is exactly what WordPress does. This means that you have to have, for instance, each category element on a single line, otherwise it doesn’t get imported. It also means that the actual order of things inside the file matters. PHP has an XML parser in the base language; use it. Heck, once you have XML being treated as XML, you can do things like validate against a schema, which goes a long way toward helping document the format.
  • Consistency Counts – The consistency heuristic is one of the more powerful ones we use to find bugs. It is also one of the more important ones when you are looking at extending your system. There are 14 different systems WordPress can import from, and there seem to be at least 3 or 4 different ways of doing it. I understand that these were user-contributed, but a number of communities dictate what patches need to look like before acceptance. The creation, announcement and promotion of consistency hoops is not only a good idea, it is a fundamental one.
  • Provide actionable error messages – Thank you for telling me that there was an error due to a configuration setting, but unless you tell me which setting and where I can find it, you might as well be failing silently. I lost a couple of hours tracking down 5 or 6 settings and then figuring out where they were being overwritten so that my changes did not take.
  • Do not munge – Unless I explicitly say ‘mess around with my data’, do not change it. Turning string data to a consistent, known case (upper or lower) is a strategy I teach in my scripting classes to find values without having to worry about case. But when you use the data you found, use the original data. WordPress, however, takes your remote attachments (such as images) and lowercases their names on import, regardless of how they are specified in the import file. This is innocuous enough on Windows and Mac, but on enterprise Unixes where case matters, you get a tonne of 404s.
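
To make the XML and munging points above a little more concrete, here is a minimal sketch. It is written in Python rather than PHP, and it is not WordPress’s importer code. The sample markup, the file names and all three function names are made up for illustration. It contrasts parsing a document as XML with regexing it one line at a time, and shows a lookup that compares in lowercase but hands back the original name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same item, once as a blob and once spread out.
COMPACT = "<item><title>Hello</title><category>News</category><category>Testing</category></item>"
PRETTY = """
<item>
  <title>Hello</title>
  <category>
    News
  </category>
  <category>Testing</category>
</item>
"""


def categories_by_parsing(xml_text):
    """Parse the text as XML; layout and line breaks make no difference."""
    root = ET.fromstring(xml_text.strip())
    return [(el.text or "").strip() for el in root.iter("category")]


def categories_by_line_regex(xml_text):
    """Regex each line on its own; elements split across lines are missed."""
    pattern = re.compile(r"<category>(.*?)</category>")
    found = []
    for line in xml_text.splitlines():
        found.extend(pattern.findall(line))
    return found


def find_attachment(names, wanted):
    """Compare in lowercase, but return the name exactly as it was given."""
    lookup = {name.lower(): name for name in names}
    return lookup.get(wanted.lower())


if __name__ == "__main__":
    print(categories_by_parsing(PRETTY))     # both categories
    print(categories_by_line_regex(PRETTY))  # only the single-line one
    print(find_attachment(["Photos/Team.JPG", "logo.png"], "photos/team.jpg"))
```

The parser returns both categories for either layout, while the line-by-line regex silently drops the category that spans several lines. The lookup matches regardless of case but never changes the stored name, so a case-sensitive file system still gets the path it expects.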

Now, in the took-far-too-long run, I was able to succeed by reading the import and export code of WordPress, which illustrates the utility of not only having access to the code but also knowing how to read it. But because I had to dig into the guts of WordPress, its sheen has taken on quite a tarnish.