Application Phone Home
Too often when an application is released into production, the stream of information about it comes to an abrupt halt. Or sometimes it starts to hide in the shadows.
When I was at Points, we had the system configured so that applications would email exceptions to product-specific mailboxes. I then had to go through these mailboxes and compile the list of exceptions and the number of duplicates, which took about half a day at the beginning of each month.
In what seemed a throwaway comment in an email thread I was on today, someone mentioned that their application automatically logs any exceptions that happen in production as issues in their bug tracking system. (And apparently, since it is FogBugz, it does magic regex checks against existing stack traces and increments an occurrence counter instead of logging a duplicate.) This is a seriously cool, if not new, trick, and I think it is the next step in having applications report on their ongoing health.
The key evolutionary point is that the totals are derived from the bug system automatically. In the email setup, you had to enter the exceptions in the bug system anyway once you decided something was going to be addressed. You also replace a separate source of information with a more consolidated view, which is always a good thing.
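To make the idea concrete, here is a rough sketch of the exception-to-issue flow in Python. Everything in it is hypothetical: `InMemoryTracker` stands in for a real bug tracker (a real one would be reached through its own API, which is not shown here), and `fingerprint` is just one plausible way to spot duplicates. I have no idea how FogBugz actually does its matching.

```python
import hashlib
import traceback


def fingerprint(exc):
    """Build a short key from the exception type and the call chain.

    Assumption: line numbers are left out on purpose, so a small code
    shift between releases does not look like a brand new bug.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__] + [frame.name for frame in frames]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:12]


class InMemoryTracker:
    """Hypothetical stand-in for a bug tracking system."""

    def __init__(self):
        self.issues = {}

    def report(self, exc):
        key = fingerprint(exc)
        issue = self.issues.get(key)
        if issue is None:
            # First sighting: open a new issue with the full stack trace.
            issue = {
                "title": f"{type(exc).__name__}: {exc}",
                "trace": "".join(
                    traceback.format_exception(type(exc), exc, exc.__traceback__)
                ),
                "occurrences": 0,
            }
            self.issues[key] = issue
        # Duplicate or not, bump the counter instead of filing again.
        issue["occurrences"] += 1
        return key


def risky(n):
    return 1 / n


tracker = InMemoryTracker()
for _ in range(3):
    try:
        risky(0)
    except ZeroDivisionError as exc:
        tracker.report(exc)
```

After the loop, the tracker holds a single issue with an occurrence count of three, which is exactly the monthly total I used to compile by hand from the mailboxes.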
I’m starting to think that this type of phone home functionality should be part of any ‘supportability’ review. It is easy to implement on the application side if you run your own servers (as in a SaaS or pure web application), but things might get a bit hairy if your customers host the server and application and are paranoid about information leakage. The bug system also needs to be able to support this. Windows XP and newer have a similar feature for when an application crashes.
As mentioned above, apparently FogBugz has it, and I would be surprised if Bugzilla or Jira didn’t have this functionality somewhere as well.
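For the customer-hosted case, one way to ease the information leakage worry might be to scrub a report before it leaves the customer's server. The sketch below is only an illustration: the `scrub` function and its two patterns (email addresses and IPv4 addresses) are my own assumptions, and a real deployment would need a list of sensitive fields agreed with the customer.

```python
import re

# Assumed patterns for illustration only; far from a complete list.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrub(report_text):
    """Replace email and IPv4 addresses with placeholders."""
    text = EMAIL.sub("<email>", report_text)
    return IPV4.sub("<ip>", text)


cleaned = scrub("Login failed for bob@example.com from 10.0.0.7")
```

Here `cleaned` keeps the shape of the message, so duplicate matching in the bug system still has something to work with, while the customer-specific details stay behind.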
But of course, if there is no commitment from management to address exceptions that are happening in production, then all of the setup necessary to implement this is wasted.