Python, Postgres and some Monkeypatching (just for kicks)

Greg Wilson has ranted a number of times about how hard package management is both from a teaching perspective and a get-the-blasted-thing-wired-properly-on-a-given-machine one. I had more or less written this rant off as it had never happened to me — until now.

Enter Postgres.

Part of doing effective automation is verifying what you see in the browser is actually what is in the database. Caching, [accidental] data transformations and unintentional mocking of important services have taught me this the hard way, so trust me on this one. Being able to query the database is also exceptionally useful for test data selection and/or disqualification. Of course, in order to do this you need some sort of driver to access your database.

Using databases in Python should be easy, and generally is since the interface is standardized. Installing the driver to use SQLite is easy; it comes with Python. Postgres is a different beast entirely. First, there are 6 or 7 drivers available, all with different levels of freshness and features. And if you choose the ones that are under current development you enter a whole new world of pain — C. Don’t have a compiler on your machine? Well, go fetch one. Don’t have Postgres installed in a standard location? Well, go hack the Makefile a bit. And then cross your fingers.

And swear.

A lot.

Thankfully there appears to be a way out of this mess, and that is to use a pure Python driver in the form of py-bpgsql. Which, of course, has not had a commit since June 8, 2009 and very little in the way of documentation aside from reading the code. (Oh, and is LGPL; blech.). That said, because it is pure python, it installs on a machine without the full suite of development tools.

<pre lang="bash">sudo easy_install bpgsql

Once you have this on your system, it is time to integrate it into you automation. I wrap things that access external services or generate data somehow in what I call Providers. So in this case what I want to do is create a PostgresProvider class and stuff it somewhere. (In Py.Saunter they live in modules/providers.)

<pre lang="python">import bpgsql
import saunter.ConfigWrapper
cf = saunter.ConfigWrapper.ConfigWrapper().config

class PostgresProvider(object):
    def __init__(self, host = None, port = None, username = None, password = None, dbname = None):
        if not host:
            host = cf.get("Postgres", "host")
        if not port:
            port = cf.getint("Postgres", "port")
        if not username:
            username = cf.get("Postgres", "username")
        if not password:
            password = cf.get("Postgres", "password")
        if not dbname:
            dbname = cf.get("Postgres", "db")

        self.connection = bpgsql.connect(host = host, port = port, \
                                         username = username, password = password, \
                                         dbname = dbname)
if __name__ == '__main__':
    pp1 = PostgresProvider(host = "db.company.com", port = 5444, \
                          username = "username", password = "password", \
                          dbname = "database")
    print(pp1.get_payroll_record_type(529611))

    pp2 = PostgresProvider()
    print(pp2.get_payroll_record_type(529611))

bpgsql.connect can also take a DSN string with all the details for accessing the server, but my preference is to explicitly pass in things. This snippet also reads from a config file some values rather than hard code them into the class. This is a really good idea so you don’t have to change the code for every environment you run against, only some value in a config file.

With this in place you can start to fill in the methods that you are going to use to access the database such as get_payroll_record_type which the astute reader will notice I used for my testing.

<pre lang="python">def get_order_details(self, order_number):
    cur = self.connection.cursor()
    cur.execute("select * from job_order where order_id = %s" % (order_number,))
    record = cur.fetchall()
    cur.close()
    return record[0]
    
def get_payroll_record_type(self, order_number):
    details = self.get_order_details(order_number)
    if details[0] == 0:
        return "detailed"
    return "regular"

And now your laughing all the way to the bank. Except you’re not. Prepare for another round of swearing. (Or at least I had to.)

Either the author of bpgsql did not test the module against a variety of servers, the method for grabbing time related fields has changed in the last three years the client I am working with right now has a ‘custom’ way of handling time in the database. Or all three. (Or more that I haven’t thought of yet.). But for whatever reason, I needed to monkeypatch around three problems — you might have to add to the list.

Monkeypatching in Python is frowned in Python to the same degree as it is revered in Ruby, but it is still pretty easy. The basic steps are

import something
write what you want to replace it with
stomp on the imported version by assigning it overtop

But unlike Ruby, the stompage is local to the module so if you need the patch elsewhere you should move it to a separate file/module and access it via that wrapper. I would however consider having two ways into the database from your scripts actually a smell so I’m just putting my mokeypatches above the actual PostgresProvider class. Here are each of them.

The first one relates to how the time information is returned from the database. Ends up that it was expecting 28-12-09 00:00:00 but got 28-DEC-09 00:00:00. DEC of course cannot be converted to an int and was causing an exception.

<pre lang="python">months = {"JAN": 1,
          "FEB": 2,
          "MAR": 3,
          "APR": 4,
          "MAY": 5,
          "JUN": 6,
          "JUL": 7,
          "AUG": 8,
          "SEP": 9,
          "OCT": 10,
          "NOV": 11,
          "DEC": 12}

def _date_to_python(s):
    y, m, d = s.split('-')
    if m in months:
        m = months[m]
    return datetime.date(int(y), int(m), int(d))

bpgsql._date_to_python = _date_to_python

With that in place I got a new error message (progress!). This time saying something along the lines of there being too many items to unpack. Turned out that timestamp fields were including the time offset information in its value as third ‘chunk’ causing

<pre lang="python">(datepart, timepart) = s.split(" ")

to fail. So I switched it with a slightly more forgiving slicing method.

<pre lang="python">def _timestamp_to_python(s):
    datepart = s[:s.index(' ')].strip()
    timepart = s[s.index(' ') + 1:]
    d = bpgsql._date_to_python(datepart)
    t = bpgsql._time_to_python(timepart)
    return datetime.datetime(d.year, d.month, d.day,
        t.hour, t.minute, t.second,
        t.microsecond, t.tzinfo)

bpgsql._timestamp_to_python = _timestamp_to_python

Only to encounter yet another problem. It looks like the author of bpgsql took into account the offset, but only ones that are + something. If you are – (as I am) the the bit where the library trying to subtract a string from 0 fails. Correctly.

<pre lang="python">class _SimpleTzInfo(datetime.tzinfo):
    """
    Concrete subclass of datetime.tzinfo that can represent
    the hour and minute offsets PgSQL supplies in the
    '... with time zone' types.

    """
    def __init__(self, tz):
        super(_SimpleTzInfo, self).__init__()
        if ':' in tz:
            hour, minute = tz.split(':')
        else:
            hour = tz
            minute = 0
        hour = int(hour)
        minute = int(minute)
        if hour 
<p>All that was necessary in this one was the explicit casting to an integer.</p>
<p>Finally, with these three monkeypatches in place, I am able to do things like 'double click a row in the browser that represents a regular timecard, from California, in an overtime exempt job category that is below the legislated amount'. Which I couldn't do without access to the database.</p>