HARPy
One of my current clients is seeing a bit of wackiness when running scripts against their automation environment; when done manually things load at an acceptable speed, but slooooooow down when run through automation. It’s our suspicion that something is being loaded slowly when the WebDriver stack is involved. But how to debug?
Enter the HAR (HTTP Archive) file which gives you the timing information for every request that makes up a page load.
Step 1 – Generate the file
The easiest way to generate a HAR file with WebDriver is to run your session through the BrowserMob Proxy. This is a Python post so we’ll use David’s browsermob-proxy module.
<pre lang="bash">easy_install browsermobproxy
Once the egg is installed you can either start the server manually or from within your script. (For the record, I prefer doing it outside of the script and controlling it with something like Puppet.)
<pre lang="python">from browsermobproxy import Server
server = Server("path/to/browsermob-proxy")
server.start()
# do stuff
server.stop()
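If you do start the server from within the script, it can also hand you a client directly rather than you constructing one by hand. A minimal sketch, assuming your version of the module has the Server.create_proxy() helper:
<pre lang="python">from browsermobproxy import Server

server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()  # a Client bound to a freshly allocated proxy port
print(proxy.proxy)  # the host:port to point the browser at
server.stop()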
With the server running you have two ways of creating the WebDriver instance.
Local WebDriver
<pre lang="python">from browsermobproxy import Client
from selenium import webdriver
proxy = Client("http://url.to.proxy:port")
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
Remote WebDriver
<pre lang="python">from browsermobproxy import Client
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium import webdriver
desired_capabilities = DesiredCapabilities.FIREFOX
client = Client("http://url.to.proxy:port")
client.add_to_webdriver_capabilities(desired_capabilities)
driver = webdriver.WebDriver(desired_capabilities = desired_capabilities
Now you’re routing all requests through the proxy regardless of the style of WebDriver you are using.
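One housekeeping note before going further: when a session wraps up, tear things down explicitly so the proxy port gets released. A minimal sketch (close() is the browsermobproxy client’s cleanup call):
<pre lang="python"># tear down in the reverse order things were created
driver.quit()
client.close()  # releases this client's port on the BrowserMob server
server.stop()   # only if your script started the server itself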
<pre lang="python">client.new_har("Example")
The new_har method takes an optional name for the page you are about to trap information for. If you don’t give it one it will just be called ‘Page 1’. Then you do stuff, and get the HAR back from the proxy.
<pre lang="python">h = client.har
Step 2 – Interrogate the file
Yes, at this point you could look at the HAR Specification and call things out specifically, but it’s easier to use HARPy, a library I wrote for HAR files in Python.
<pre lang="bash">easy_install harpy
<pre lang="python">from harpy.har import Har
har = Har(h)
This har object follows the spec pretty closely but lets you do the interrogations a little more cleanly than you otherwise could. For instance, checking for 404s
<pre lang="python">four_oh_fours = [e for e in har.entries if e.response.status == 404]
or for pages that took longer than 3 seconds to load
<pre lang="python">unacceptable_duration = [p for p in har.pages if p.timings[1] > 3000]
Step 3 – Py.Saunter (optional)
Unsurprisingly, Py.Saunter has grown new support for both the BrowserMob Proxy and for HAR files. If you are using Py.Saunter as your runner, this is how to hook everything up as of 0.43.
saunter.ini
<pre lang="text">[Proxy]
proxy_url: http://url.to.proxy:port
browsermob: true
The proxy client is available in the script as self.client and is created for you automatically. Here is a full script which ties together creating the HAR file and making a pass/fail decision on the outcome.
<pre lang="python">@pytest.marks('shallow', 'ebay', 'har')
def test_har_retrieval(self):
self.client.blacklist("http://www\\.facebook\\.com/.*", 404)
self.client.blacklist("http://static\\.ak\\.fbcdn\\.com/.*", 404)
self.client.new_har("shirts")
s = ShirtPage(self.driver)
s.go_to_mens_dress_shirts()
h = self.client.har
har = Har(h)
four_oh_fours = [e for e in har.entries if e.response.status == 404]
assert(len(four_oh_fours) == 1)
It also shows how to strip out non-relevant 3rd party crap using the Python client. (Though normally you would likely want a 200 to be injected as the HTTP response rather than a 404.)
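That tweak is just a different status code in the same blacklist calls; for example:
<pre lang="python"># inject a 200 so blocked 3rd party requests don't register as failures
self.client.blacklist("http://www\\.facebook\\.com/.*", 200)
self.client.blacklist("http://static\\.ak\\.fbcdn\\.com/.*", 200)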
This sort of integration is at least two years old, but it is only now starting to reach mainstream blog posts, mailing lists, etc. I suspect we’ll be seeing more and more of it until it becomes commonplace in another 18 months or so.