Analytics and Selenium
Someone asked in #selenium this morning what they can do to prevent their test runs from affecting the site’s analytics.
The simplest thing to do is filter out the IP(s) of your test machines from the statistics your tracker collects. We’ve done this with Google Analytics for our office gateway so none of our internal traffic can skew our stats. (That is just a good idea anyway.) The person countered that it was for a client, so that solution was not directly available. Hold on a second. If they are paying you to run scripts against their site, and the site has analytics turned on, then they should be willing to filter out your machines.
This is a prime example of trying to solve what is fundamentally a communication problem with a technical solution. Solve technical problems with technical solutions; solve communication problems with a phone.
Okay, rant done.
So let’s pretend that filtering was really not possible for whatever reason. How else could you solve it?
If you had access to the application code, you could add a configuration setting that determines at render time whether or not the analytics should be included. In Rails, it is particularly easy. We have this in a partial:
<pre lang="ruby">
<%# Only emit the tracking snippet in production; the exact form of this check is an assumption %>
<% if Rails.env.production? %>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("<%= omc_google_analytics_account %>");
pageTracker._setDomainName(".zerofootprint.net");
pageTracker._initData();
pageTracker._trackPageview();
</script>
<% end %>
</pre>
This way, if the server is running in any environment other than production, the analytics don’t get included.
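If you want to try the render-time switch outside of a Rails app, here is a minimal sketch using plain ERB from the Ruby standard library. The names `ANALYTICS_TEMPLATE`, `analytics_enabled?` and `render_analytics`, the `override` setting and the `UA-XXXXX-X` account are all made up for illustration; in a real Rails app the environment check would come from Rails itself.

```ruby
require 'erb'

# Hypothetical template: a cut-down version of the tracking snippet,
# wrapped in a guard that is evaluated at render time.
ANALYTICS_TEMPLATE = ERB.new(<<~HTML, trim_mode: '-')
  <%- if enabled -%>
  <script type="text/javascript">
  var pageTracker = _gat._getTracker("<%= account %>");
  pageTracker._trackPageview();
  </script>
  <%- end -%>
HTML

# Decide whether analytics should be rendered.
# An explicit override (true/false) wins; otherwise only production qualifies.
def analytics_enabled?(env, override = nil)
  return override unless override.nil?

  env == 'production'
end

# Render the snippet (or nothing) for the given environment name.
def render_analytics(env:, account:, override: nil)
  ANALYTICS_TEMPLATE.result_with_hash(
    enabled: analytics_enabled?(env, override),
    account: account
  )
end

puts render_analytics(env: 'production', account: 'UA-XXXXX-X')
puts render_analytics(env: 'test', account: 'UA-XXXXX-X').strip.empty?
```

The override is there for the case where your Selenium runs hit a production-like environment: flip one setting and the tracker disappears without touching the template.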
That solution is probably also not available to you if the communication is so bad that you can’t get things filtered. No worries, there is also a solution which is completely under your control and involves hacking DNS. Well, not exactly hacking, but certainly manipulation.
What you can do is modify your hosts file to map google-analytics.com to 127.0.0.1. The snippet actually requests ga.js from www.google-analytics.com or ssl.google-analytics.com, and hosts files do not support wildcards, so list those hostnames as well. This prevents your Selenium-spawned browsers from fetching the script, which means Google never even knows about you. (I used to do a similar trick with ad servers.) One thing you have to be careful of is that the request for ga.js will not get a 200 response unless a web server on the test machine serves a ga.js of its own. And if you do serve one, you are also going to want to make sure it ‘executes’ without error, but that should be pretty easy as well.
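Here is a rough sketch of the hosts trick in Ruby. It builds the loopback entries for the hostnames the snippet requests, and includes a stand-in ga.js that defines just enough of `_gat` for the page's tracking calls to run without error. `ANALYTICS_HOSTS`, `with_analytics_blocked` and `STUB_GA_JS` are names invented for this example, and the stub only covers the methods used in the partial above; a page calling other tracker methods would need more no-ops.

```ruby
# Hostnames to redirect: the bare domain plus the two hosts the snippet loads ga.js from.
ANALYTICS_HOSTS = %w[
  google-analytics.com
  www.google-analytics.com
  ssl.google-analytics.com
].freeze

# Return hosts-file content with a 127.0.0.1 entry appended for every
# analytics hostname that is not already mapped. Running it twice is a no-op.
def with_analytics_blocked(hosts_content, hostnames = ANALYTICS_HOSTS)
  # Collect every hostname already present (fields after the IP, comments ignored).
  mapped = hosts_content.each_line.flat_map do |line|
    line.sub(/#.*/, '').split.drop(1)
  end

  missing = hostnames - mapped
  return hosts_content if missing.empty?

  needs_newline = !(hosts_content.empty? || hosts_content.end_with?("\n"))
  base = needs_newline ? hosts_content + "\n" : hosts_content
  base + missing.map { |host| "127.0.0.1 #{host}\n" }.join
end

# Stand-in ga.js: every tracker method used by the page becomes a no-op.
STUB_GA_JS = <<~JS
  var _gat = {
    _getTracker: function (account) {
      var noop = function () {};
      return { _setDomainName: noop, _initData: noop, _trackPageview: noop };
    }
  };
JS

puts with_analytics_blocked("127.0.0.1 localhost\n")
```

To apply it for real you would read your hosts file (`/etc/hosts` on Linux and macOS), pass the contents through `with_analytics_blocked`, and write the result back with administrator rights. The stub would be saved as `ga.js` in the document root of whatever web server is listening on the test machine.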
Analytics are certainly a valuable part of a page’s payload. Heck, parts of your company would likely argue that they are the most important. But they also have to have accurate information, and getting a traffic spike because you are running a test is not something you want. The above tricks will solve it for you.