One problem with large-scale automation is that legitimate failures start to linger. Whether that is a symptom of the organization not actually caring about the results is a different problem entirely, but one thing that can be solved at the script level is making sure that known failures don’t hide new issues.

Each runner is going to have a slightly different way to do this[1] and call it something different, but let’s just go with Py.Test’s terminology and implementation. Py.Test calls this XFail.

<pre lang="python">import pytest
import sys

class TestXFail(object):
  @pytest.mark.xfail
  def test_monkey(self):
    pytest.fail('okay')

  @pytest.mark.xfail(reason="squirrels are a fail")
  def test_squirrel(self):
    pass #except this one apparently
  
  @pytest.mark.xfail('sys.platform == "win32"')
  def test_gorilla(self):
    raise Exception
</pre>

A couple of things. First, if your team uses this sort of pattern I would highly suggest making it a convention to never use it blindly without something like the reason argument. If you have better-named test methods than I do here that might not be necessary, but it seems like a good idea regardless. Also, you can pass a condition to control whether the xfail gets applied or not. That feels like the wrong use for it, at least in this example; a skipif decorator seems like a better fit, as sketched below. But again, your mileage will vary.
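For comparison, here is roughly what that platform check looks like as a skip instead. This is just a sketch; the class name and reason text are made up, and it uses the boolean-condition form of skipif that newer pytest versions prefer (a string condition like the one above also works).

<pre lang="python">import pytest
import sys

class TestSkipInstead(object):
  # skipif never runs the test on the excluded platform, whereas a
  # conditional xfail still runs it and merely expects it to fail
  @pytest.mark.skipif(sys.platform == "win32",
                      reason="gorillas are not supported on windows")
  def test_gorilla(self):
    pass
</pre>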

In any event, here is the output when bar.py is run.

<pre lang="bash">Adam-Gouchers-MacBook:xfail adam$ py.test bar.py 
============================= test session starts ==============================
platform darwin -- Python 2.7.2 -- pytest-2.2.0
collected 3 items 

bar.py xxX

===================== 2 xfailed, 1 xpassed in 0.19 seconds =====================
Adam-Gouchers-MacBook:xfail adam$
</pre>

That’s it. No stack traces to clutter things up, just two x and an X. The X, though, is something to pay attention to. It is the indicator that something we thought should fail has passed. That is new information, which is the whole point of doing automation in the first place. Did something get fixed? Intentionally? Accidentally? Is our script no longer valid as a result of some other changes? That’s up to you to determine.
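If an unexpected pass is something you never want slipping by quietly, newer versions of pytest (not the 2.2 shown above) let you mark an xfail as strict, which reports an XPASS as a real failure. A minimal sketch, assuming a reasonably current pytest:

<pre lang="python">import pytest

class TestStrictXFail(object):
  # with strict=True an unexpected pass fails the run instead of
  # showing up as a quiet X in the output
  @pytest.mark.xfail(reason="monkeys are a known fail", strict=True)
  def test_monkey(self):
    pass
</pre>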

And then of course there is the pesky problem of how long things stay in an xfail-ed state. But that is an organizational and cultural problem.

[1] And if not, it is not too hard to parse the log yourself, gather that information and compare it to a known-failures list somewhere.
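Something along these lines would do it; the file names and the one-test-name-per-line format are made up for the sketch.

<pre lang="python"># compare the failures from a run against a known-failures list
def new_failures(results_path, known_path):
  with open(results_path) as f:
    failed = set(line.strip() for line in f if line.strip())
  with open(known_path) as f:
    known = set(line.strip() for line in f if line.strip())
  # anything failing that is not on the known list is new information
  return sorted(failed - known)

if __name__ == '__main__':
  for name in new_failures('failed_tests.txt', 'known_failures.txt'):
    print(name)
</pre>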