ATT2009 – Thanou Thirakul – Large Scale Testing in Agile Time

Thanou Thirakul gave a nice experience report on a project he had worked on for a while, one that was having trouble continuing to use parts of the Agile toolkit. Specifically, its automated build and test infrastructure was failing the team.

That was the illness. The symptoms and, more importantly, how the team addressed them were as follows:

Team lost faith in the build – Having lost faith in the build system, the team would just ignore the results and, even more damaging, would sometimes skirt around it entirely. To address this they formed a dedicated team to restore faith in the build by tackling the other problems. One thing they did intentionally was to add people from outside the project team to help figure out the solutions. This brought fresh eyes and ideas to a problem the regulars might have had tunnel vision about.
No one wants to touch the build script – This being a six-year-old Java project, the build was a series of cobbled-together Ant scripts that had grown organically. To fix this they Maven-ized the build and got it running inside Hudson. In my experience, this is exactly what you want to be doing, though Ant + Ivy would likely work as well. Sometimes starting over really is the correct idea.
Too many false build failures – An analysis of their build failures showed they were largely environment related. To solve this they rebuilt their test machines as virtual machine snapshots. This meant they could quickly roll a machine back to a known clean state, removing the problem of environment degradation.
Integration tests taking too long – Since their environment was now virtual, they solved part of the length problem by just spawning more VMs to test with. That really just moves the problem around, though: now you need a runner that will distribute your tests across those VMs. They solved that by building their own test distribution server. It was slick!
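
To make the distribution problem concrete, here is a minimal sketch in Python of one way a runner could split tests across VMs. This is not Thirakul's server, whose internals I did not see. It assumes you have recorded how long each test took in a past run, and it uses a simple greedy rule: take the longest test first and hand it to whichever worker currently has the least work. The function name `distribute_tests` and the test names are made up for illustration.

```python
import heapq


def distribute_tests(durations, worker_count):
    """Split tests across workers using past durations (in seconds).

    Greedy "longest first" rule: sort tests from slowest to fastest and
    give each one to the worker with the smallest total so far.
    Returns (assignments, loads), both indexed by worker number.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")

    # Min-heap of (total_seconds_assigned, worker_index).
    heap = [(0.0, index) for index in range(worker_count)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(worker_count)]

    # Slowest tests first; ties are broken by name so the result is stable.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        assignments[index].append(name)
        heapq.heappush(heap, (load + seconds, index))

    loads = [0.0] * worker_count
    for load, index in heap:
        loads[index] = load
    return assignments, loads
```

With five hypothetical tests taking 10, 8, 5, 3 and 2 seconds and two workers, this gives one worker 15 seconds of work and the other 13, instead of 28 seconds on a single machine. The greedy rule is not guaranteed to find the best possible split, but it is cheap and usually close.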

Here are some other notes I took:

Don’t contort your tools; use the right tool
Use the length of past test runs to estimate how long your next one will take.
When using VMs, for maximum performance assign each its own disk, as I/O rapidly becomes the bottleneck.
More often than not, the first (correct) solution is not to build yet another tool
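
The note about using past run lengths can be as simple as the sketch below. It is my own assumption of a reasonable approach, not something shown in the talk: take the median of the last few runs, so that one freak slow run does not skew the estimate. The name `estimate_next_run` and the default window of five runs are arbitrary choices for illustration.

```python
from statistics import median


def estimate_next_run(past_seconds, window=5):
    """Estimate the next run's length from the most recent past runs.

    Uses the median of the last `window` runs so that a single outlier
    (say, a run on a degraded machine) does not dominate the estimate.
    """
    if not past_seconds:
        raise ValueError("need at least one past run")
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = past_seconds[-window:]
    return median(recent)
```

For example, past runs of 600, 620, 1800, 610 and 605 seconds give an estimate of 610 seconds, where a plain average would have been dragged up to 847 by the one 1800-second run.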

My major complaint about this talk was his comment, made a couple of times, that ‘the best part about writing something cool like the test server is that you get to release it as open source’. When I asked where I could grab it, I got something that sounded like a sales pitch. Even after talking to him afterward, I’m not sure whether I would have to part with money to do anything with it. Just release it! The community will find it and try it out in ways you can’t begin to think about.

Aside from that, it was a well done talk.