Making the rounds on Twitter right now is Sikuli. I became aware of it through someone claiming it to be a Selenium killer. Well, not quite.

The ‘magic’ of Sikuli is that it uses the actual content on the screen to find elements rather than a native API or XPath: it takes screen captures and looks for matching image regions to interact with.
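
To make that concrete, here is a minimal sketch of what a Sikuli script looks like. The .png names are hypothetical; they stand in for captures you would have taken from the live screen beforehand.

    # Sikuli scripts are Jython; click() and type() are Sikuli built-ins.
    # Each .png is a prior screen capture that Sikuli fuzzy-matches
    # against the current display to decide where to send the mouse.
    click("search_box.png")        # locate the bitmap on screen, click its center
    type("sikuli gui automation")  # keystrokes go to whatever has focus
    click("search_button.png")     # another pixel-level match, another click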

Yes, that’s correct: it uses the way an object is rendered on the screen to determine whether it gets interacted with. Didn’t the industry decide this was a bad idea a number of years ago? Here is why this project fails the sniff test for me.

  • What about when I create a script at one resolution and play it back at another?
  • What about different record and playback OSes?
  • Accessibility enhancements on/off?
  • What about image reuse? Without it, every place in a script that references a capture has to be updated when the screen changes. And things always change.
  • How do you data-drive something that requires you to have image captures of the run? Data-driving is rather important for minimizing code duplication while scaling up test data (see the sketch after this list).
  • And since you have to have visited each screen in a certain manner whilst writing the script, you start to run afoul of both the Minefield Problem and the Pesticide Paradox.
  • And if you can’t data-drive a script, is there any chance that scripts would be written in a model-based manner? Highly unlikely.
  • Can you record actions? I would wager that Selenium’s success can largely be attributed to the Record functionality of Se-IDE. A quick spin through the Sikuli site implies that manual script creation is the only option. People who have done the market research have told me that most QTP users stay in the keyword-driven, non-script part of the environment.
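
To illustrate the data-driving point, here is a rough sketch using Selenium’s Python WebDriver bindings; the URL and element IDs are invented for illustration. One set of locators serves every row of test data, whereas an image-based script has no equivalent move.

    # A rough sketch of a data-driven test with Selenium's Python bindings.
    # The URL and element IDs are made up for this example.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    test_data = [
        ("adam", "hunter2",  "Welcome, adam"),
        ("eve",  "changeme", "Welcome, eve"),
    ]

    driver = webdriver.Firefox()
    for username, password, expected in test_data:
        driver.get("http://example.test/login")
        driver.find_element(By.ID, "username").send_keys(username)
        driver.find_element(By.ID, "password").send_keys(password)
        driver.find_element(By.ID, "login").click()
        # One locator covers every row; the data varies, the script does not.
        assert expected in driver.find_element(By.ID, "banner").text
    driver.quit()

    # The Sikuli-style equivalent can't parameterize the assertion: the
    # expected banner is a bitmap, so every row needs its own capture.
    # for username, password, expected_png in test_data:
    #     ...
    #     assert exists(expected_png)   # one pre-captured image per data row

The same problem feeds the image-reuse objection above: a locator is just a string you can define once and share, while a capture is pixels that must be redone wherever the rendering drifts.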

Of course, the goal of Sikuli isn’t to solve these problems. It is a research project between two groups at MIT. This is said with some derision in my voice. Yes, some things are well suited to make the transition from Academia to Commercialization. Google came out of Stanford, for instance, but at its core it was a new algorithm that (re)attempted to solve a well-understood problem. The issue here is that the divide between Academia and the Real World in Testing is mammoth. I’m not sure exactly why that is, but I often just shake my head when reading papers on testing written by people ensconced in a University somewhere. Yes, there are some exceptions, but they are exceptions rather than the rule.

Now, contrary to how it may seem, I’m not completely bashing Sikuli; I’m more just pouring some cold, hard reality on it. I understand its appeal: since it uses screen captures, it can interact with anything that displays on the screen. That is close to a Holy Grail if ever there was one, but the way in which they accomplish it is, I fear, terminally flawed.