I have written about the first script recipe before (here and here). It is a glass-box technique for determining where you might want to test, how good your developers are at recording information, how complete features really are, and how well the code adheres to style. What? You don’t have the code? I’ll wait while you go get it. No, I don’t care that you don’t know how to program. Now is as good a time as any to learn. Anyways, let’s look at the reasons for this script in a bit more detail.

  • test inspiration – Any time you see comments, variables or methods with words like ‘hack’ or ‘broken’ or ‘kludge’, there is a better than none chance that you can find a bug in the surrounding code. Maybe by reading it, or by thinking about the limitations of the hack. At the very least the hack should be fixed to not be a hack, so log it for redesign / reimplementation (if it has not already been logged).
  • information capture – Modern IDEs let developers create little notes to themselves with a single click of a button. These notes typically take the form of comments starting with FIXME or TODO. In my world view, anytime there is a FIXME or a TODO left in the code there should be a corresponding item in the bug tracker. Why the duplicate capture of information? The bug tracker is the central means of keeping track of your task backlog and communicating it across the organization. Having information locked away where only the geeks can get it is asking for trouble and unpleasantness at the inevitable ‘project slip’ meeting. Also, developers (like testers) can get busy / bored / distracted and forget that they left a TODO over in some file over there last week for themselves.
  • completeness – Technically, if I am doing the dishes or laundry and I have something left ‘TODO’ then I am not done. Similarly, if a feature is marked as done but a TODO was included in the code commits then the feature is really not done. Or at the very least, some questions need to be asked about its completeness.
  • style – It is a good idea to write your code under the assumption that at some point someone is going to look at your code, and that someone is not anyone you can currently think of. For example, do you think the Netscape kids thought their code would be open sourced? Not a chance. If they had, they would not have spent months cleaning up the code for viewing. Had their test team been monitoring the code for socially unacceptable words then that process would have been a lot faster.
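
To make the four reasons above a little more concrete, here is a minimal sketch of how breadcrumb words could be grouped by the reason they matter. The `CATEGORIES` dictionary, the `categorize` function and the two words under ‘style’ are hypothetical placeholders for illustration; swap in whatever your developers actually write.

```python
# Hypothetical grouping of breadcrumb words by the reason they matter.
# All words are lowercase so they can be compared against a lowercased line.
CATEGORIES = {
    "test inspiration": ["hack", "broken", "kludge"],
    "information capture / completeness": ["todo", "fixme"],
    "style": ["stupid", "idiot"],  # placeholder words; use your own list
}

def categorize(line):
    """Return the sorted list of categories whose words appear in the line."""
    lowered = line.lower()
    found = []
    for category, words in CATEGORIES.items():
        if any(word in lowered for word in words):
            found.append(category)
    return sorted(found)

if __name__ == "__main__":
    sample = "// HACK: this is broken until the TODO below is done"
    print(categorize(sample))
```

A line can land in more than one category, which is usually a hint that it deserves a closer look.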

I have been using variations of this script since around 2005 and it seems that the market is starting to catch up with the idea. For example, Rails comes with a rake task (notes) which will find the TODOs, FIXMEs and OPTIMIZEs in the codebase. The problem with this is that it only finds those three values, which means that we don’t get too much value from it. And of course it only works on Rails code, and a lot of companies use a variety of languages depending on the project. (Or likely should.)

Here is the script (it is in Python):

# this script is free to use, modify and distribute and comes with no warranties etc...
# - adam_goucher@hotmail.com

import os, os.path

def do_search(starting_path, res):
    # walk through our source looking for interesting files
    for root, dirs, files in os.walk(starting_path):
        for f in files:
            if is_interesting(f):
                # since it is, we now want to process it
                process(os.path.join(root, f), res)
    print_results(res)

def is_interesting(f):
    # set which type of file we care about
    interesting = [".java", ".cpp", ".html", ".rb"]

    # check if the extension of our current file is interesting
    return os.path.splitext(f)[1] in interesting

def process(to_process, res):
    # make a list of things we are looking for (all lowercase)
    notes = ["todo", "to-do", "fixme", "fix-me"]
    # open our file in "read" mode; the with block closes it for us when we are done
    with open(to_process, "r") as r_f:
        # read our file one line at a time; enumerate() is our counter and
        #   starts at 1 so the numbers match what an editor shows
        for line_num, line in enumerate(r_f, 1):
            # lower() the line once to avoid issues of upper vs lower case
            lowered = line.lower()
            # circle through each of the things we are looking for
            for note in notes:
                # check if our line contains a developer note
                if note in lowered:
                    # initialize our results to have a key of the file name we are on
                    if to_process not in res:
                        # each value will be a list
                        res[to_process] = []
                    # add our information (minus the surrounding whitespace)
                    res[to_process].append({"line_num": line_num, "line_value": line.strip()})
                    # one match is enough; do not record the same line twice
                    break

def print_results(res):
    # check if there were any developer notes found
    if len(res) > 0:
        # asking for a dictionary's keys gives you its keys, so we can loop through them
        # remember, we used the file name as the key
        for f in res.keys():
            # the %s syntax says "put a string here", %d is the same for a number
            print("File %s has %d developer notes. They are:" % (f, len(res[f])))
            # our value for the key here is a list, so again, we can loop through it
            # (see, for loops are way too handy)
            for note in res[f]:
                # embed a tab for cleanliness
                print("\tLine %d: %s" % (note["line_num"], note["line_value"]))
    else:
        print("No developer notes found.")

# set our base for our source code
# (the r prefix makes this a raw string, so the "\t" in the path is not read as a tab)
source_base = r"path\to\your\code"

# create a dictionary which will hold our results
results = {}

# go!
do_search(source_base, results)

The places you modify in this script to make it your own are:

  • ‘interesting’ – put the extensions of the files you want to search here. I have it checking Java, C++, HTML and Ruby in this example.
  • ‘notes’ – these are the strings you want to look for; just keep adding to the list as you discover new breadcrumbs left by the developers. One trick mentioned in the comments is to make these all lowercase. This is because we can skip the case-sensitivity by making the criteria lowercase and lowercasing the line we are checking.
  • ‘source_base’ – is the path to the top of the source tree being checked
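
As the ‘notes’ list grows, plain substring matching can start to produce false hits (‘todo’ also appears inside ‘mastodon’). One possible refinement, sketched here as an assumption rather than as part of the script above, is to compile the list into a single case-insensitive regular expression with word boundaries. The names `build_pattern` and `has_note` are hypothetical.

```python
import re

def build_pattern(notes):
    """Compile the notes into one case-insensitive, whole-word pattern."""
    # re.escape() keeps characters such as "-" from being read as regex syntax
    alternatives = "|".join(re.escape(note) for note in notes)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def has_note(pattern, line):
    """Return True if the line contains any of the notes as a whole word."""
    return pattern.search(line) is not None

if __name__ == "__main__":
    pattern = build_pattern(["todo", "to-do", "fixme", "fix-me"])
    for line in ["// TODO: handle nulls", 'String pet = "mastodon";']:
        print(has_note(pattern, line), line)
```

The trade-off is that whole-word matching skips plurals such as ‘TODOs’, so add those to the list if you want them caught.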

And yes, this could be rewritten to use threads or something to improve its efficiency. But I have yet to be thwarted by the lack of performance of the script. So what if it takes 10 minutes to run, as long as I get the information I am looking for?
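
If the run time ever does become a problem, one way to approach the threaded version is a thread pool from the `concurrent.futures` module in the Python 3 standard library. This is only a sketch under stated assumptions: `scan_file` and `scan_tree` are hypothetical names, the extension and note lists are copied from the script above, and whether threads actually help depends on how much of the time is spent waiting on the disk, which is not measured here.

```python
import os
from concurrent.futures import ThreadPoolExecutor

INTERESTING = (".java", ".cpp", ".html", ".rb")
NOTES = ("todo", "to-do", "fixme", "fix-me")

def scan_file(path):
    """Return (path, [(line number, line text), ...]) for one file."""
    hits = []
    # errors="replace" keeps odd bytes in a source file from stopping the scan
    with open(path, "r", errors="replace") as source:
        for line_num, line in enumerate(source, 1):
            lowered = line.lower()
            if any(note in lowered for note in NOTES):
                hits.append((line_num, line.strip()))
    return path, hits

def scan_tree(starting_path, workers=4):
    """Scan every interesting file under starting_path using a thread pool."""
    paths = [
        os.path.join(root, name)
        for root, dirs, files in os.walk(starting_path)
        for name in files
        if os.path.splitext(name)[1] in INTERESTING
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each path to a worker and yields results in input order
        results = pool.map(scan_file, paths)
        # keep only the files that actually contain developer notes
        return {path: hits for path, hits in results if hits}
```

Calling `scan_tree` with the top of your source tree returns a dictionary keyed by file name, much like ‘results’ in the script, so the reporting side would not need to change much.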