I kinda think that developers in the nebulous past are trying to troll me with code that is later expected to be automated. (I know this is not really the case, and that automatable-code techniques are only starting to become mainstream-ish but really folks…) A recent client does all sorts of crazy price modelling and displays that information in a table — which makes sense since it is tabular data.

Unfortunately…

  • There are no [visible] column headings since they are in a different table which allows for scrolling of large tables whilst having the headings remain on the screen
  • There are no id’s or class attributes that one can glom onto
  • The dimensions of the table changes based on a number of factors
  • The data cells are actually tables themselves!
  • The content of the cells changes based on a number of factors

If fact, it looks suspiciously like this.

  A B BAD C
Prod Group 117 <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">787.159 </div><div class="advanced" style="float: right; display: none;"></input> </div> | | Cats: | </input> </input> 749.675 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 1 | | Revenue: | $14,169 9.7% | | Impact: | $0 0.0% | | Margin: | 19.4% 19.4% | | MD %: | (-1.8%) (-20.5%) | | MI %: | 10.9% (-20.5%) | </div> <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">637.744 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 607.375 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 8 | | Revenue: | $52,295 35.7% | | Impact: | ($0) (0.0%) | | Margin: | 20.1% 20.1% | | MD %: | (-52.0%) (-54.3%) | | MI %: | 56.6% 14.8% | </div> <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">799.219 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 761.161 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 2 | | Revenue: | $13,587 9.3% | | Impact: | $0 0.0% | | Margin: | 21.8% 21.8% | | MD %: | (-0.3%) (-19.3%) | | MI %: | 0.9% (-19.3%) | </div> <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">661.240 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 629.752 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 26 | | Revenue: | $27,772 18.9% | | Impact: | ($0) (0.0%) | | Margin: | 22.7% 22.7% | | MD %: | (-41.4%) (-52.6%) | | MI %: | 60.2% 29.7% | </div>
Prod Group 142 | <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">0 </div><div class="advanced" style="float: right; display: none;"></input> </div> | | Cats: | </input> </input> 0 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 0 | | Revenue: | $0 0.0% | | Impact: | $0 0.0% | | Margin: | 0.0% 0.0% | | MD %: | 0.0% 0.0% | | MI %: | 0.0% 0.0% | <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">374.310 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 356.486 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 7 | | Revenue: | $26,202 17.9% | | Impact: | ($0) (0.0%) | | Margin: | 19.7% 19.7% | | MD %: | (-22.6%) (-38.7%) | | MI %: | 11.2% (-20.8%) | </div> <div style="display:none;">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">0 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 0 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 0 | | Revenue: | $0 0.0% | | Impact: | $0 0.0% | | Margin: | 0.0% 0.0% | | MD %: | 0.0% 0.0% | | MI %: | 0.0% 0.0% | </div> <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">370.155 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 352.529 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 14 | | Revenue: | $8,884 6.1% | | Impact: | ($0) (0.0%) | | Margin: | 20.9% 20.9% | | MD %: | (-23.5%) (-34.6%) | | MI %: | 1543.7% 1261.6% | </div>
Prod Group 144 <div style="display:none;">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">0 </div><div class="advanced" style="float: right; display: none;"></input> </div> | | Cats: | </input> </input> 0 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 0 | | Revenue: | $0 0.0% | | Impact: | $0 0.0% | | Margin: | 0.0% 0.0% | | MD %: | 0.0% 0.0% | | MI %: | 0.0% 0.0% | </div> <div style="display:none;">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">0 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 0 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 0 | | Revenue: | $0 0.0% | | Impact: | $0 0.0% | | Margin: | 0.0% 0.0% | | MD %: | 0.0% 0.0% | | MI %: | 0.0% 0.0% | </div> <div style="display:none;">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">0 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 0 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 0 | | Revenue: | $0 0.0% | | Impact: | $0 0.0% | | Margin: | 0.0% 0.0% | | MD %: | 0.0% 0.0% | | MI %: | 0.0% 0.0% | </div> <div style="">| <div class="oldReference" style="float: left; height: 20px;">Monkey </div> | <div class="oldReference" style="float: left; height: 20px;">534.086 </div><div class="advanced" style="float: right; display: none;"></input> </div> | |—|—|—| | Cats: | </input> </input> 508.653 | | <div class="CapLabel" style="display: none;">Cap:</div> | <div class="CapCell" style="display: none;"> Min Chg %: </input> No Decrease Max Chg %: </input> </div> | | Items: | 1 | | Revenue: | $3,739 2.5% | | Impact: | $0 0.0% | | Margin: | 20.5% 20.5% | | MD %: | (-4.6%) (-18.2%) | | MI %: | 3.7% (-18.2%) | </div>

Long story made short, we don’t care what the actual numbers are, just that they are consistent throughout the application. Which of course means we need to be able to get a specific number, from a specific cell. Which of course would be fairly easy-peasy if there were meaningful id’s to hang off of…

Here is the non-annotated version of my get_cell_at(x, y) function.

<pre lang="python">def get_cell_at(x, y):
    # get the column elements
    column_elements = driver.find_elements_by_css_selector('th')
    # get the text of them, and do some html cleanup
    column_names = [driver.execute_script('return arguments[0].textContent', e).strip() for e in column_elements]
    # find which column is the one we want
    which_column = column_names.index(x)

    # get all the rows in the body (we'll use them a couple times)
    rows = driver.find_elements_by_xpath('//table[not(@cellspacing)]/tbody/tr')

    row_names = []
    for row in rows:
        name = row.find_element_by_xpath('./td[starts-with(@id, "RevenueLabel")]')
        row_names.append(name.text)
    which_row = row_names.index(y)

    cell = row.find_element_by_xpath('//table[not(@cellspacing)]/tbody/tr[%d]/td[not(@style)][%d]' % ((which_row + 1), (which_column + 2)))

    # cell contents (there are different 'types' of cell contents)
    data = {}
    things = cell.find_elements_by_xpath('./div/table/tbody/tr')
    for thing in things:
        first_inner_thing = thing.find_element_by_css_selector('td')
        if first_inner_thing.text == "Monkey":
            data["Monkey"] = thing.find_element_by_xpath('./td[2]').text
    
    return data

And now for the annotations.

The first thing we need to do is get the names of the columns. Thankfully they are in a <thead> section in the main table. As mentioned above, the user visible strings are outside of this table to allow for scrolling of the table while keeping the column identifiers visible.

Also, recall how WebDriver doesn’t let you get the text of a non-visible element? Ya … JS Executor to the rescue.

Oh. And just for extra fun, some column headings have embedded <br /> rather than using word wrapping capabilities of CSS.

<pre lang="python">def get_cell_at(x, y):
    # get the column elements
    column_elements = driver.find_elements_by_css_selector('th')
    # get the text of them, and do some html cleanup
    column_names = [driver.execute_script('return arguments[0].textContent', e).strip() for e in column_elements]
    # find which column is the one we want
    which_column = column_names.index(x)

Now we fetch all the rows in the table body.

<pre lang="python">    # get all the rows in the body (we'll use them a couple times)
    rows = driver.find_elements_by_xpath('//table[not(@cellspacing)]/tbody/tr')

Finding the row want is very similar to finding the column, but because it is visible we don’t need the JS Executor. The hitch here is that when you are doing an xpath based search using a previously found element you need to start it with “./” rather than “//”. To me this is a bit counter intuitive since I think holding the element should reset where ‘the top’ of the document is for the search, but I can also see how this way is correct.

<pre lang="python">    row_names = []
    for row in rows:
        name = row.find_element_by_xpath('./td[starts-with(@id, "RevenueLabel")]')
        row_names.append(name.text)
    which_row = row_names.index(y)
This would be so much easier (and less brittle) if there was id’s, but it does get the specific cell we wanted. The which_[column row] munging happens due to the difference between python’s counting at 0 and xpath’s at 1. Hurray for consistency!
<pre lang="python">    cell = row.find_element_by_xpath('//table[not(@cellspacing)]/tbody/tr[%d]/td[not(@style)][%d]' % ((which_row + 1), (which_column + 2)))

The last thing to deal with is the actual contents of the cell, which as mentioned is itself a table. In an ideal world there would be hints in the elements itself through something like a data-* attribute which point we could just loop over things and build a dictionary automagically. But there isn’t so we’ll do it by hand. (I’m only doing the first one for this example…)

<pre lang="python">    # cell contents (there are different 'types' of cell contents)
    data = {}
    things = cell.find_elements_by_xpath('./div/table/tbody/tr')
    for thing in things:
        first_inner_thing = thing.find_element_by_css_selector('td')
        if first_inner_thing.text == "Monkey":
            data["Monkey"] = thing.find_element_by_xpath('./td[2]').text
        # check the other characteristics of the data...
        
    return data

With that in place, you can refer to individual bits like so.

<pre lang="python">driver = Remote(desired_capabilities=DesiredCapabilities().FIREFOX)
try:
    driver.get('file:///Users/adam/tmp/tables/cells.html')
    cell = get_cell_at("C", "Prod Group 142")
    assert(cell["Monkey"] == "370.155")
finally:
    driver.quit()

Phew! The moral of this story? Be nice to your future automators and put things that identify columns and rows and cell data. Oh, and that even a solution as peppered with brittle locators but does the job is still better than one that doesn’t.