Pagination for data sets you don’t control
One of my clients uses the YouTube search API (amongst others) to do media searches. One of their suites could be described as a ‘pagination’ suite, and it has a surprising number of rules about what gets displayed when. The original approach was to search for a random term[1] and then check that the results fell into the category the script required. When you don’t care what the response looks like beyond ‘was there a response’, random search criteria are absolutely an excellent way to deal with the problem. But when you care about the structure of the response it becomes less than ideal for a number of reasons, the main one being that those retries take time, and front-end automation takes long enough in the first place.
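A rough sketch of what that retry loop looks like (the CSV file name, the attempt limit, and the helper name are my assumptions; the page-object methods are the ones used further down):
<pre lang="python">import csv
import random

def search_for_page_range(search_page, min_pages, max_pages, attempts=20):
    """Keep trying random terms until the pagination lands in the wanted range."""
    with open("search_terms.csv") as f:  # assumed file, one term per row
        terms = [row[0] for row in csv.reader(f)]
    for _ in range(attempts):
        term = random.choice(terms)
        results_page = search_page.perform_search(term)
        pages = results_page.get_actual_number_of_pages_displayed_in_pagination()
        if min_pages <= pages <= max_pages:
            return results_page
        # wrong shape of data; throw it away and try another random term
    raise RuntimeError("no term produced %d-%d pages in %d attempts"
                       % (min_pages, max_pages, attempts))</pre>
Every failed attempt there is a full search through the browser, which is where the time goes.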
Let’s look at what our data sets need to look like (there is a sketch of how a raw result count maps onto these buckets right after the list):
- 0 results
- 1 page of results
- 2 – 8 pages of results
- 9 – 13 pages of results
- more than 14 pages of results
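For reference, this is how a raw result count falls into those buckets; the value of EXPECTED_RESULTS_PER_PAGE is my assumption, implied by the 26–224 range mentioned in the docstring below:
<pre lang="python">EXPECTED_RESULTS_PER_PAGE = 25  # assumption; whatever the UI actually shows per page

def pagination_bucket(total_results):
    """Classify a raw result count into one of the data-set categories above."""
    if total_results == 0:
        return "0 results"
    pages = -(-total_results // EXPECTED_RESULTS_PER_PAGE)  # ceiling division
    if pages == 1:
        return "1 page"
    if pages <= 8:
        return "2-8 pages"
    if pages <= 13:
        return "9-13 pages"
    return "14 or more pages"</pre>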
To check that we have proper-looking data for the script we need to do some ‘data curation’. This is similar to using hard-coded data, but I think it differs in that it is useful now while everyone realizes it is likely to stop being useful at some point in the future. Hard-coded data tends to be expected to be valid for eternity.
<pre lang="python">@pytest.marks('deep', 'videosearch')
def test_verify_pagination_for_search_pages_less_than_9(self):
"""This needs to have a search result that is somewhere in the 26 to 224 results range."""
search_results_page = self.video_dashboard_page.perform_search("bunny ears")
search_results_count = search_results_page.total_search_results_count
if (search_results_count > (9 * EXPECTED_RESULTS_PER_PAGE)):
pytest.fail("Search result needs to be adjusted. Got %d" % search_results_count)
actual = search_results_page.get_actual_number_of_pages_displayed_in_pagination()
calculated = search_results_page.get_calculated_number_of_pages_to_be_displayed_in_pagination()
self.assertEqual(actual, calculated)
The important parts here are the docstring (triple quotes right after the method definition), which describes what the data set should look like (for when we need to update it), and the pytest.fail(). The original authors were using pytest.skip() when they couldn’t get the data to look the way they wanted, but that doesn’t really capture what is happening at that point; the script isn’t being skipped, it cannot be run. Notice too that the fail message includes the direct action needed, so one can see from scanning the log why it failed.
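To make the distinction concrete, the two precondition styles look like this; only the second one makes the broken data visible in the run:
<pre lang="python"># what the original authors did: shows up as an 's' and reads as a deliberate skip
if search_results_count > (9 * EXPECTED_RESULTS_PER_PAGE):
    pytest.skip("too many results")

# what the script does now: goes red, with the corrective action right in the message
if search_results_count > (9 * EXPECTED_RESULTS_PER_PAGE):
    pytest.fail("Search result needs to be adjusted. Got %d" % search_results_count)</pre>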
One further modification I might make to this script is to move the actual search string into a provider of some sort and have it read its content from a CSV file, thus allowing for a list of terms that meet each result criterion. There were bigger fish to fry in the codebase, though, than to worry about that right now.
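Something like this sketch, say, using pytest’s parametrize; the CSV layout (term, bucket columns) and the video_dashboard_page fixture are my assumptions:
<pre lang="python">import csv
import pytest

def terms_for(bucket, path="pagination_terms.csv"):
    """Pull every search term tagged for the given result bucket from the CSV."""
    with open(path) as f:
        return [row["term"] for row in csv.DictReader(f) if row["bucket"] == bucket]

@pytest.mark.parametrize("term", terms_for("2-8 pages"))
def test_verify_pagination_for_search_pages_less_than_9(term, video_dashboard_page):
    # the rest of the script stays the same as above, just driven by the
    # parametrized term rather than the hard-coded "bunny ears"
    search_results_page = video_dashboard_page.perform_search(term)</pre>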
[1] Well, random-ish, as there is a CSV file with a couple thousand terms in it that the script was pulling from