For years my stock answer to the question of how to download files from the browser has been “don’t do it, but if you really, really must, then at least don’t try to use the browser”, and I’ve left it at that. Well, today I sat down to actually write the code to do it.

Here is what the script looks like. It is just a ‘standard’ Py.Saunter script with the exception of the self._screenshot_prep_dirs() call, which is a bit of implementation leakage that I’ll fix in the next release or two of Py.Saunter now that I’ve had an assumption I was making proven incorrect. Anyhow, it just goes to a random article as specified in a csv and then downloads a pdf of that article. article_pdf is the full path on disk to it, for example /Users/adam/work/client/client-automation/logs/2013-04-29-13-31-23/CheckPDFDownloads/test_pdf_download/downloaded_pdf.pdf.

<pre lang="python">from tailored.testcase import TestCase
import pytest

from pages.article import Article

class CheckPDFDownloads(TestCase):
    def setup_method(self, method):
        super(CheckPDFDownloads, self).setup_method(method)
        self._screenshot_prep_dirs()
        self.article = Article(self.driver).open_random_article().wait_until_loaded()

    def teardown_method(self, method):
        super(CheckPDFDownloads, self).teardown_method(method)

    @pytest.marks('shallow', 'pdf', 'article')
    def test_pdf_download(self):
        article_pdf = self.article.download("pdf")
        # open up this file in whatever pdf module you like and do whatever
</pre>

The really interesting bit is, of course, in the Page Object. I’ll walk through the download method below the listing rather than break it up inline.

<pre lang="python" line="1">from tailored.page import Page
from selenium.webdriver.support.wait import WebDriverWait
from providers.article import ArticleProvider
import random
from selenium.webdriver.common.action_chains import ActionChains
import requests
import os.path
import inspect
import sys

locators = {
    'article tab': 'css=span[name="article"]',
    'download button': 'xpath=//div[contains(@class,"btn-reveal")]/span[text()="Download"]',
    'pdf download button': 'css=.btn-reveal a[title$="PDF"]'
}

class Article(Page):
    def __init__(self, driver):
        super(Article, self).__init__(driver)
        self.driver = driver

    def open(self, uri):
        self.driver.get("%s/%s" % (self.config.get('Selenium', 'base_url'), uri))
        return self

    def open_random_article(self):
        row = ArticleProvider().randomRow()
        return self.open(row["uri"])
    
    def wait_until_loaded(self):
        self.wait.until(lambda driver: driver.find_element_by_locator(locators["article tab"]))
        return self

    def download(self, type_of_download):
        chain = ActionChains(self.driver)
        chain.move_to_element(self.driver.find_element_by_locator(locators['download button']))
        chain.perform()

        def waiter(driver):
            e = driver.find_element_by_locator(locators["%s download button" % type_of_download])
            if e.is_displayed():
                return e
            return False
        button = self.wait.until(waiter)

        r = requests.get(button.get_attribute('href'))

        disposition = r.headers["content-disposition"]
        disposition_type, filename_param = disposition.split(';')
        filename = filename_param.split('=')[1][1:-1]

        stack = inspect.stack()

        # calling class name
        frame = stack[1][0]  # caller's frame (currentframe(1) only works on Python 2)
        caller = frame.f_locals.get('self', None)
        calling_class_name = caller.__class__.__name__

        # calling method name
        calling_method_name = stack[1][3]
        path_to_file = os.path.join(self.config.get('Saunter', 'log_dir'), calling_class_name, calling_method_name, filename)

        f = open(path_to_file, "wb")
        f.write(r.content)
        f.close()

        sys.stdout.write(os.linesep + "[[ATTACHMENT|%s]]" % path_to_file)

        return path_to_file
</pre>

Alright… lots of stuff going on in download, some of which is specific to this client, but it’s useful to cover somewhere anyway.

  • 35 – 44: The actual download link on this page is hidden unless the user hovers over a different element. Clean UI, but an extra hoop to jump through when automating. We do this using an Action Chain and then synchronizing on whether the link is visible. One thing you’ll notice is that this method can download any number of different types. To keep it generic, the locator key is built at runtime:
    ```
    locators["%s download button" % type_of_download]
    ```
    
  • 46: If you were to look at this element in a browser, the href attribute is actually a relative one. But WebDriver is smart enough to return a fully qualified one when it sees a relative href. Helpful! And since we have a full URL we can use Requests to grab it. Were it behind some sort of authentication scheme, we could grab the correct cookie(s) from self.driver and pass them to requests.get(). Remember, HTTP is stateless.
  • 48 – 50: Because this is a nefarious example, the url we request the file from is actually just a call on the server and not the actual document. As such we need to figure out just what the heck the file we are downloading ‘should’ be called. Turns out the ‘standard’ way to do this is with the Content-Disposition header.
  • 52 – 60: Here is where things go sideways somewhat and then go completely into “no, you really shouldn’t be doing that!” territory. But it works! Well, as long as we apply the project-wide rule of ‘do not call this method from another page object’. What we are doing is peeking into the actual Python execution stack to get the script’s class and method names. This would be easy-peasy if we were currently in the context of a script … but we’re not; we’re in a Page Object and they are script neutral.
  • 61: Since we now have all the information we need about the calling script, we can build a path (in an OS-neutral way) to where we are going to put the file.
  • 63 – 65: Saves the contents of the file to disk in the appropriate place.
  • 67: This client happens to be using Jenkins as their CI server and if you are using the JUnit-Attachment plugin for it, then this magically formatted line will add a link to our downloaded file to the displayed test results.
  • 69: Finally, we return the path on disk to the calling test method. From there it can be opened up in whatever PDF (in this case) parsing module to do further inspection. But that is out of scope for a method called download.
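
For the authenticated case mentioned under line 46, carrying the browser’s cookies over to Requests could look roughly like this. It is only a sketch: it assumes the shape that WebDriver’s get_cookies() returns (a list of dicts with at least ‘name’ and ‘value’ keys), and the helper name cookies_for_requests and the sample cookie are made up for illustration.

```python
def cookies_for_requests(webdriver_cookies):
    """Turn WebDriver-style cookie dicts into the plain name/value dict Requests accepts."""
    return dict((c["name"], c["value"]) for c in webdriver_cookies)


# Hypothetical sample in the shape driver.get_cookies() returns.
sample = [{"name": "sessionid", "value": "abc123", "path": "/", "secure": False}]
jar = cookies_for_requests(sample)

# Inside the page object this would be used roughly as:
#   r = requests.get(button.get_attribute('href'),
#                    cookies=cookies_for_requests(self.driver.get_cookies()))
```

This drops the domain and path restrictions on each cookie, which is usually fine when the request goes to the same site the browser is already on.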
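
The split-based parsing on lines 48 – 50 works for this client because the header always has exactly one quoted filename parameter. If that assumption ever stops holding, one possible alternative is to lean on the standard library’s email.message.Message, which knows how to read header parameters. The function name filename_from_disposition and the fallback name are hypothetical.

```python
from email.message import Message


def filename_from_disposition(disposition, default="download.bin"):
    """Pull the filename parameter out of a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = disposition
    # get_filename() copes with quoted or bare values and with extra parameters.
    return msg.get_filename() or default


quoted = filename_from_disposition('attachment; filename="downloaded_pdf.pdf"')
bare = filename_from_disposition("attachment; filename=report.pdf; size=123")
missing = filename_from_disposition("inline")
```

In the page object, the value of r.headers["content-disposition"] would be passed in as the argument.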
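
To see the stack peeking from lines 52 – 61 in isolation, here is a small standalone sketch. FakePage and FakeScript are stand-ins invented for the example (they are not Py.Saunter classes), and describe_caller is a hypothetical helper; the only real machinery is inspect.stack() and os.path.join().

```python
import inspect
import os.path


def describe_caller():
    """Return (class name, method name) of whoever called the function that called us."""
    # stack()[0] is this helper, [1] is the function using it, [2] is that function's caller.
    record = inspect.stack()[2]
    frame, method_name = record[0], record[3]
    caller = frame.f_locals.get("self", None)
    class_name = type(caller).__name__ if caller is not None else "unknown"
    return class_name, method_name


class FakePage(object):
    def download(self, log_dir, filename):
        class_name, method_name = describe_caller()
        return os.path.join(log_dir, class_name, method_name, filename)


class FakeScript(object):
    def test_pdf_download(self):
        return FakePage().download("logs", "downloaded_pdf.pdf")


path = FakeScript().test_pdf_download()
```

If another page object sat between the script and download, its class and method names would end up in the path instead, which is exactly why the ‘do not call this method from another page object’ rule exists.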

For SaunterPHP users, the ideas you would follow would be very similar — just the reflection-y bits would be different.