Selenium, Facebook WebDriver, and Stale Elements.

This post aims to explain how and why stale element exceptions occur, and how I went about resolving them. If you're only interested in the solution, skip ahead to The Solution.

Recently I was writing and running Selenium tests for a JavaScript-heavy app. Occasionally I'd hit a StaleElementReferenceException completely out of the blue; there was no pattern or apparent reason behind the occurrences. An otherwise stable test would error out partway through, typically on a line where I called click() on an element.

I could watch the tests run and see the elements present, yet perhaps a third of the tests would fail with that exception. How can a visible, clickable element be detached?

The Tools

Because the problem lives in Selenium, it doesn't really matter which tools you're using, so long as Selenium is at the root of your testing architecture. Still, here's a summary of what I'm working with so you can judge compatibility with your own setup:

  • The application's back-end rests on the Symfony framework (v2.7.7), written in PHP (compatible down to PHP 5.4)
  • The front-end is a legacy app built on MooTools and a custom JavaScript framework.
  • The test runner is PHPUnit 4.8.18
  • Selenium is wrapped by Facebook's PHP WebDriver

This setup is honestly pretty cumbersome. It works, but it feels heavy, something I think comes with writing a PHP app. Also, Facebook has done a fine job, but the documentation is sparse. Selenium's own documentation is okay too, but edge cases are (reasonably) undocumented. Overall, I find myself wishing there were a more elegant solution.

If you're using Java or Python, the same solutions transfer over just fine, since Facebook's PHP WebDriver is a very thin wrapper. If you're using PHP with PHPUnit, this should transfer over almost directly.

The Reference in StaleElementReferenceException

The Selenium documentation on these exceptions gives two likely causes:

  1. The element has been deleted entirely.
  2. The element is no longer attached to the DOM.

Possibility #2: It's Detached.

For whatever reason, I preferred to assume it was possibility #2. I started by asking myself: is the element in the DOM? Is it visible, and is it enabled so it can be clicked? Watching the test proceed, that certainly looked like the case. You can also check these things (albeit not atomically with the click that follows) by waiting until the conditions are met:

<?php

use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

// Is it there and clickable? (Wait up to 10 seconds, polling every 100ms.)
$webDriver->wait(10, 100)->until(
    WebDriverExpectedCondition::elementToBeClickable(
        WebDriverBy::xpath($elementXpath)
    )
);

// Sure, let's get the element:
$element = $webDriver->findElement(WebDriverBy::xpath($elementXpath));

// Now click it...
$element->click();

Unfortunately, even this can still throw the stale element exception. The DOM can change between the wait succeeding and the click, even if only a few milliseconds pass in between.

But if the element is still visible and appears to be enabled while the test runs, how can it be detached? I guessed that the element was detached for only a fraction of a second and that I needed to wait a moment for it to be re-attached. Knowing this was The Wrong Way to do it (it guarantees very little reliability), I gave it a shot anyway by pausing before clicking:

<?php

// Pause for 100ms. (implicitlyWait() won't do this: it only sets how long
// findElement() keeps polling for an element that isn't present yet.)
usleep(100 * 1000);

// Now it's probably there.
$element->click();

This didn't work at all. It just took an extra 100 milliseconds to fail. Wondering if it might take longer to re-attach, I tried the same thing in a loop:

<?php

use Facebook\WebDriver\Exception\StaleElementReferenceException;

$attempts = 0;
$exception = null;

while ($attempts < 5) {
    $attempts++;

    try {
        $element->click();
        $exception = null; // Success: forget any earlier failure.
        break;
    } catch (StaleElementReferenceException $exception) {
        // :( Keep the exception and try the same reference again.
    }

    usleep(100 * 1000); // Pause 100ms before the next attempt.
}

if ($exception) {
    throw $exception;
}

That didn't work either. Even setting the loop to try 50 times was futile (not that it would have passed as a solution). At this point I concluded that the other documented possibility applied: "The element has been deleted entirely."

Possibility #1: It's Not Coming Back.

Most likely, JavaScript had torn down a DOM node and replaced it with an identical one. A human sees no change, but Selenium's reference points at the old node, which is gone for good.

This explains why the reference goes 'stale' for Selenium. Waiting for it to come back isn't a solution, so we need something more robust.

Getting a Fresh Reference

The solution isn't too complicated once you know what's gone wrong. If you're experiencing the same issue, it boils down to this: we had an element locator, we found the element and stored a reference, but by the time we tried to use it, it was gone. So we just need to run the locator again and get a new reference.
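
To see why waiting can't help while re-locating can, here's a small illustration that runs without a browser. It assumes PHP's built-in DOM extension as a stand-in for the page, and a DOMXPath query as a stand-in for a Selenium locator; it's a sketch of the idea, not how WebDriver works internally. A held node that gets swapped for an identical copy is left detached, while re-running the same XPath finds the new node:

```php
<?php

// Illustration only: PHP's DOM extension stands in for the browser's DOM.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><ul><li><a class="tog-fav">Fav</a></li></ul></body></html>');
$xpath = new DOMXPath($doc);
$locator = "//a[@class='tog-fav']";

// Like findElement(): keep a reference to one specific node.
$held = $xpath->query($locator)->item(0);

// Like a JavaScript re-render: swap the node for an identical copy.
$copy = $held->cloneNode(true);
$held->parentNode->replaceChild($copy, $held);

// The page "looks" the same, but the held node is no longer in the document.
var_dump($held->parentNode === null);   // bool(true)

// Waiting won't reattach it; re-running the locator finds the new node.
$fresh = $xpath->query($locator)->item(0);
var_dump($fresh->isSameNode($copy));    // bool(true)
echo $fresh->textContent, "\n";         // Fav
```

The held node still has its text and attributes, which is exactly why a stale element can look fine to a human: the content is identical, only the node identity has changed.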

The Solution

You likely don't want to hardcode this into your tests every time you want to click something. What I settled on is a method that tries to click an element and takes an optional XPath to it (XPath is my personal preference), which it uses to re-locate the element if it goes stale. It also takes optional parameters for the number of click attempts and the wait time between attempts.

I store this in an AbstractBaseFeatureTest class, which extends PHPUnit_Framework_TestCase, along with a helper for fetching a fresh reference. Every integration test class that touches a part of the DOM prone to going stale then extends it.

<?php

use Facebook\WebDriver\Exception\StaleElementReferenceException;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\RemoteWebElement;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

/**
 * @coversNothing
 */
abstract class AbstractBaseFeatureTest extends \PHPUnit_Framework_TestCase
{
    /**
     * @var RemoteWebDriver
     */
    protected $webDriver;

    /**
     * Tries clicking an element. The repeated attempts catch stale element
     * exceptions. This method isn't essential for every click, but it helps
     * a lot where JavaScript manages the DOM after the initial page load.
     * For clicks that happen before JavaScript manipulates the DOM, it's probably overkill.
     *
     * @param RemoteWebElement $element      The element to try clicking.
     * @param string|null      $elementXpath XPath used to re-locate the element if it goes stale.
     * @param int              $numAttempts  How many times to try clicking.
     * @param int              $wait         How long to wait between clicks, in milliseconds.
     *
     * @return AbstractBaseFeatureTest
     *
     * @throws StaleElementReferenceException If every attempt fails.
     */
    protected function tryClicking(RemoteWebElement $element, $elementXpath = null, $numAttempts = 5, $wait = 10)
    {
        $attempts = 0;

        while ($attempts < $numAttempts) {
            $attempts++;

            try {
                $element->click();

                return $this;
            } catch (StaleElementReferenceException $exception) {
                if ($elementXpath) {
                    // Swap the stale reference for a fresh one.
                    $element = $this->getElementByXpath($elementXpath, "elementToBeClickable");
                }

                // Pause before the next attempt. (implicitlyWait() only sets how
                // long findElement() polls for missing elements; it doesn't pause.)
                usleep($wait * 1000);
            }
        }

        // Every attempt failed, so let the last exception propagate.
        throw $exception;
    }

    /**
     * Gets an element by XPath, first waiting until the given condition holds.
     *
     * @param string $xpath     The XPath for the element.
     * @param string $condition Name of a WebDriverExpectedCondition method - Defaults to presenceOfElementLocated.
     *
     * @return \Facebook\WebDriver\WebDriverElement
     */
    protected function getElementByXpath($xpath, $condition = "presenceOfElementLocated")
    {
        $this->webDriver->wait(10, 100)->until(
            WebDriverExpectedCondition::$condition(
                WebDriverBy::xpath($xpath)
            )
        );

        return $this->webDriver->findElement(WebDriverBy::xpath($xpath));
    }
}

Now if we use this in a test, we might see some caught exceptions, but once the element reference is refreshed the test carries on as usual. Since this does nothing a real user couldn't do (they'd simply click the element that's on screen now), it's a safe way to recover from the inconvenience and stabilize a test.

The Problems

There are two drawbacks to this method. The first is that you need two variables, the XPath and the element, whenever you get an element you want to click:

<?php

$elementXPath = "//li[contains(@id, 'shd-')][1]/div[@class='controls']/a[@class='tog-fav']";

$element = $this->getElementByXpath(
    $elementXPath,
    "elementToBeClickable"
);

$this->tryClicking($element, $elementXPath);

The other is that this convention is a back-end solution dealing with what I consider a front-end problem. Why is the DOM rebuilding unexpectedly? Shouldn't I be able to predict the state of the page I'm trying to test? I think so. But we can't always wait for our front-end developer to fix that (or in my case, for me to fix it).

I'm not fixing it myself because I plan to rebuild the existing front-end, and these tests are the spec to build against. You might not want to dig into a pile of markup and JavaScript, or you might not have access to it in the first place. This lets us bypass that problem entirely.

And that's it: if Selenium has a stale reference, you just need to refresh it. I've only encountered this when clicking elements (probably a coincidence), but you could extend this convention to handle any kind of action.
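
As a rough sketch of that extension, the retry can be written once around any action, given a callable that re-locates the element. The function name, its parameters, and the fake objects below are hypothetical (none of this is part of php-webdriver), and it's plain PHP so it runs without a browser. The exception class to retry on is passed in; in a real test it would be Facebook's StaleElementReferenceException class name. Taking the locator as a callable also sidesteps the two-variable drawback, since the caller only hands over one thing:

```php
<?php

/**
 * Hypothetical helper: performs $action on an element and, whenever the action
 * throws an exception of class $staleClass, re-locates the element and retries.
 *
 * @param callable      $locate      Returns a fresh element reference.
 * @param callable      $action      Receives the element; its return value is returned.
 * @param string        $staleClass  Exception class that means "re-locate and retry".
 * @param int           $numAttempts Maximum number of attempts (at least one is made).
 * @param int           $waitMs      Pause between attempts, in milliseconds.
 * @param callable|null $onRetry     Optional hook, called as ($exception, $remaining).
 */
function retryWithFreshReference(callable $locate, callable $action, $staleClass, $numAttempts = 5, $waitMs = 10, $onRetry = null)
{
    $element = $locate();

    for ($attempt = 1; ; $attempt++) {
        try {
            return $action($element);
        } catch (Exception $e) {
            // Rethrow anything that isn't a stale reference, or the final failure.
            if (!($e instanceof $staleClass) || $attempt >= $numAttempts) {
                throw $e;
            }
            if ($onRetry !== null) {
                call_user_func($onRetry, $e, $numAttempts - $attempt);
            }
            usleep($waitMs * 1000);
            $element = $locate(); // Swap the stale reference for a fresh one.
        }
    }
}

// Demo with fake objects: the first two references go stale before use.
class FakeStaleException extends Exception {}

$locateCalls = 0;
$locate = function () use (&$locateCalls) {
    $locateCalls++;
    return array('id' => $locateCalls, 'stale' => $locateCalls <= 2);
};
$click = function ($el) {
    if ($el['stale']) {
        throw new FakeStaleException("element {$el['id']} is stale");
    }
    return "clicked element {$el['id']}";
};

$result = retryWithFreshReference($locate, $click, 'FakeStaleException', 5, 1, function ($e, $remaining) {
    echo "Retrying after: {$e->getMessage()} ({$remaining} tries remaining)\n";
});
echo $result, "\n";
```

Run as-is, this should print two "Retrying after" lines and then "clicked element 3". In a WebDriver test, `$locate` might be a closure around getElementByXpath() and `$action` one that calls click() (or sendKeys(), or anything else), with the `$onRetry` hook as a natural place for logging.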

How We Can Improve It

I'd like to know when something goes wrong, even if the test recovers gracefully. For that reason, I've started including output in code like this:

<?php

protected function tryClicking(RemoteWebElement $element, $elementXpath = null, $numAttempts = 5, $wait = 10)
{
    $attempts = 0;

    while ($attempts < $numAttempts) {
        $attempts++;

        try {
            $element->click();

            return $this;
        } catch (StaleElementReferenceException $exception) {
            $remaining = $numAttempts - $attempts;

            // output() is a helper from the full AbstractBaseFeatureTest.php (not shown).
            $this->output(
                "Caught StaleElementReferenceException; waiting {$wait}ms then clicking again ({$remaining} tries remaining)...",
                "error"
            );

            if ($elementXpath) {
                $this->output(
                    "Re-referencing element due to stale element reference...",
                    "error"
                );

                $element = $this->getElementByXpath($elementXpath, "elementToBeClickable");
            }

            // Pause before the next attempt (implicitlyWait() wouldn't pause).
            usleep($wait * 1000);
        }
    }

    // Every attempt failed, so let the last exception propagate.
    throw $exception;
}

This outputs something like what's shown below. Keep in mind that all of the output surrounding the reference exception is also custom output from the tests; PHPUnit doesn't produce it. I really like to explain tests (and their potential issues) to whoever runs them, or edits them in the future.

steve$ phpunit -c app/ --group="searchHistory"

Running search from SearchHistoryTest::testDisablingSearchHistory  
  -> Searching for aquarium within 200mi of 90210 
  -> Cat/Sub Category: 8 / sss 
  -> Years:  ~  
  -> Asking:  ~  

Opening search history options...  
Caught StaleElementReferenceException; waiting 100ms then clicking again (4 tries remaining)...  
Re-referencing element due to stale element reference...  
Disabling search history...  
.

Previously, this would have looked like:

steve$ phpunit -c app/ --group="searchHistory"

Running search from SearchHistoryTest::testDisablingSearchHistory  
  -> Searching for aquarium within 200mi of 90210 
  -> Cat/Sub Category: 8 / sss 
  -> Years:  ~  
  -> Asking:  ~  

Opening search history options...  
E  

While that E will be explained by PHPUnit, it's a little frustrating for a good test to error out because of erratic DOM manipulations.

Now, as the tests run, there are no silent or uncaught errors, and I still find out about these potential DOM issues so I can act on them. The output() method gives more elegant output than a bare echo and, if it's to your liking, allows colouring based on the status of each message. If you're interested in seeing more of the abstract base class and its output methods, see AbstractBaseFeatureTest.php. Credit to the rest of the Tempest team and Brad Melanson for their work on this as well.
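
The real output() method lives in AbstractBaseFeatureTest.php and isn't reproduced here. As a guess at the idea only, a status-coloured formatter can map each status to an ANSI colour escape code; the function name, the statuses, and the colour choices below are all hypothetical:

```php
<?php

/**
 * Hypothetical formatter: wraps a message in an ANSI colour chosen by status.
 * Unknown statuses are printed without colour.
 */
function formatOutput($message, $status = 'info')
{
    $colours = array(
        'error'   => "\033[31m", // red
        'success' => "\033[32m", // green
        'info'    => "\033[36m", // cyan
    );

    if (!isset($colours[$status])) {
        return $message . PHP_EOL;
    }

    // "\033[0m" resets the terminal colour after the message.
    return $colours[$status] . $message . "\033[0m" . PHP_EOL;
}

echo formatOutput('Opening search history options...');
echo formatOutput('Re-referencing element due to stale element reference...', 'error');
```

Falling back to plain text for unknown statuses keeps the helper safe to call from anywhere, and a terminal that doesn't understand ANSI codes will simply show the escape characters rather than fail.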

Thanks for reading! Hopefully this helps you as it would have helped me.


Selenium was so named because Huggins, dissatisfied with testing tools on the market, was seeking a name that would position the product as an alternative to Mercury Interactive QuickTest Professional commercial testing software. The name, Selenium, was selected because selenium mineral supplements serve as a cure for mercury poisoning, Huggins explained.

techworld


Steve Adams

I'm a web developer and designer at work, but I try to spend a lot of my time with my kids and girlfriend. I love weight lifting, running, cycling, climbing, electronics, and woodworking.

Victoria, BC