Approach to flaky E2E tests at Clark

Zlatan Čilić
3 min read · Aug 18, 2021

Flaky tests: a topic chewed through so many times. Yet, talking to my colleagues and peers, I find that all of us have slightly different approaches and smart ideas for reducing their number a bit more efficiently. So here are some pillars of prevention, detection and resolution of flaky E2E tests at Clark. It’s a long journey, so at the end there is a section on how to keep the business going until you tackle the flakiness.

Prevention

When it comes to prevention, the first thing is definitely to have as few E2E tests as possible. They are expensive by nature, and we should stick to the test automation pyramid as much as possible to make informed decisions. But when we do have to write them, we consider the following points:

  • no text-based interactions
  • collaboration with developers for specific locators
  • no test order dependencies, especially between 2 or more threads (we run them in about 30 threads at the same time on each release)
  • avoiding long scenarios
  • dynamic waiting wherever possible (see the sketch after this list)
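
To make the locator and waiting points a bit more concrete, here is a minimal sketch assuming a Playwright-style setup; the page URL and data-testid values are made up for illustration and are not our real ones:

```typescript
import { test, expect } from '@playwright/test';

test('user can submit the contact form', async ({ page }) => {
  await page.goto('https://example.com/contact');

  // Specific locator agreed with developers (a stable test id) instead of
  // matching on visible text, so copy changes or translations don't break us.
  await page.getByTestId('contact-submit').click();

  // Dynamic waiting: the assertion polls until the element shows up (or the
  // timeout is reached) instead of a fixed sleep.
  await expect(page.getByTestId('contact-success')).toBeVisible();
});
```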

Detection

We use Allure for test reporting, which provides a human-readable format for both technical and non-technical people. The crucial part is how we utilize these results. In the Clark Tech team, there is a weekly rotation in the Release Team, a group of people that makes sure releases go as smoothly as possible. For every commit that marks a release, an automated entry is made from GitHub into a Google Sheet that keeps the history of commits. Based on the GitHub check results, that entry is then automatically marked as green or red. Engineers from the Release Team then categorize the red ones as e.g. “Broken tests” or “Flaky tests” (among others like “Unstable environment”, “Bug”…), and create tickets for fixes. This helps us build statistics over time and focus on the most important groups of tests.
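
The automation behind the sheet is small. Here is a rough sketch of the kind of script that can do it, assuming the googleapis Node client and a service account; the sheet name, columns and environment variable are illustrative, not our real setup:

```typescript
import { google } from 'googleapis';

async function logRelease(commitSha: string, checksPassed: boolean) {
  // Service-account credentials are picked up from the environment.
  const auth = new google.auth.GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/spreadsheets'],
  });
  const sheets = google.sheets({ version: 'v4', auth });

  await sheets.spreadsheets.values.append({
    spreadsheetId: process.env.RELEASE_SHEET_ID,
    range: 'Releases!A:C',
    valueInputOption: 'USER_ENTERED',
    requestBody: {
      // The status starts as green/red from the GitHub checks; the Release
      // Team later categorizes the red rows (broken test, flaky test, bug…).
      values: [[new Date().toISOString(), commitSha, checksPassed ? 'green' : 'red']],
    },
  });
}
```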

Resolution

Once we know which tests are flaky, we immediately mark them as High priority and do all communication around them in a dedicated channel. This gives engineers the much-needed space to talk about and work on these fixes. If you have difficulty finding capacity for them, try collecting data on time-to-market for each user story and show your Product Lead why flaky test fixes should be High priority.

How to live with flaky tests?

Until we get to a point where flaky tests are a rare occurrence, we have to find a way to keep the release pipeline going. To avoid flakiness blocking releases, we configured our tests to retry up to 2 times each if they fail. We found this to be a sweet spot that keeps us from wasting too much time on manually verifying failures.
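
As a rough illustration, in a Playwright-style runner this boils down to a couple of config options; the actual runner and option names depend on your stack:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry a failing test up to 2 times before the release run turns red.
  retries: 2,
  // Roughly 30 parallel workers per release run, as mentioned earlier.
  workers: 30,
});
```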

Alongside the run we do before each release, we have a nightly run where we execute all available automated tests. We put only 1 retry on the nightly run, so that more flaky tests surface there, which we can then pick up for fixing. Once the nightly run is stable with 1 retry, we can stop retrying altogether, so that we start catching even more flakiness. Ideally, this should lead us to a state where we have little to no flaky tests.
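
A simple way to express that difference is to drive the retry count from an environment variable set by the scheduler; this again assumes a Playwright-style config, and the NIGHTLY variable is made up for illustration:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 1 retry on the nightly run so flaky tests surface more often,
  // 2 retries on the release run so flakiness doesn't block releases.
  retries: process.env.NIGHTLY ? 1 : 2,
});
```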


Zlatan Čilić

QA Engineer in leadership world. Passionate about quality, people and food.