Taming Flaky Tests: How QAstra Builds 99% Stable Playwright Suites
Flaky tests don’t usually fail loudly. They fail quietly—once in a while, on a random commit, on
a machine you don’t control. And that’s what makes them dangerous.
One moment your pipeline is green. The next, it’s red for no obvious reason. The rerun passes. The team shrugs. Trust erodes a little more.
At QAstra, we’ve seen this pattern across teams of every size. And over time, we learned an uncomfortable truth: most flaky tests aren’t caused by “bad code” or “slow environments.”
They’re caused by design decisions made early—and never revisited.
This article breaks down:
- Where flakiness actually comes from
- Why Playwright helps—but doesn’t magically fix everything
- And how QAstra consistently builds Playwright suites that run with ~99% stability in CI
No hype. No silver bullets. Just engineering.
First, Let’s Be Honest About Flakiness
Flakiness isn’t random. It just looks random.
When a test fails intermittently, there’s always a reason:
- A race condition you didn’t model
- A UI state you assumed but never verified
- A backend dependency you didn’t control
- Or a selector that worked… until it didn’t
Most teams treat flakiness as a tooling issue:
“Selenium is flaky.”
“CI machines are slow.”
“The app is unstable.”
Those things can make flakiness visible, but they’re rarely the root cause.
The real problem is that many automation suites are built like scripts, not systems.
The Real Root Causes of Flaky Tests
Before talking about solutions, it’s important to name the real enemies.
1. Timing Assumptions Disguised as Logic
This is the classic one. A test clicks a button and immediately expects something to be there:
- A toast message
- A navigation
- A network-driven UI update
On CI, it fails once every 10 runs.
Why? Because the test assumes when something happens, not what condition makes it safe to proceed.
Flakiness starts the moment a test relies on time instead of state.
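As a deliberately simplified sketch, here is the same step written both ways; the page, button, and toast text are illustrative, not from a real project:

```ts
import { test, expect } from '@playwright/test';

test('saving settings shows a confirmation', async ({ page }) => {
  await page.goto('https://example.com/settings'); // illustrative URL

  // Flaky version: relies on time, not state.
  //   await page.getByRole('button', { name: 'Save' }).click();
  //   await page.waitForTimeout(3000);            // hope 3 seconds is "enough"
  //   expect(await page.locator('.toast').isVisible()).toBe(true);

  // Stable version: relies on observable state.
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Settings saved')).toBeVisible(); // retries until true or timeout
});
```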
2. Selectors That Know Too Much
CSS and XPath selectors often encode too much knowledge:
- DOM depth
- Class names tied to styling
- Implementation details that change frequently
The test didn’t fail because the feature broke.
It failed because the selector was never meant to survive change.
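For illustration, here are a few hypothetical selectors that "know too much"; the markup they assume is invented, but the pattern should look familiar:

```ts
import { test } from '@playwright/test';

test('selectors that encode implementation details', async ({ page }) => {
  await page.goto('https://example.com/checkout'); // illustrative URL

  page.locator('#root > div > div:nth-child(3) > form > button'); // encodes DOM depth
  page.locator('.btn.btn-primary.mt-4');                          // encodes styling classes
  page.locator('xpath=//div[@class="actions"]/button[2]');        // encodes element position

  // Every one of these can break on a refactor or redesign while the feature still works.
});
```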
3. Shared State Across Tests
Another silent killer.
- One test logs in and leaves data behind
- Another test assumes a clean state
- A third test deletes that shared data mid-run
Together, they create chaos—especially in parallel execution.
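A minimal sketch of how this plays out, using an invented "team invites" feature and a shared fixture account; which test passes depends on ordering, not on the product:

```ts
import { test, expect } from '@playwright/test';

// Both tests touch the same account, so their outcome depends on which one
// runs (or finishes) first, especially under parallel execution.
test('invite a teammate', async ({ page }) => {
  await page.goto('https://example.com/team');
  await page.getByRole('button', { name: 'Invite' }).click();
  await page.getByLabel('Email').fill('shared-user@example.com'); // shared fixture account
  await page.getByRole('button', { name: 'Send' }).click();
});

test('team starts with no pending invites', async ({ page }) => {
  await page.goto('https://example.com/team');
  // Passes alone, fails whenever the test above has already left an invite behind.
  await expect(page.getByText('No pending invites')).toBeVisible();
});
```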
4. Debugging Blindness in CI
Many flaky suites stay flaky because teams can’t see what went wrong.
A CI log says:
“Element not found”
That’s it. No DOM snapshot. No network context. No timing clues.
When engineers can’t diagnose failures confidently, flakiness becomes permanent background noise.
Why Playwright Changes the Game (But Only If Used Properly)
Playwright doesn’t eliminate flakiness by magic.
It eliminates entire categories of mistakes—if you let it.
At QAstra, we treat Playwright as a stability framework, not just a browser driver.
Here’s how.
How QAstra Builds 99% Stable Playwright Suites
1. We Eliminate Time From the Equation
The first rule: no sleeps, no blind waits, no “waitForTimeout” as logic. Instead, we rely on Playwright’s actionability checks:
- Visibility
- Stability
- Enablement
- Attachment to the DOM
If a test needs a spinner to disappear, we wait for the spinner to disappear—not 3 seconds and hope.
This single shift removes more flakiness than any AI tool ever will.
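A small sketch of what "waiting for state" looks like in practice; the spinner label and page contents are assumptions for illustration:

```ts
import { test, expect } from '@playwright/test';

test('search results finish loading before we act', async ({ page }) => {
  await page.goto('https://example.com/search?q=invoices'); // illustrative URL

  // Not: await page.waitForTimeout(3000);
  // Instead, wait for the condition that makes it safe to proceed:
  await expect(page.getByRole('progressbar', { name: 'Loading' })).toBeHidden();
  await expect(page.getByRole('table')).toBeVisible();

  // click() itself performs actionability checks (visible, stable, enabled,
  // attached to the DOM) before acting, so no extra sleeps are needed.
  await page.getByRole('link', { name: 'Invoice #1042' }).click();
});
```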
2. We Design Locators for Change, Not for Today
QAstra standards strongly prefer:
- Role-based locators
- Accessible names
- User-facing text where appropriate
Because users don’t click .btn-primary > span:nth-child(2).
They click “Submit”. When tests describe intent, not structure, they survive UI evolution.
We also enforce Playwright’s strict mode intentionally. If a locator matches more than one element, we want the test to fail early—before it does the wrong thing silently.
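Here is what that standard tends to look like in a test; the form and its labels are illustrative:

```ts
import { test, expect } from '@playwright/test';

test('sign up with intent-based locators', async ({ page }) => {
  await page.goto('https://example.com/signup'); // illustrative URL

  await page.getByLabel('Email').fill('user@example.com');            // accessible name
  await page.getByRole('checkbox', { name: 'Accept terms' }).check(); // role + name
  await page.getByRole('button', { name: 'Submit' }).click();         // what the user actually clicks

  await expect(page.getByText('Check your inbox')).toBeVisible();

  // Locators are strict: if "Submit" ever matches two buttons, the click
  // fails loudly instead of silently picking one of them.
});
```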
3. Isolation Is Non-Negotiable
Every test runs in its own browser context.
No shared cookies.
No leaked sessions.
No accidental dependencies.
This makes parallel execution predictable instead of scary.
Yes, it requires discipline.
Yes, it means thinking about setup and teardown properly.
But the payoff is enormous: tests stop influencing each other in subtle, unpredictable ways.
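One way to express that discipline is a fixture that gives every test its own throwaway user; the /api/test-users endpoint below is hypothetical, but the shape is representative:

```ts
import { test as base, expect } from '@playwright/test';

type TestUser = { id: string; email: string; password: string };

const test = base.extend<{ user: TestUser }>({
  user: async ({ request }, use) => {
    // Each test gets its own user: nothing is shared, nothing leaks.
    const res = await request.post('https://example.com/api/test-users'); // hypothetical endpoint
    const user = (await res.json()) as TestUser;
    await use(user);
    await request.delete(`https://example.com/api/test-users/${user.id}`); // teardown
  },
});

test('a fully isolated test', async ({ page, user }) => {
  // The page fixture already lives in a brand-new browser context:
  // fresh cookies, fresh storage, no session from any other test.
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill(user.email);
  await page.getByLabel('Password').fill(user.password);
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```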
4. We Treat Backend Dependencies as First-Class Citizens
Flaky UI tests are often blamed on the frontend, but the backend is frequently the real culprit.
At QAstra, we:
- Stub unstable third-party APIs
- Control test data intentionally
- Separate “UI verification” from “system integration” when appropriate
That’s not cheating.
That’s good test design.
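A sketch of what stubbing an unstable third-party dependency can look like; the endpoint and payload are invented for illustration:

```ts
import { test, expect } from '@playwright/test';

test('checkout shows a shipping quote', async ({ page }) => {
  // The real rates API is slow and rate-limited, so the test answers for it.
  await page.route('**/api/shipping-rates**', route =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ carrier: 'ACME Post', amount: 4.99, currency: 'USD' }),
    })
  );

  await page.goto('https://example.com/checkout');
  await expect(page.getByText('ACME Post')).toBeVisible();
  await expect(page.getByText('$4.99')).toBeVisible();
});
```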
5. Debuggability Is Built In, Not Added Later
Every QAstra Playwright suite ships with:
- Trace recording on failure
- Screenshots and videos where they add value
- Clear logging around test intent
When something fails, engineers can:
- See what the page looked like
- Inspect network requests
- Understand timing and state
Instead of rerunning tests “to see if it passes,” teams fix root causes confidently—and flakiness steadily drops.
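A minimal sketch of the kind of configuration that makes this possible; the exact retry and reporter choices vary per project:

```ts
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 1 : 0, // retries surface flakiness in reports; they don't hide it
  use: {
    trace: 'retain-on-failure',      // DOM snapshots, network, console and timing for every failure
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  reporter: [['html', { open: 'never' }]],
});
```

A failing CI run then leaves behind a trace that can be opened locally with npx playwright show-trace, instead of a bare "Element not found".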
6. We Use AI Carefully—and Sparingly
AI is powerful, but uncontrolled AI healing is dangerous.
At QAstra, we use AI to:
- Analyze failure patterns
- Reduce noise in test reports
- Assist with selector migration in high-churn areas
Stability comes from determinism first.
AI is an assistant—not the decision-maker.
What 99% Stability Actually Means
99% stability doesn’t mean:
- Tests never fail
- Bugs never exist
- Pipelines are always green
It means:
- When tests fail, they fail for real reasons
- Reruns are rare, not routine
- Engineers trust failures
Final Thoughts
Flaky tests aren’t an inevitability.
They’re a signal.
A signal that the test suite was built for speed first—and stability later.
At QAstra, we flip that order.
We design Playwright suites that can handle real-world CI conditions, parallel execution, and fast-moving codebases—without becoming noise.
If your automation feels fragile, it’s not because testing is hard.
It’s because stability was never treated as a feature. And features can always be engineered.
Ready to Make Your Tests Boring (In the Best Way)?
At QAstra Technologies, we specialize in building Playwright-first automation that teams actually trust—whether you’re starting fresh or stabilizing a flaky legacy suite.
If you’re tired of reruns, ignored failures, and “it passed on retry” excuses, we’d love to help.
Learn More