
Taming Flaky Tests: How QAstra Builds 99% Stable Playwright Suites

Flaky tests don’t usually fail loudly. They fail quietly—once in a while, on a random commit, on a machine you don’t control. And that’s what makes them dangerous.

One moment your pipeline is green. The next, it’s red for no obvious reason. The rerun passes. The team shrugs. Trust erodes a little more.

At QAstra, we’ve seen this pattern across teams of every size. And over time, we learned an uncomfortable truth: most flaky tests aren’t caused by “bad code” or “slow environments.”

They’re caused by design decisions made early—and never revisited.

This article breaks down:
  • Where flakiness actually comes from
  • Why Playwright helps—but doesn’t magically fix everything
  • And how QAstra consistently builds Playwright suites that run with ~99% stability in CI

No hype. No silver bullets. Just engineering.

First, Let’s Be Honest About Flakiness

Flakiness isn’t random. It just looks random.

When a test fails intermittently, there’s always a reason:
  • A race condition you didn’t model
  • A UI state you assumed but never verified
  • A backend dependency you didn’t control
  • Or a selector that worked… until it didn’t

Most teams treat flakiness as a tooling issue:

    “Selenium is flaky.”
    “CI machines are slow.”
    “The app is unstable.”

Those things can make flakiness visible, but they’re rarely the root cause.

The real problem is that many automation suites are built like scripts, not systems.

The Real Root Causes of Flaky Tests

Before talking about solutions, it’s important to name the real enemies.

1. Timing Assumptions Disguised as Logic

This is the classic one. A test clicks a button and immediately expects something to be there:
  • A toast message
  • A navigation
  • A network-driven UI update
On a fast local machine, it works.

On CI, it fails once every 10 runs.

Why? Because the test assumes when something happens, not what condition makes it safe to proceed.

Flakiness starts the moment a test relies on time instead of state.
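
To make this concrete, here is a minimal sketch of the difference. The URL, button name, and toast text are placeholders; the pattern is what matters: replace guessed delays with assertions that retry until the real condition holds.

```ts
import { test, expect } from '@playwright/test';

test('shows a confirmation after saving', async ({ page }) => {
  await page.goto('https://example.com/settings'); // placeholder URL

  await page.getByRole('button', { name: 'Save' }).click();

  // Brittle: assumes the toast appears within a guessed delay.
  // await page.waitForTimeout(3000);

  // Stable: asserts on state. Playwright retries this check until
  // the toast is actually visible (or the timeout is reached).
  await expect(page.getByText('Settings saved')).toBeVisible();
});
```
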
2. Selectors That Know Too Much

CSS and XPath selectors often encode too much knowledge:
  • DOM depth
  • Class names tied to styling
  • Implementation details that change frequently
They work—until a frontend refactor, a design tweak, or a minor library upgrade.

The test didn’t fail because the feature broke.

It failed because the selector was never meant to survive change.
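
A quick illustration of the contrast (the markup and selectors are hypothetical):

```ts
import { type Page } from '@playwright/test';

// Both locators target the same Submit button.
function submitButton(page: Page) {
  // Encodes DOM depth and styling classes; any refactor can break it:
  // return page.locator('.btn-primary > span:nth-child(2)');

  // Encodes what the user sees; survives markup and styling changes:
  return page.getByRole('button', { name: 'Submit' });
}
```
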
3. Shared State Across Tests

Another silent killer:
  • One test logs in and leaves data behind
  • Another test assumes a clean state
  • A third test deletes something the first one needed

Individually, the tests look fine.

Together, they create chaos—especially in parallel execution.
4. Debugging Blindness in CI

Many flaky suites stay flaky because teams can’t see what went wrong.

A CI log says:

“Element not found”

That’s it. No DOM snapshot. No network context. No timing clues.

When engineers can’t diagnose failures confidently, flakiness becomes permanent background noise.
Why Playwright Changes the Game (But Only If Used Properly)

Playwright doesn’t eliminate flakiness by magic.

It eliminates entire categories of mistakes—if you let it.

At QAstra, we treat Playwright as a stability framework, not just a browser driver.

Here’s how.
How QAstra Builds 99% Stable Playwright Suites

1. We Eliminate Time From the Equation

The first rule: no sleeps, no blind waits, no “waitForTimeout” as logic. Instead, we rely on Playwright’s actionability checks:
  • Visibility
  • Stability
  • Enablement
  • Attachment to the DOM
Every interaction waits for the right condition, not a guessed delay.

If a test needs a spinner to disappear, we wait for the spinner to disappear—not 3 seconds and hope.
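
A minimal sketch of that rule in practice (the spinner selector and URL are placeholders):

```ts
import { test, expect } from '@playwright/test';

test('proceeds only once loading has finished', async ({ page }) => {
  await page.goto('https://example.com/dashboard'); // placeholder URL

  // Wait for the actual condition: the loading indicator being gone.
  await expect(page.locator('.spinner')).toBeHidden();

  // Actionability is built in: click() itself waits for the element
  // to be visible, stable, enabled, and attached before acting.
  await page.getByRole('button', { name: 'Continue' }).click();
});
```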

This single shift removes more flakiness than any AI tool ever will.
2. We Design Locators for Change, Not for Today

QAstra standards strongly prefer:
  • Role-based locators
  • Accessible names
  • User-facing text where appropriate
Why?

Because users don’t click .btn-primary > span:nth-child(2).

They click “Submit”.

When tests describe intent, not structure, they survive UI evolution.

We also enforce Playwright’s strict mode intentionally. If a locator matches more than one element, we want the test to fail early—before it does the wrong thing silently.
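
Here is a sketch of both ideas together; the page and element names are illustrative, not a real app:

```ts
import { test, expect } from '@playwright/test';

test('ambiguous locators fail fast', async ({ page }) => {
  await page.goto('https://example.com/checkout'); // placeholder URL

  // Strict mode: if this matches two "Submit" buttons, the click
  // throws immediately instead of silently picking one of them.
  await page.getByRole('button', { name: 'Submit' }).click();

  // When multiple matches are expected, say so explicitly:
  await expect(page.getByRole('listitem')).toHaveCount(3);
});
```
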
3. Isolation Is Non-Negotiable

Every test runs in its own browser context.

No shared cookies.

No leaked sessions.

No accidental dependencies.

This makes parallel execution predictable instead of scary.

Yes, it requires discipline.

Yes, it means thinking about setup and teardown properly.

But the payoff is enormous: tests stop influencing each other in subtle, unpredictable ways.
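
With Playwright Test this isolation mostly comes for free, as in this sketch (URLs and names are placeholders):

```ts
import { test, expect } from '@playwright/test';

// Playwright Test gives every test its own browser context:
// fresh cookies, fresh storage, no leftovers from other tests.

test('signing in does not leak into other tests', async ({ page }) => {
  await page.goto('https://example.com/login'); // placeholder URL
  // ...sign in, create data, assert...
});

test('this test starts from a clean slate', async ({ page }) => {
  // Even in parallel runs, this context never sees the session
  // or cookies created by the test above.
  await page.goto('https://example.com/login');
  await expect(page.getByRole('button', { name: 'Sign in' })).toBeVisible();
});
```
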
4. We Treat Backend Dependencies as First-Class Citizens

Flaky UI tests are often blamed on the frontend, but the backend is frequently the real culprit.

At QAstra, we:
  • Stub unstable third-party APIs
  • Control test data intentionally
  • Separate “UI verification” from “system integration” when appropriate
If a test is meant to validate UI behavior, we don’t let a slow external service decide its fate.
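
A sketch of what stubbing looks like with Playwright’s network interception; the route pattern, payload, and page are invented for illustration:

```ts
import { test, expect } from '@playwright/test';

test('renders prices without the live rates service', async ({ page }) => {
  // Intercept the unstable third-party call and answer deterministically.
  await page.route('**/api/exchange-rates', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ USD: 1, EUR: 0.9 }),
    })
  );

  await page.goto('https://example.com/pricing'); // placeholder URL
  await expect(page.getByText('€0.90')).toBeVisible();
});
```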

That’s not cheating.

That’s good test design.
5. Debuggability Is Built In, Not Added Later

Every QAstra Playwright suite ships with:
  • Trace recording on failure
  • Screenshots and videos where they add value
  • Clear logging around test intent
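
A minimal config sketch for the first two items (the option names are Playwright’s; the specific values are one reasonable choice):

```ts
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Keep full traces for failing tests so every red CI run can be
    // replayed step by step with `npx playwright show-trace`.
    trace: 'retain-on-failure',
    // Capture visual evidence only where it adds value.
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```
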
When a test fails in CI, engineers can replay the failure step by step:
  • See what the page looked like
  • Inspect network requests
  • Understand timing and state
This changes team behavior.

Instead of rerunning tests “to see if it passes,” teams fix root causes confidently—and flakiness steadily drops.
6. We Use AI Carefully—and Sparingly

AI is powerful, but uncontrolled AI healing is dangerous.

At QAstra, we use AI to:
  • Analyze failure patterns
  • Reduce noise in test reports
  • Assist with selector migration in high-churn areas
We do not allow AI to silently change test behavior in CI without review.

Stability comes from determinism first.

AI is an assistant—not the decision-maker.
What 99% Stability Actually Means

99% stability doesn’t mean:
  • Tests never fail
  • Bugs never exist
  • Pipelines are always green
It means:
  • When tests fail, they fail for real reasons
  • Reruns are rare, not routine
  • Engineers trust failures instead of ignoring them

That trust is what unlocks real velocity.
Final Thoughts

Flaky tests aren’t an inevitability.

They’re a signal.

A signal that the test suite was built for speed first—and stability later.

At QAstra, we flip that order.

We design Playwright suites that can handle real-world CI conditions, parallel execution, and fast-moving codebases—without becoming noise.

If your automation feels fragile, it’s not because testing is hard.

It’s because stability was never treated as a feature.

And features can always be engineered.

Ready to Make Your Tests Boring (In the Best Way)?

At QAstra Technologies, we specialize in building Playwright-first automation that teams actually trust—whether you’re starting fresh or stabilizing a flaky legacy suite.

If you’re tired of reruns, ignored failures, and “it passed on retry” excuses, we’d love to help.
