DLN.
TestStabilizer

Flaky Test Fixer Agent

Diagnoses and fixes tests that pass and fail intermittently without any code changes. Stop playing the CI lottery and get deterministic, reliable test results.

The CI Lottery Nobody Wants to Play

Your CI is green. Then red. Then green again. Same code, different results:

  • "Just re-run CI" — the universal prayer for flaky tests
  • "It works on my machine" — timing-dependent tests hide in CI
  • "We quarantined it" — quarantine lists that grow forever
  • "Don't touch that test" — fear of making it worse

The result? Developers stop trusting the test suite. They merge despite failures. Real bugs slip through because "it's probably just flaky."

How the Agent Works

1. Flaky Test Identification

Confirms which tests are actually flaky vs legitimately failing.

  • Analyzes CI history to find tests with inconsistent pass/fail rates
  • Runs suspect tests 50+ times locally to confirm flakiness
  • Tests in isolation and in different orders
  • Produces a flakiness report with per-test failure rates and likely root causes
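
The CI-history analysis above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the agent's actual implementation: the `Run` record shape and the `flaky_candidates` method are hypothetical names. It treats a test as a flaky candidate only when the same commit produced both a pass and a fail, so consistently failing tests are left out.

```ruby
# Hypothetical CI history record: one entry per test execution.
Run = Struct.new(:name, :sha, :passed)

# Returns { test_name => failure_rate } for tests that both passed and
# failed on the same commit. Tests that always fail are not included.
def flaky_candidates(runs)
  runs.group_by(&:name).each_with_object({}) do |(name, test_runs), report|
    mixed = test_runs.group_by(&:sha).any? do |_sha, same_commit|
      same_commit.map(&:passed).uniq.size > 1
    end
    next unless mixed

    failures = test_runs.count { |run| !run.passed }
    report[name] = (failures.to_f / test_runs.size).round(2)
  end
end

history = [
  Run.new("checkout_spec", "abc123", true),
  Run.new("checkout_spec", "abc123", false),
  Run.new("checkout_spec", "abc123", true),
  Run.new("checkout_spec", "abc123", true),
  Run.new("login_spec",    "abc123", false), # fails every time: not flaky
  Run.new("login_spec",    "abc123", false),
  Run.new("search_spec",   "abc123", true)
]

# Only checkout_spec is reported, with a failure rate of 0.25.
report = flaky_candidates(history)
```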

2. Root Cause Analysis

Identifies the pattern causing intermittent failures.

  • Checks for async/timing issues (Turbo, Stimulus, Ajax)
  • Detects test order dependencies and shared state
  • Identifies time-dependent logic and background job issues

3. Targeted Fix Application

Applies the appropriate fix based on the identified pattern.

  • Adds explicit waits: have_css("[data-loaded]", wait: 10)
  • Wraps examples so enqueued jobs run: around { |example| perform_enqueued_jobs { example.run } }
  • Freezes time: travel_to(Time.zone.local(2024, 1, 15))

4. Stability Verification

Proves the fix works by running the test repeatedly.

  • Runs the fixed test 100+ times to confirm stability
  • Tests in randomized order with other specs
  • Verifies no new flakiness was introduced
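
A repeated-run check like the one described above might look roughly like this. The `stability_run` helper is a hypothetical name used for illustration. In a real project the block would invoke the test runner, while here a counter simulates a test that fails on every tenth run.

```ruby
# Runs the given block `times` times and tallies the outcomes. Any raised
# StandardError counts as a failure, mirroring a failed assertion.
def stability_run(times: 100)
  failures = 0
  times.times do
    begin
      yield
    rescue StandardError
      failures += 1
    end
  end
  { runs: times, failures: failures, stable: failures.zero? }
end

# Simulated intermittent test: raises on every tenth attempt.
attempt = 0
result = stability_run(times: 100) do
  attempt += 1
  raise "simulated intermittent failure" if (attempt % 10).zero?
end

# result[:failures] is 10 here, so result[:stable] is false.
```

Counting failures over a fixed number of runs, rather than stopping at the first one, gives the before/after failure rate that a fix can be judged against.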

5. Pull Request with Diagnosis

Delivers a PR that explains the root cause and fix.

  • Creates atomic PRs: one fix per flaky test for easy review
  • Explains why the test was flaky (timing, state, etc.)
  • Shows before/after stability results
  • Documents the pattern so your team can prevent it in the future

Common Flaky Patterns We Fix

Async/Timing Issues

Tests that don't wait for Ajax, Turbo, or Stimulus to complete.

expect(page).to have_no_css("[aria-busy]")
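
Capybara matchers such as the one above retry until the condition holds or a timeout expires. The plain-Ruby sketch below illustrates that general idea outside a browser. The `wait_until` helper is a hypothetical name, not a Capybara API. The short `sleep` inside it is only a polling interval: the wait ends as soon as the condition is true, unlike a fixed delay.

```ruby
# Polls the block until it returns a truthy value or `timeout` seconds
# elapse. Uses the monotonic clock so wall-clock changes cannot skew it.
def wait_until(timeout: 2.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Simulated async work: a thread clears the busy flag after a short delay.
state = { busy: true }
worker = Thread.new do
  sleep 0.05
  state[:busy] = false
end

# Returns as soon as the flag clears instead of guessing a delay.
wait_until { !state[:busy] }
worker.join
```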

Background Jobs

Tests that assert on job side-effects without running the job.

around { |example| perform_enqueued_jobs { example.run } }
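
To show why running jobs inside the example removes the race, here is a small plain-Ruby sketch. `FakeQueue` is a hypothetical stand-in for a real job backend and is not how Active Job is implemented. Unlike a real helper, it simply drains everything queued so far.

```ruby
# Hypothetical in-memory queue standing in for a real job backend.
class FakeQueue
  def initialize
    @jobs = []
  end

  # Stores the block as a job without running it.
  def enqueue(&job)
    @jobs << job
  end

  # Runs the block, then executes every queued job, including jobs
  # that other jobs enqueue while the queue is draining.
  def perform_enqueued
    result = yield
    @jobs.shift.call until @jobs.empty?
    result
  end
end

queue = FakeQueue.new
emails = []

# The job is queued but has not run, so asserting on `emails` now
# would depend on when (or whether) a worker picks it up.
queue.enqueue { emails << "welcome@example.com" }

# Draining inside the helper makes the side effects deterministic.
queue.perform_enqueued { queue.enqueue { emails << "receipt@example.com" } }
```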

Time Dependencies

Tests that pass/fail depending on time of day or day of week.

travel_to(Time.zone.local(2024, 1, 15))
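
The same principle can be shown without Rails by passing the time in explicitly rather than freezing it globally. This is a different technique from `travel_to`, and `weekend_surcharge?` is a hypothetical business rule invented for the example.

```ruby
# Hypothetical business rule that depends on the current day. The time
# is a parameter so tests can pin it instead of reading the real clock.
def weekend_surcharge?(now = Time.now)
  now.saturday? || now.sunday?
end

# Fixed dates give the same answer whenever the suite runs.
monday   = Time.utc(2024, 1, 15) # the date used in the example above
saturday = Time.utc(2024, 1, 20)

weekend_surcharge?(monday)   # false
weekend_surcharge?(saturday) # true
```

A test that calls `weekend_surcharge?` with no argument would pass on weekdays and fail on weekends, which is exactly the pattern this section describes.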

Order Dependencies

Tests that rely on state from previous tests.

let(:record) { create(:record) } # fresh each time
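
The effect of fresh state can be demonstrated with a toy suite in plain Ruby. The test names and the `order_dependent?` helper are hypothetical. The sketch tries every ordering, which only works for tiny examples. Real suites would sample orders with a random seed instead.

```ruby
# Two toy tests that read and write a store (an array standing in for
# a database). The second only passes when the store starts empty.
TESTS = {
  "creates a record"    => ->(db) { db << :record; db.size == 1 },
  "starts with no rows" => ->(db) { db.empty? }
}.freeze

# Runs the tests in the given order and returns the names that failed.
def failures_in(order, tests, isolate:)
  shared = [] # one store per suite run
  order.reject do |name|
    store = isolate ? [] : shared # fresh store per test when isolated
    tests[name].call(store)
  end
end

# True when different orderings produce different sets of failures.
def order_dependent?(tests, isolate:)
  outcomes = tests.keys.permutation.map do |order|
    failures_in(order, tests, isolate: isolate).sort
  end
  outcomes.uniq.size > 1
end

order_dependent?(TESTS, isolate: false) # true: state leaks between tests
order_dependent?(TESTS, isolate: true)  # false: each test gets fresh state
```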

Safety Guarantees

Test Intent Preserved

Fixes make tests more reliable, not less meaningful. Assertions are never weakened.

No Sleep Hacks

Never uses arbitrary sleep calls. Waits for specific conditions.

Verified Stability

Every fix is validated with 100+ consecutive successful runs.

Root Cause Documentation

Every PR explains why the test was flaky so your team can avoid the pattern.

Suite Compatibility

Fixed tests run correctly in any order with --order random.

No Test Skipping

We fix tests, not skip them. Your coverage stays intact.

What This Is NOT

  • Not deleting tests. We fix them rather than skipping or removing them.
  • Not adding sleeps. Proper waits for conditions, not arbitrary delays.
  • Not weakening assertions. Tests stay meaningful and catch real bugs.
  • Not a one-time band-aid. Fixes address root causes, not symptoms.

Typical Results

  • 95% flaky tests eliminated
  • 60% reduction in CI time
  • 0 "re-run CI" prayers needed
  • 100% trust in test results restored

Ready to Stop the CI Lottery?

Start with a $1,500 audit. Get a flakiness report showing your most problematic tests and their likely root causes.