TestDoctor

Test Fixer Agent

Your CI is red. Or worse, it's red sometimes. This agent systematically diagnoses and fixes both failing and flaky RSpec tests so you can trust your test suite again.

When Tests Become the Problem

Test pain comes in two flavors: tests that fail and tests that flake. Both cost you time and trust:

  • "The test was passing yesterday" — something changed, but what?
  • "Just re-run CI" — the universal prayer for flaky tests
  • "It works when I run it locally" — environment or timing differences strike again
  • "We quarantined it" — quarantine lists that grow forever
  • "The error message makes no sense" — cryptic stack traces hide the real cause

The result? Developers stop trusting the test suite. They merge despite failures. Real bugs slip through because "it's probably just flaky." Hours lost debugging. Features blocked waiting for green CI.

How the Agent Works

1. Test Suite Analysis

Maps the current state of your test suite and separates failures from flakiness.

  • Runs full test suite to identify all failures
  • Isolates each failing spec to rule out order dependencies
  • Runs suspect tests repeatedly to confirm flakiness vs deterministic failure
  • Categorizes failures: assertion errors, exceptions, timeouts, intermittent
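
The repeat-run step can be pictured with a small plain-Ruby sketch. This is an illustration under stated assumptions, not the agent's actual implementation: `classify_runs`, the `runs` parameter, and the three category names are hypothetical, and the check is a block rather than a real spec.

```ruby
# Hypothetical helper: classify a check by running it several times.
# `runs` and the category symbols are assumptions made for this sketch.
def classify_runs(runs: 10)
  outcomes = Array.new(runs) do
    begin
      result = yield
      result ? :pass : :fail
    rescue StandardError
      # An exception counts as a failed run, like an erroring example.
      :fail
    end
  end

  passes = outcomes.count(:pass)
  if passes == runs
    :stable_pass
  elsif passes.zero?
    :deterministic_failure
  else
    :flaky
  end
end

# Usage: a check that fails on every third call is reported as flaky.
calls = 0
intermittent  = classify_runs(runs: 9) { (calls += 1) % 3 != 0 }
always_broken = classify_runs(runs: 9) { raise "boom" }
always_green  = classify_runs(runs: 9) { true }
```

A mixed result is the signal that matters: a test that fails every time points to a code or expectation change, while one that fails only sometimes points to timing, ordering, or shared state.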

2. Root Cause Diagnosis

Determines whether the test or the code is wrong, and identifies the pattern behind flaky failures.

  • Checks if code changed intentionally (test needs update)
  • Checks if it's a bug (code needs fix)
  • Detects async/timing issues (Turbo, Stimulus, Ajax)
  • Identifies test order dependencies, shared state, and time-dependent logic
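
Order dependencies are usually found by bisection, and RSpec ships a `--bisect` flag for this kind of search. The sketch below shows the idea in plain Ruby under two simplifying assumptions: exactly one earlier example pollutes the target, and the failure is deterministic once that example has run. `find_polluter` and the spec names are hypothetical.

```ruby
# Hypothetical sketch: find the one earlier example that makes a target fail.
# `fails_after` receives a list of examples to run first and returns true
# when the target example fails after them.
def find_polluter(candidates, &fails_after)
  # If the target passes even after every candidate, there is nothing to find.
  return nil unless fails_after.call(candidates)

  while candidates.size > 1
    first_half  = candidates.take(candidates.size / 2)
    second_half = candidates.drop(candidates.size / 2)
    # Keep whichever half still reproduces the failure.
    candidates = fails_after.call(first_half) ? first_half : second_half
  end
  candidates.first
end

# Usage with a simulated suite: the target fails only after "b_spec" has run.
earlier = %w[a_spec b_spec c_spec d_spec e_spec]
culprit = find_polluter(earlier) { |subset| subset.include?("b_spec") }
no_culprit = find_polluter(earlier) { |_subset| false }
```

Each round halves the candidate list, so even a large suite needs only a handful of runs to isolate the polluting example.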

3. Test Data Verification

Ensures factories and fixtures produce valid test data.

  • Checks factory definitions match current model requirements
  • Verifies let vs let! usage for proper setup timing
  • Validates database constraints and foreign keys
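
The `let` versus `let!` distinction is about when setup runs: `let` is lazy and memoized, while `let!` also evaluates the block before each example. The plain-Ruby imitation below shows why that matters for records a test expects to exist. `MiniExample` and the `database` array are hypothetical stand-ins, not RSpec internals.

```ruby
# Plain-Ruby imitation of RSpec's lazy `let` versus eager `let!`.
# MiniExample and its methods are hypothetical names for this sketch.
class MiniExample
  def initialize
    @definitions = {}
    @memo = {}
  end

  # Lazy: the block runs on first access, then the value is memoized.
  def let(name, &block)
    @definitions[name] = block
  end

  # Eager: define, then evaluate right away, similar to `let!` plus its
  # implicit before hook.
  def let!(name, &block)
    let(name, &block)
    fetch(name)
  end

  def fetch(name)
    @memo.fetch(name) { @memo[name] = @definitions.fetch(name).call }
  end
end

database = [] # stands in for a table of persisted records

example = MiniExample.new
example.let(:lazy_user)   { database << "lazy";  "lazy" }
example.let!(:eager_user) { database << "eager"; "eager" }

rows_before_access = database.dup # only the eager record exists so far
example.fetch(:lazy_user)
example.fetch(:lazy_user)         # memoized: the block does not run again
rows_after_access = database.dup
```

A test that counts rows or queries a scope without ever referencing the lazy helper sees an empty table, which is a common source of confusing failures.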

4. Fix Application

Applies the appropriate fix based on diagnosis.

  • Updates test expectations when code changed intentionally
  • Fixes bugs in application code when test is correct
  • Adds explicit waits: have_css("[data-loaded]", wait: 10)
  • Wraps job-dependent examples in an around hook: perform_enqueued_jobs { example.run }
  • Freezes time: travel_to(Time.zone.local(2024, 1, 15))

5. Full Suite Validation

Ensures the fix doesn't break anything else.

  • Runs linters to check syntax and style
  • Runs full test suite to confirm no regressions
  • Runs previously flaky tests 50+ times to confirm stability
  • Submits the PR only once all tests pass reliably
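
The stability check amounts to running the same command many times and counting non-zero exits. Here is a minimal sketch using Ruby's standard `Open3`. The helper name `count_failures` is hypothetical, and a trivial Ruby one-liner stands in for the real test command so the sketch runs anywhere.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a command repeatedly and count non-zero exits.
def count_failures(command, runs: 50)
  runs.times.count do
    _output, status = Open3.capture2e(*command)
    !status.success?
  end
end

# In a real project the command might look like
# %w[bundle exec rspec spec/some_spec.rb] (an illustrative path).
stable_command  = [RbConfig.ruby, "-e", "exit 0"]
failing_command = [RbConfig.ruby, "-e", "exit 1"]

stable_failures = count_failures(stable_command, runs: 3)
broken_failures = count_failures(failing_command, runs: 2)
```

A previously flaky test counts as fixed only when the failure count stays at zero across the whole batch of runs.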

Optional: Coverage Expansion

Beyond fixing failing tests, we can identify gaps between production data and test data.

Production Error Analysis

Analyzes error logs to find untested edge cases that break in production.

Boundary Conditions

Generates property-based tests for edge cases your team didn't think of.
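
To make "property-based" concrete, here is a minimal sketch that needs no gem: it tries a few boundary inputs first, then seeded random strings, and returns the first input that violates the property. `check_property` and `truncate_label` are hypothetical names, and a real setup would more likely use a dedicated property-testing library.

```ruby
# Minimal property check: boundary values first, then seeded random strings.
# Returns the first counterexample, or nil when the property held everywhere.
def check_property(trials: 100, seed: 42)
  rng = Random.new(seed)
  boundary_inputs = ["", " ", "a", "a" * 255]
  random_inputs = Array.new(trials) do
    Array.new(rng.rand(0..40)) { rng.rand(32..126).chr }.join
  end

  (boundary_inputs + random_inputs).find { |input| !yield(input) }
end

# Hypothetical code under test: shorten a label to at most `limit` characters.
def truncate_label(text, limit: 20)
  text.length <= limit ? text : "#{text[0, limit - 3]}..."
end

# Property: the result never exceeds 20 characters.
counterexample = check_property { |s| truncate_label(s).length <= 20 }

# A deliberately wrong implementation (cuts at 25) is caught by a boundary input.
buggy_counterexample = check_property { |s| s[0, 25].length <= 20 }
```

The fixed seed keeps the run reproducible, so a counterexample found in CI can be replayed locally.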

Negative Tests

Adds tests for nil values, empty strings, and invalid states.

Common Issues We Fix

Assertion Failures

Expected value changed due to intentional refactoring.

Fix: Update expectation to match new behavior

Async/Timing Issues

Tests that don't wait for Ajax, Turbo, or Stimulus to complete.

expect(page).to have_no_css("[aria-busy]")
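
Capybara matchers like the one above retry until the condition holds or the wait time runs out. The same idea in plain Ruby, as a sketch: poll a condition against a deadline instead of sleeping for a fixed duration. `wait_until` and its parameters are hypothetical, and the background thread only simulates asynchronous page work.

```ruby
# Hypothetical polling helper: retry a condition until it holds or a
# deadline passes. The short sleep is only the polling interval.
def wait_until(timeout: 2.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Simulated async work: a background thread flips the flag after a short delay.
state = { loaded: false }
worker = Thread.new do
  sleep 0.05
  state[:loaded] = true
end

loaded_in_time = wait_until(timeout: 2.0) { state[:loaded] }
worker.join

# A condition that never becomes true gives up once the timeout elapses.
never_true = wait_until(timeout: 0.05) { false }
```

Unlike a fixed `sleep 2`, this returns as soon as the condition is met and fails with a clear timeout when it never is.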

NoMethodError / Nil Errors

A method was renamed or moved, or the test setup doesn't create the associations the code needs, so a call lands on nil.

Fix: Add missing let! or factory associations

Background Jobs

Tests that assert on job side-effects without running the job.

perform_enqueued_jobs { example.run }
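
Why the wrapper matters: with a test queue, enqueuing a job only records it, and nothing happens until the job is performed. A toy in-memory queue shows the failure mode. `ToyQueue` and its methods are hypothetical and only loosely modeled on a test queue adapter.

```ruby
# Toy in-memory queue: enqueuing stores the job, performing runs it.
class ToyQueue
  attr_reader :enqueued

  def initialize
    @enqueued = []
  end

  def enqueue(&job)
    @enqueued << job
  end

  # Run every stored job in order, emptying the queue.
  def perform_all
    @enqueued.shift.call until @enqueued.empty?
  end
end

queue = ToyQueue.new
sent_emails = []

# The code under test enqueues a job; the side effect has not happened yet.
queue.enqueue { sent_emails << "welcome" }
emails_before = sent_emails.dup

# Performing the queued jobs makes the side effect observable.
queue.perform_all
emails_after = sent_emails.dup
```

An assertion placed between the two steps fails, or passes only when something else happens to drain the queue, which is the pattern behind many job-related flaky tests.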

Time Dependencies

Tests that pass or fail depending on time of day or day of week.

travel_to(Time.zone.local(2024, 1, 15))
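
In a Rails suite, `travel_to` pins the clock for the duration of the test. The underlying idea, shown here in plain Ruby with an explicitly passed time instead of the Rails helper, is that the test chooses "now" rather than inheriting it from the machine. `weekend_surcharge?` is a hypothetical method under test.

```ruby
# Hypothetical code under test with a hidden dependency on the current time.
# Passing `now` explicitly is a plain-Ruby stand-in for freezing the clock.
def weekend_surcharge?(now = Time.now)
  now.saturday? || now.sunday?
end

# With pinned dates the result no longer depends on when the suite runs.
monday   = Time.utc(2024, 1, 15)
saturday = Time.utc(2024, 1, 20)

monday_result   = weekend_surcharge?(monday)
saturday_result = weekend_surcharge?(saturday)
```

Called without an argument, the same method would pass on weekdays and fail on weekends, which is exactly the day-of-week flakiness described above.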

Order Dependencies

Tests that rely on state from previous tests or shared global state.

let(:record) { create(:record) } # fresh each time

Safety Guarantees

Diagnosis Before Fix

Understands the root cause before applying any changes.

Full Suite Regression Check

Every fix is validated against the entire test suite.

No Sleep Hacks or Stubbing

Uses real data and explicit condition waits. No arbitrary sleep calls or mock workarounds.

Clear Explanations

Every PR explains what was wrong and why the fix works, so your team learns the pattern.

Linter Compliance

All changes pass RuboCop and any other configured linters.

Minimal Changes

Fixes only what's broken. No scope creep or unnecessary refactoring.

What This Is NOT

  • Not skipping or deleting tests. We fix them; we don't mark them pending or remove them.
  • Not adding sleeps or stubbing everything. Proper waits for conditions, real data that catches real bugs.
  • Not blind pattern matching. Understands why tests fail before fixing.
  • Not weakening assertions. Fixed tests stay meaningful and still catch the bugs they were designed for.

Ready to Get Back to Green?

Start with a $1,500 audit. Get a full test suite health report covering failing tests, flaky tests, and their root causes.