Write tests in natural language

Define actions and assertions in human language while agents work from visible roles, labels, and screen state.

tests/linear/create-issue.yaml

test-id: t_slice-cart-bane-deep-fold-prim-paar-baru-nable-kay
name: Check Linear issue creation flow
target: linear-web
use:
  browser:
    name: chromium
steps:
  - Click on the Create issue icon.
  - |
    Verify that the Create issue modal
    is shown.
  - |
    Enter "Fix mobile login" in the
    "Issue title" input field.
  - |
    Select "Engineering" from the Team selector and select "Todo"
    from the Status field.
  - Click on the Create issue button.
  - |
    Verify that the created issue is shown with title "Fix mobile login"
    and status "Todo".

Check Linear issue creation flow

Step 1 of 6 (5.0s)
Click on the Create issue icon.
#1 click 4.8s

Step 2 of 6 (4.1s)
Verify that the Create issue modal is shown.
#1 assert 4.0s

Step 3 of 6 (5.3s)
Enter "Fix mobile login" in the "Issue title" input field.
#1 fill 5.1s

Step 4 of 6 (6.2s)
Select "Engineering" from the Team selector and select "Todo" from the Status field.
#1 select 3.0s
#2 select 3.0s

Step 5 of 6 (3.4s)
Click on the Create issue button.
#1 click 3.2s

Step 6 of 6 (5.8s)
Verify that the created issue is shown with title "Fix mobile login" and status "Todo".
#1 assert 5.6s
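
The test file above has a small, regular shape: an id, a name, a target, optional browser settings, and a list of plain-language steps. As a rough sketch, that shape can be written down as a type with a structural check. The `TestFile` interface and `isTestFile` helper are illustrative names inferred from this one example, not agent-qa's published schema, and `t_example` is a placeholder id.

```typescript
// Shape inferred from the example file; the real schema may have more fields.
interface TestFile {
  "test-id": string;
  name: string;
  target: string;
  use?: { browser?: { name: string } };
  steps: string[];
}

// Minimal structural check: required strings present and at least one step.
function isTestFile(value: unknown): value is TestFile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const steps = v["steps"];
  return (
    typeof v["test-id"] === "string" &&
    typeof v["name"] === "string" &&
    typeof v["target"] === "string" &&
    Array.isArray(steps) &&
    steps.length > 0 &&
    steps.every((s) => typeof s === "string")
  );
}

// Hypothetical parsed file (placeholder id) and an incomplete one.
const candidate = {
  "test-id": "t_example",
  name: "Check Linear issue creation flow",
  target: "linear-web",
  steps: ["Click on the Create issue icon."],
};
const incomplete = { name: "No steps or target" };

console.log(isTestFile(candidate), isTestFile(incomplete));
```

A check like this only confirms structure; whether a step is meaningful is still decided by the agent at run time.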

Evolves with every run

With every test run, agent-qa builds execution memory from product, suite, and test observations, then adds that context to future runs. agent-qa also curates memory from steps that were healed during execution, helping future runs avoid the same mistake.

Learn about memory

Memory - Notion

obs_ria-gue-cake-long-elf-wag-time-quad-profit-alf

Workspace navigation contract

trust 0.91 | confirmed 6 times

last confirmed today

Sidebar groups stay visible after switching between Docs, Projects, Calendar, and Settings. Future runs should verify the workspace switcher, command palette, and primary navigation labels before attempting deeper page assertions. This prevents the agent from rediscovering the navigation model on every run and keeps later assertions focused on the actual page behavior.

obs_mara-scope-desk-calm-page-search-index-round-quiet-latch

Command palette search context

trust 0.87 | confirmed 5 times

last confirmed yesterday

The command palette returns workspace-scoped results first, then recent pages. Repeated tests should search for stable page titles and avoid assuming that the first result is the same across seeded workspaces. When the palette already contains recent pages, the agent should filter by exact title text before selecting the result.

obs_motif-page-toolbar-active-editor-share-comment-menu-state

Page toolbar persistence

trust 0.83 | confirmed 4 times

last confirmed today

The page toolbar appears only after the editor area is active. Future runs should click into the page body before asserting Share, Comments, and More actions. This memory keeps the planner from treating a hidden toolbar as a failure when the page is simply idle.
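
The three observations above share a shape: an id, a title, a trust score, and a confirmation count. As a minimal sketch of how such records might be filtered before being added to a run's context, the example below keeps only observations above a trust threshold. The `Observation` type, the `selectForRun` helper, and the 0.85 threshold are assumptions made for illustration; the page does not describe how agent-qa actually selects memory.

```typescript
// Hypothetical record shape; field names are inferred from the cards above,
// not taken from agent-qa's real schema.
interface Observation {
  id: string;
  title: string;
  trust: number; // assumed to range from 0 to 1
  confirmations: number;
}

// Keep observations at or above a trust threshold, highest trust first.
function selectForRun(observations: Observation[], minTrust: number): Observation[] {
  return observations
    .filter((o) => o.trust >= minTrust)
    .sort((a, b) => b.trust - a.trust);
}

const observations: Observation[] = [
  {
    id: "obs_motif-page-toolbar-active-editor-share-comment-menu-state",
    title: "Page toolbar persistence",
    trust: 0.83,
    confirmations: 4,
  },
  {
    id: "obs_ria-gue-cake-long-elf-wag-time-quad-profit-alf",
    title: "Workspace navigation contract",
    trust: 0.91,
    confirmations: 6,
  },
  {
    id: "obs_mara-scope-desk-calm-page-search-index-round-quiet-latch",
    title: "Command palette search context",
    trust: 0.87,
    confirmations: 5,
  },
];

// The 0.85 threshold is an arbitrary value chosen for this example.
const selected = selectForRun(observations, 0.85);
console.log(selected.map((o) => o.title));
```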

Built for Humans

Top-tier developer experience with a beautiful dashboard, intuitive CLI, and clear workflows for authoring, running, and debugging tests.

Learn about the dashboard

agent-qa dashboard

Runs

All | Running | Queued | Completed | Failed

Status | Test Name | Target | Duration
Passed | Check Linear issue creation flow | linear-web (Web) | 29s
Passed | GitHub release fixture smoke | github-web (Web) | 41s
Failed | Sentry issue triage regression | sentry-web (Web) | 1m 12s
Passed | Supabase project smoke test | supabase-web (Web) | 38s

agent-qa CLI

❯ agent-qa run tests/linear/create-issue.yaml
Running 1 test(s)...
✓ Click on the Create issue icon. 5s
  Sub-actions: 1 total (1 succeeded, 0 failed)
✓ Verify that the Create issue modal is shown. 4s
  Sub-actions: 1 total (1 succeeded, 0 failed)
✓ Enter "Fix mobile login" in the "Issue title" input field. 5s
  Sub-actions: 1 total (1 succeeded, 0 failed)
✓ Select Engineering from Team and Todo from Status 6s
  Sub-actions: 2 total (2 succeeded, 0 failed)
✓ Click on the Create issue button. 3s
  Sub-actions: 1 total (1 succeeded, 0 failed)
✓ Verify created issue title and Todo status 6s
  Sub-actions: 2 total (2 succeeded, 0 failed)
 PASS  Check Linear issue creation flow 29s
Run ID: r_lined-frig-schema-main-depart-hing-aline-balls-cran-dess
  Memory: 1 added (3s)
Run attributes:
  agent-qa.trigger=cli
  agent-qa.runner=local
Tests:  1 of 1 passed
Steps:  6 passed, 6 total
Cache:  6 hits, 0 misses
Time:   29s

Built for Machines

The same primitives are exposed through MCP and skills so coding agents can discover schemas, author YAML, enqueue runs, inspect artifacts, and triage failures.

Learn about MCP

CLI | MCP | SKILLS

Accelerate runs with smart Cache

The action cache reuses validated plans across similar subsequent test runs, reducing planner work, token usage, and runtime overhead.

Learn about caching

Execution Speed

42s -> 8s

Cached action plans skip redundant planner work on similar subsequent runs.

Reduced Token Usage

fewer planner tokens

Validated steps reuse prior reasoning when the flow and screen state still match.
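
To make the idea concrete, here is a minimal sketch of a plan cache keyed on the step text plus a fingerprint of the screen state, so a plan is reused only when both still match. The `ActionCache` class, the string fingerprints, and the `Plan` type are hypothetical; agent-qa's actual cache keys and validation rules are not described on this page.

```typescript
// A plan is modeled here as a list of sub-action names, e.g. ["click"].
type Plan = string[];

class ActionCache {
  private plans = new Map<string, Plan>();
  hits = 0;
  misses = 0;

  // Combine step text and screen fingerprint into one unambiguous key.
  private key(step: string, screen: string): string {
    return JSON.stringify([step, screen]);
  }

  // Return a cached plan, or call the planner and remember its result.
  resolve(step: string, screen: string, planner: (step: string) => Plan): Plan {
    const k = this.key(step, screen);
    const cached = this.plans.get(k);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const plan = planner(step);
    this.plans.set(k, plan);
    return plan;
  }
}

const cache = new ActionCache();
let plannerCalls = 0;
// Stand-in planner that counts how often it is invoked.
const planner = (_step: string): Plan => {
  plannerCalls += 1;
  return ["click"];
};

const step = "Click on the Create issue button.";
cache.resolve(step, "modal-open", planner); // miss: first time seen
cache.resolve(step, "modal-open", planner); // hit: same step, same screen
cache.resolve(step, "modal-closed", planner); // miss: screen state changed

console.log(cache.hits, cache.misses, plannerCalls); // prints: 1 2 2
```

Keying on the screen state as well as the step is what lets a changed UI fall through to the planner instead of replaying a stale plan.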

Run sandboxed hooks during tests

Run Node, Bun, Python, or Bash hooks in isolated Docker containers to set up environments, call APIs, seed fixtures, tear down state, or pass structured outputs back into the active test run.

Learn about hooks

hooks - prepare-checkout.ts

// emits CHECKOUT_EUR_TOTAL_CENTS for the active test run
const response = await fetch("https://api.frankfurter.app/latest?from=USD&to=EUR,GBP")
const { rates } = await response.json()

const fixture = {
  plan: "team",
  currency: "USD",
  subtotal_cents: 2900,
  eur_total_cents: Math.round(2900 * rates.EUR),
  gbp_total_cents: Math.round(2900 * rates.GBP),
  seat_limit: 12,
  fixture_at: "2026-05-07T00:00:00Z",
}

const env = Object.entries(fixture)
  .map(([key, value]) => `CHECKOUT_${key.toUpperCase()}=${value}`)
  .join("\n")

await Bun.write("/tmp/agent-qa.env", `${env}\n`)
console.log(JSON.stringify({ checkoutFixture: fixture }, null, 2))
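
The hook above writes one `CHECKOUT_<KEY>=<value>` line per fixture field to `/tmp/agent-qa.env`. The sketch below only illustrates that file format by parsing a sample of it back into an object. The `parseEnv` helper is hypothetical, and how agent-qa itself reads the file is not stated here.

```typescript
// Parse KEY=value lines (the format the hook writes) into a plain object.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const idx = line.indexOf("=");
    if (idx <= 0) continue; // skip blank lines and lines without a key
    out[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return out;
}

// Sample content using the fixture fields that do not depend on live rates.
const sample = [
  "CHECKOUT_PLAN=team",
  "CHECKOUT_SUBTOTAL_CENTS=2900",
  "CHECKOUT_FIXTURE_AT=2026-05-07T00:00:00Z",
  "",
].join("\n");

const parsed = parseEnv(sample);
console.log(parsed["CHECKOUT_PLAN"], Number(parsed["CHECKOUT_SUBTOTAL_CENTS"]));
```

Splitting on the first `=` only keeps values that themselves contain `=` intact.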

Review your QA like code

Tests, configs, hooks, memory, and suite logic all live as version-controlled code, so every change can be diffed, reviewed, reused, and shared across teams.

Learn about configuration

review - tests/supabase/project-smoke.yaml

diff --git a/tests/supabase/project-smoke.yaml b/tests/supabase/project-smoke.yaml
index 4a31d1f..6af40cd 100644
--- a/tests/supabase/project-smoke.yaml
+++ b/tests/supabase/project-smoke.yaml
@@ -4,8 +4,9 @@
 test-id: t_lumen-rail-civic-model-pager-slate-harbor-fable-drift
 name: Supabase project smoke test
 target: supabase-staging
 steps:
   - Open the Supabase dashboard
-  - Verify the project status reads "Healthy"
-  - Open API settings and verify the project URL is displayed
+  - Open Project Settings > API
+  - Verify the Project URL matches $SUPABASE_PROJECT_URL
+  - Verify the anon key remains masked before copy

Self-healing test execution

When any sub-action, such as click, fill, or select, fails, agent-qa re-observes the UI and tries a different path in the same run. Tests recover from UI drift and flaky interactions instead of failing on the first broken action.
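
The recovery loop described above can be sketched as trying candidate paths in order until one succeeds. Everything in this example is a simplification: `Attempt` and `runWithHealing` are hypothetical names, and a real runner would re-observe the UI and plan the next path between attempts rather than take a fixed list.

```typescript
// An attempt returns true when the sub-action succeeds.
type Attempt = () => boolean;

// Try each candidate path in order and stop at the first success.
function runWithHealing(paths: Attempt[]): { ok: boolean; attempts: number } {
  let attempts = 0;
  for (const path of paths) {
    attempts += 1;
    if (path()) return { ok: true, attempts };
    // A real runner would re-observe the UI here before choosing the next path.
  }
  return { ok: false, attempts };
}

const result = runWithHealing([
  () => false, // e.g. the original click target is no longer present
  () => true, // an alternate path succeeds after re-observing
]);
console.log(result);
```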

Learn about self-healing

healed run - tests/table/create-row.yaml

Step 11 of 20

Add column "story_name" with type text in the create-table form.

#1 click 3.9s
#2 fill 2.8s
#3 click 3.9s
#4 tapCoordinate 8.8s
#5 click 4.6s
#6 select 6s
#7 click 4.4s
#8 click 11.7s
#9 keypress 13.4s

Bring your own LLM

Run tests with the model of your choice via OpenAI- and Anthropic-compatible endpoints, Gemini, local or open-source models, and subscriptions like Codex and Claude Code.

Learn about LLM providers

> agent-qa