agent-qa

Understand agent-qa memory as file-backed behavioral observations that make future runs more product-aware without replacing live evidence.

Memory is the flagship differentiator in agent-qa. It lets the agent learn product behavior from previous runs, then bring that context back into future steps as reviewable evidence instead of hidden model state.

agent-qa memory is file-backed. Observations are Markdown files stored under a memory root (agent-qa-memory by default) and organized into product, suite, and test directories.

The lifecycle

Memory follows this lifecycle across a run:

  1. A run starts with a product, test ID, and sometimes a suite ID plus suite position.
  2. agent-qa builds an in-memory index from matching product, suite, and test observation files.
  3. Before a step runs, the current step text is used to query the index.
  4. Matching observations are injected as <memory-context> for that step.
  5. The agent observes the live app and executes the step.
  6. After the run, the curator reviews the result and decides whether to add, update, or deprecate an observation, or to do nothing.

The important boundary is that memory is contextual evidence, not a command channel. It helps the agent remember product behavior, but the current page, app state, logs, and test instructions still decide the run.
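
Steps 2 through 4 of the lifecycle can be illustrated with a minimal sketch. It is not agent-qa's implementation: the Observation class, the function names, and the keyword-overlap matching are all assumptions made for illustration, and the real index may rank matches quite differently. Only the <memory-context> wrapper comes from the description above.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Observation:
    # Hypothetical record for one observation loaded into the index.
    tier: str  # "product", "suite", or "test"
    text: str  # the observation body read from a markdown file


def keywords(text: str) -> Set[str]:
    # Lowercased words of four or more characters; a crude relevance signal.
    return set(re.findall(r"\b[a-z0-9]{4,}\b", text.lower()))


def query_index(index: List[Observation], step_text: str, limit: int = 3) -> List[Observation]:
    # Rank observations by how many keywords they share with the step text.
    # Keyword overlap is an assumed stand-in for the real matching logic.
    step_words = keywords(step_text)
    scored = [(len(step_words & keywords(obs.text)), obs) for obs in index]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [obs for _, obs in matches[:limit]]


def render_memory_context(matches: List[Observation]) -> str:
    # Wrap matches in a <memory-context> block; return "" when nothing matched.
    if not matches:
        return ""
    body = "\n".join(f"- [{obs.tier}] {obs.text}" for obs in matches)
    return f"<memory-context>\n{body}\n</memory-context>"
```

In this sketch the rendered block is plain text that sits alongside the step instructions, which mirrors the boundary described above: it informs the agent but does not issue commands.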

File-backed tiers

Memory lives in three tiers:

agent-qa-memory/
  products/
    issue-tracker/
  suites/
    s_hill-gant-verb-nast-hunter-rita-home-store-amy-crest/
  tests/
    t_quad-adar-micro-magic-cross-cue-open-agog-rang-cours/

Product memories apply broadly to a product target. Suite memories apply to a suite and, when position data is present, to a specific child test position. Test memories apply to one test ID.

This file layout keeps memory inspectable. You can review diffs, remove stale observations, and understand why the agent saw a memory entry.
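
As a rough illustration of how the three tiers could be resolved for a single run, the sketch below maps a product, test ID, and optional suite ID onto directories under the memory root. The function names are hypothetical, the .md extension and the recursive search are assumptions about how files sit inside each directory, and suite position handling is left out because its on-disk form is not described here.

```python
from pathlib import Path
from typing import List, Optional


def memory_dirs(root: Path, product: str, test_id: str,
                suite_id: Optional[str] = None) -> List[Path]:
    # Broadest tier first: product, then suite (if any), then test.
    dirs = [root / "products" / product]
    if suite_id is not None:
        dirs.append(root / "suites" / suite_id)
    dirs.append(root / "tests" / test_id)
    return dirs


def observation_files(dirs: List[Path]) -> List[Path]:
    # Collect markdown files from the tier directories that exist on disk.
    # Assumes observations use a .md extension (not confirmed by the docs).
    files: List[Path] = []
    for directory in dirs:
        if directory.is_dir():
            files.extend(sorted(directory.rglob("*.md")))
    return files
```

A tier directory that does not exist yet is skipped, so a first run against a new suite would still pick up product and test memories in this sketch.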

What the curator does

The curator is the post-run process that turns run evidence into memory changes. It can:

  • add a new observation when the run reveals useful product behavior
  • update an existing observation when new evidence confirms or refines it
  • deprecate an observation when the run contradicts it
  • do nothing when the run did not produce a useful memory change

The curator writes markdown files through the memory provider and uses trust scores to keep unreliable observations from dominating future steps.
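
The four curator outcomes and the role of trust scores can be pictured with a small sketch. Everything in it is hypothetical: the CuratorAction enum, the choose_action rules, the 0.0 to 1.0 trust scale, and the 0.5 threshold are illustrative assumptions, not agent-qa's documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CuratorAction(Enum):
    ADD = "add"
    UPDATE = "update"
    DEPRECATE = "deprecate"
    NOOP = "noop"


@dataclass
class Observation:
    text: str
    trust: float  # hypothetical 0.0 to 1.0 score; the real scale is not documented here


def choose_action(existing: Optional[Observation], finding: Optional[str],
                  contradicted: bool = False) -> CuratorAction:
    # Map run evidence onto one of the four curator outcomes.
    if existing is not None and contradicted:
        return CuratorAction.DEPRECATE  # the run contradicts a stored observation
    if finding is None:
        return CuratorAction.NOOP       # nothing useful was learned
    if existing is None:
        return CuratorAction.ADD        # new behavior with no prior observation
    return CuratorAction.UPDATE         # evidence confirms or refines an observation


def trusted(observations: List[Observation], min_trust: float = 0.5) -> List[Observation]:
    # Drop low-trust observations, then list the most trusted first.
    kept = [obs for obs in observations if obs.trust >= min_trust]
    return sorted(kept, key=lambda obs: obs.trust, reverse=True)
```

Filtering by an assumed threshold is only one way unreliable observations could be kept from dominating; a real curator might instead weight them during ranking.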

Memory, cache, and config

Memory is not the action cache. The cache reuses execution-level action results when that is safe. Memory stores behavioral observations about the product, suite, or test.

Memory is also not static configuration. Configuration says how to run. Memory says what previous runs observed.

Use configuration for known facts such as targets, browsers, mobile devices, LLMs, hooks, and timeouts. Use memory for facts that evolve from test execution, such as a product label, a common workflow outcome, or a suite-specific dependency between child tests.

Where to go next