agent-qa

Learn how the memory curator turns run evidence into added, confirmed, deprecated, deleted, or unchanged observations.

The curator runs after test execution and decides whether the run produced useful behavioral memory. It is selective: memory should capture product behavior that helps future runs, not generic testing tactics or obvious page trivia.

A.U.D.N. decisions

The curator asks for A.U.D.N. decisions, named for the four possible actions:

  • add: write a new behavioral observation.
  • update: confirm an existing observation that was relevant and correct.
  • deprecate: penalize an existing observation that the run contradicted.
  • noop: leave memory unchanged.

The implementation records update decisions as confirmation deltas in the memory log. A noop decision writes no files.
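As an illustration of how the four actions differ in what they write, here is a minimal Python sketch. The Decision class, the observation_id field, and the shape of the log entry are assumptions made for this example; they are not agent-qa's actual types or log format.

```python
from dataclasses import dataclass
from typing import Optional

ACTIONS = ("add", "update", "deprecate", "noop")


@dataclass
class Decision:
    action: str                            # one of ACTIONS
    observation_id: Optional[str] = None   # hypothetical field


def log_entry(decision: Decision) -> Optional[dict]:
    """Return a (hypothetical) memory-log entry, or None when nothing is written."""
    if decision.action not in ACTIONS:
        raise ValueError(f"unknown action: {decision.action}")
    if decision.action == "noop":
        return None  # noop leaves memory unchanged
    if decision.action == "update":
        # update is recorded as a confirmation delta, not as a rewrite
        return {"kind": "confirm", "id": decision.observation_id}
    return {"kind": decision.action, "id": decision.observation_id}
```

In this sketch, only noop returns None, which mirrors the rule that noop decisions write nothing.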

What gets added

New observations start with trust 0.5 and with both counters at zero:

new observation trust = 0.5
confirmed_count = 0
contradicted_count = 0

The curator chooses a scope:

  • product scope for structural behavior that helps future tests across the product
  • suite scope for behavior tied to a suite sequence or position
  • test scope for behavior specific to one test

Suite observations include the suite position and suite snapshot so they can be matched safely later.
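The starting values and scopes described above could be modeled roughly as follows. This is a sketch only: the Observation class, the new_observation helper, and the representation of the suite snapshot as a tuple of test names are assumptions, not the on-disk format agent-qa uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SCOPES = ("product", "suite", "test")


@dataclass
class Observation:
    title: str
    body: str
    scope: str
    trust: float = 0.5             # new observations start at 0.5
    confirmed_count: int = 0
    contradicted_count: int = 0
    suite_position: Optional[int] = None               # suite scope only
    suite_snapshot: Optional[Tuple[str, ...]] = None   # suite scope only (assumed shape)


def new_observation(title, body, scope, suite_position=None, suite_snapshot=None):
    """Build a fresh observation, requiring suite metadata for suite scope."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if scope == "suite" and (suite_position is None or suite_snapshot is None):
        raise ValueError("suite observations need a position and a snapshot")
    return Observation(title, body, scope,
                       suite_position=suite_position,
                       suite_snapshot=suite_snapshot)
```

Rejecting a suite observation that lacks its position or snapshot reflects the point that these fields are what make later matching safe.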

Confirmation and deprecation

When the curator confirms an observation, trust increases by trustConfirmDelta, last_confirmed is updated, and confirmed_count increases by one.

When the curator deprecates an observation, trust decreases by trustContradictDelta and contradicted_count increases by one.

If trust reaches zero, agent-qa deletes the observation file instead of keeping a zero-trust memory entry.
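The trust arithmetic above can be sketched in a few lines of Python. The dictionary layout and function names are assumptions for illustration, as is treating "reaches zero" as "zero or below". The text does not state an upper bound on trust, so none is applied here.

```python
from typing import Optional


def confirm(obs: dict, trust_confirm_delta: float, now: str) -> dict:
    """Return a copy of obs with trust raised and confirmation fields updated."""
    updated = dict(obs)
    updated["trust"] = obs["trust"] + trust_confirm_delta
    updated["last_confirmed"] = now
    updated["confirmed_count"] = obs["confirmed_count"] + 1
    return updated


def deprecate(obs: dict, trust_contradict_delta: float) -> Optional[dict]:
    """Return the penalized observation, or None when it should be deleted."""
    trust = obs["trust"] - trust_contradict_delta
    if trust <= 0:
        return None  # caller deletes the observation file instead of keeping it
    updated = dict(obs)
    updated["trust"] = trust
    updated["contradicted_count"] = obs["contradicted_count"] + 1
    return updated
```

For example, with a hypothetical trustContradictDelta of 0.25, a fresh observation at 0.5 would survive one contradiction at 0.25 and be deleted on the second.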

Failed runs

For failed runs, agent-qa does not ask the curator to add new observations. Instead, it takes the observations that were injected into the failed step and deprecates them, because they may have contributed to the bad run.

If ablation later proves that memory caused a failure, the same deprecation path is used for all injected observations from that run.
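The two selection rules for failed runs can be sketched as one function. The injected_by_step mapping (step name to injected observation ids) and the ablation_blames_memory flag are hypothetical names chosen for this example.

```python
from typing import Dict, List


def observations_to_deprecate(injected_by_step: Dict[str, List[str]],
                              failed_step: str,
                              ablation_blames_memory: bool = False) -> List[str]:
    """Pick the observation ids to send down the deprecation path."""
    if ablation_blames_memory:
        # ablation proved memory caused the failure: every injected observation
        ids = [oid for step_ids in injected_by_step.values() for oid in step_ids]
    else:
        # ordinary failed run: only what was injected into the failed step
        ids = list(injected_by_step.get(failed_step, []))
    # drop duplicates while keeping first-seen order
    return list(dict.fromkeys(ids))
```

Deduplicating the ids in this sketch keeps an observation that was injected into several steps from being penalized more than once per run, which is a design choice of the example rather than documented behavior.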

Suite cleanup

Suite observations are tied to a suite_snapshot. After a suite run, the curator scans suite observations for that suite. If an observation's snapshot no longer matches the current suite entries, agent-qa deletes that stale suite observation.

This prevents a memory from one suite order from being reused after tests are inserted, removed, renamed, or reordered.
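A rough sketch of the stale check follows. How agent-qa actually computes suite_snapshot is not described here, so this example assumes a SHA-256 hash over the ordered list of test names; the function names are also hypothetical.

```python
import hashlib
from typing import Dict, List


def suite_snapshot(entries: List[str]) -> str:
    """Hash the ordered suite entries (an assumed snapshot scheme)."""
    joined = "\n".join(entries)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def stale_observations(observations: List[Dict], current_entries: List[str]) -> List[Dict]:
    """Return the suite observations whose snapshot no longer matches the suite."""
    current = suite_snapshot(current_entries)
    return [obs for obs in observations if obs["suite_snapshot"] != current]
```

Because the hash covers both names and order, inserting, removing, renaming, or reordering a test changes the snapshot and marks older observations as stale.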

Curator lock

The local provider uses a .curator.lock file under the memory root. The lock serializes writes so concurrent runs do not update or delete the same observation at the same time.

curatorLockTimeout controls how long a run waits for the lock. A lock is considered stale, and can be removed, when the owning process is gone or the lock's timestamp is old enough.
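One common way to build such a lock is exclusive file creation with a polling wait. The sketch below assumes that approach; the JSON lock contents, the max_age_s threshold, the polling interval, and the function names are all hypothetical, and the process check relies on POSIX signal 0 (it should not be used on Windows).

```python
import json
import os
import time

LOCK_NAME = ".curator.lock"


def _pid_alive(pid: int) -> bool:
    """POSIX only: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def _is_stale(path: str, max_age_s: float) -> bool:
    try:
        with open(path, "r", encoding="utf-8") as f:
            info = json.load(f)
        pid, ts = int(info["pid"]), float(info["ts"])
    except (OSError, ValueError, KeyError, TypeError):
        return False  # unreadable or half-written: treat the lock as held
    return (time.time() - ts) > max_age_s or not _pid_alive(pid)


def acquire_lock(memory_root: str, timeout_s: float,
                 max_age_s: float = 300.0, poll_s: float = 0.05) -> str:
    """Create the lock file exclusively, waiting up to timeout_s seconds."""
    path = os.path.join(memory_root, LOCK_NAME)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if _is_stale(path, max_age_s):
                try:
                    os.remove(path)
                except FileNotFoundError:
                    pass  # another run already removed it
                continue
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll_s)
            continue
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"pid": os.getpid(), "ts": time.time()}, f)
        return path


def release_lock(path: str) -> None:
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A caller would wrap its writes in acquire_lock and release_lock, ideally with try/finally. Stale-lock removal in this sketch is not fully race-free; a production implementation would need more care.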

Security checks

Before a new or updated observation is written, agent-qa scans the title and body with the memory security scanner. Unsafe observation text is blocked and recorded as a curator error instead of being written to disk.
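The scan-before-write gate could look roughly like this. The real memory security scanner and its rules are not described here, so toy_scan is a deliberately trivial placeholder, and write_if_safe, its parameters, and the error record shape are assumptions for the example.

```python
from typing import Callable, List, Optional


def toy_scan(text: str) -> Optional[str]:
    """Placeholder scanner: returns a reason when text looks unsafe, else None."""
    if "ignore previous instructions" in text.lower():
        return "possible prompt injection"
    return None


def write_if_safe(title: str, body: str,
                  scan: Callable[[str], Optional[str]],
                  write: Callable[[str, str], None],
                  errors: List[dict]) -> bool:
    """Scan title and body; write only when the scan finds nothing."""
    reason = scan(title + "\n" + body)
    if reason is not None:
        # blocked: record a curator error and skip the write entirely
        errors.append({"title": title, "reason": reason})
        return False
    write(title, body)
    return True
```

The key property of the sketch is ordering: the scan runs before any write, so blocked text never reaches disk.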