AutoRAC system map

How the harness actually works

This is the operational path from official source text to a promoted RAC file. It covers the deterministic CI checks, semantic review gates, import discipline, run ledgers, provenance controls, and the new outer-loop autoresearch gate, which keeps a prompt mutation only when it improves the training score without regressing a separate holdout.


Recent proof points

The page is still curated, but these are current system facts rather than generic claims. They are the shortest path to what actually changed this week.

current as of wave 20 promotion

Accepted autoresearch mutation

The first kept outer-loop prompt mutation fixed claimant-or-partner disjunction handling and improved the training score from 73.978 to 99.979 without regressing the separate final-review holdout.
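The keep-or-reject decision behind this mutation can be sketched as a small predicate. This is a minimal sketch, assuming each candidate is scored once on the training suite and once on the separate final-review holdout, and that "without regressing" means the holdout score must not drop; `should_keep_mutation` and its optional `tolerance` parameter are hypothetical names, not the harness's actual interface.

```python
def should_keep_mutation(
    baseline_train: float,
    candidate_train: float,
    baseline_holdout: float,
    candidate_holdout: float,
    tolerance: float = 0.0,
) -> bool:
    """Hypothetical outer-loop gate for a prompt mutation.

    Keep the mutation only if it strictly improves the training score
    and does not lower the holdout score. `tolerance` (an assumption,
    default 0.0) lets a small holdout drop count as noise; with 0.0,
    any holdout decrease rejects the mutation.
    """
    improves_training = candidate_train > baseline_train
    holdout_ok = candidate_holdout >= baseline_holdout - tolerance
    return improves_training and holdout_ok
```

With the training scores above, `should_keep_mutation(73.978, 99.979, h, h)` returns `True` for any unchanged holdout score `h`; a holdout drop of any size rejects the mutation unless a nonzero `tolerance` is passed.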

55-case UK bulk wave promoted

Wave 20 cleared the real scale gate, then landed 55 new Pension Credit regulation 15 leaves in rac-uk.

Current UK corpus state

The promoted UK repo now has 146 RAC files with complete companion tests, zero embedded scalar violations, and an empty numeric-occurrence backlog.

Replace-mode Atlas sync repaired

UK sync no longer depends on append-only recovery. Managed uk/legislation rows are deleted and republished in normal mode.
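Replace-mode sync amounts to "delete every row this publisher manages under a prefix, then republish the current set", so stale rows disappear without an append-only recovery step. This is a minimal sketch using an in-memory dict as a stand-in for the Atlas store; `replace_sync`, the row shape, and the example keys other than the `uk/legislation` prefix are hypothetical, and a real store would want the delete and publish phases to run in one transaction.

```python
from typing import Dict


def replace_sync(
    store: Dict[str, dict],
    managed_prefix: str,
    new_rows: Dict[str, dict],
) -> Dict[str, dict]:
    """Hypothetical replace-mode sync against an in-memory store.

    Deletes every row whose key starts with `managed_prefix`, then
    publishes `new_rows`. Rows outside the prefix are left untouched.
    """
    # Refuse to publish rows that this sync does not manage.
    for key in new_rows:
        if not key.startswith(managed_prefix):
            raise ValueError(f"row {key!r} is outside prefix {managed_prefix!r}")
    # Delete phase: collect keys first so the dict is not mutated mid-iteration.
    stale = [key for key in store if key.startswith(managed_prefix)]
    for key in stale:
        del store[key]
    # Publish phase: write the current managed rows.
    store.update(new_rows)
    return store


# Usage with hypothetical rows: the stale UK row is replaced, the other row survives.
store = {
    "uk/legislation/old-rule": {"v": 1},
    "other/example": {"v": 1},
}
replace_sync(store, "uk/legislation/", {"uk/legislation/new-rule": {"v": 2}})
```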

Pipeline explorer

Each stage below maps to something concrete in the current harness: code in the repo, files in the eval workspace, and a measurable pass-fail outcome.


Stage 01

Ingest official source text first

AutoRAC starts from real source documents, exact section identifiers, and copied slices. The source layer is never stubbed.

What happens here

Official PDFs, HTML, AKN, and exact source slices are pulled into the workspace before generation.

Benchmarks point at concrete section ids or slice ids, not free-form prompts.

This is what lets us say a later failure is a harness issue rather than a source-drift issue.

Checks and outputs

checks

Source id resolves to a concrete file or AKN eId

Exact slice copied into ./source.txt

No fake source placeholders

outputs

source.txt

context-manifest.json

allow-context bundle

source slice

<section eId="regulation-13">
  <num>13.</num>
  <content>
    <p>If the claimant's benefit is less than 10 pence per week,
    the amount shall be rounded up to 10 pence per week.</p>
  </content>
</section>
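The three ingest checks can be sketched against a slice like the one above. This is a minimal sketch, assuming an AKN-style XML string as input, that indentation whitespace from the XML layout may be collapsed, and that stubs show up as markers such as "TODO" or "lorem ipsum"; `extract_slice`, `write_source_slice`, and `PLACEHOLDER_PATTERNS` are hypothetical names. Matching on the unqualified `eId` attribute means namespaced AKN documents also resolve without a namespace map.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical markers that indicate a stubbed or fake source.
PLACEHOLDER_PATTERNS = [r"\bTODO\b", r"lorem ipsum", r"\[placeholder\]"]


def extract_slice(akn_xml: str, eid: str) -> str:
    """Return the text of the single element whose eId matches.

    Raises LookupError if the eId does not resolve to exactly one
    element, and ValueError if the text is empty or looks like a stub.
    """
    root = ET.fromstring(akn_xml)
    # Check 1: the source id must resolve to one concrete element.
    matches = [el for el in root.iter() if el.get("eId") == eid]
    if len(matches) != 1:
        raise LookupError(f"eId {eid!r} resolved to {len(matches)} elements")
    # Collapse XML indentation whitespace (an assumption; a stricter
    # harness might keep the slice byte-for-byte).
    text = " ".join("".join(matches[0].itertext()).split())
    # Check 3: reject empty slices and obvious placeholders.
    if not text:
        raise ValueError(f"eId {eid!r} has no text")
    for pattern in PLACEHOLDER_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            raise ValueError(f"eId {eid!r} looks like a placeholder")
    return text


def write_source_slice(akn_xml: str, eid: str, workdir: Path) -> Path:
    """Check 2: copy the resolved slice into <workdir>/source.txt."""
    out = workdir / "source.txt"
    out.write_text(extract_slice(akn_xml, eid) + "\n", encoding="utf-8")
    return out
```

Applied to the regulation 13 slice above, `extract_slice(xml, "regulation-13")` yields the number and the rounding rule as one line of text, while an unknown eId fails loudly instead of producing an empty `source.txt`.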

Failure pattern browser

These are real failure classes from recent UK, US, and Colorado work. Each one maps to a concrete harness or repo validator change. Atlas is where per-rule encoding records and agent logs are meant to be inspected, while autoresearch sits one level up and decides whether prompt-surface changes should be kept at all.

guardrails added from live failures

Repeated numbers must all appear as named scalars

failure mode

A source stated 55% twice, or repeated a threshold amount, but the RAC collapsed the repeats into a single generic helper.

harness response

The numeric-occurrence coverage check now counts substantive repeats of each number in the source and fails when the RAC represents fewer occurrences than the source contains.

where it is enforced

numeric occurrence coverage

named scalar occurrence extraction

repo baseline audit in rac-uk and rac-us

recent example

UC taper rate and repeated UK benefit thresholds
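The numeric-occurrence rule can be sketched as a multiset comparison: count every number in the source slice, count the literal values carried by named scalars in the RAC, and report any value the RAC under-represents. This is a minimal sketch, assuming numbers can be pulled out with a simple regex and that named scalars have already been extracted into a name-to-value mapping; `numeric_coverage_gaps` is hypothetical, and the explicit `ignore` set only approximates the real check's filtering of non-substantive numbers such as section numbers.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Integers or decimals, optionally followed by a percent sign.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?%?")


def numeric_coverage_gaps(
    source_text: str,
    named_scalars: Dict[str, str],
    ignore: Iterable[str] = (),
) -> Dict[str, int]:
    """Return {number: missing_count} for numbers the RAC under-represents.

    Each occurrence in the source must be matched by a separate named
    scalar with the same literal value.
    """
    skip = set(ignore)
    source_counts = Counter(
        n for n in NUMBER_RE.findall(source_text) if n not in skip
    )
    rac_counts = Counter(named_scalars.values())
    gaps = {}
    for number, needed in source_counts.items():
        missing = needed - rac_counts.get(number, 0)
        if missing > 0:
            gaps[number] = missing
    return gaps


# Illustrative sentence, not quoted from legislation: 55% appears twice,
# but only one named scalar carries it, so one occurrence is missing.
source = "Deduct 55% of earnings above the allowance, and 55% of other income."
print(numeric_coverage_gaps(source, {"earnings_taper_rate": "55%"}))  # {'55%': 1}
```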

Run ledger and provenance files

The harness now produces enough structured output that this section could eventually be generated directly from exported metadata. For now the artifact list is curated, but it uses concrete examples from the accepted autoresearch run and the latest UK bulk promotion rather than abstract file shapes. The actual per-encoding logs and RAC records still belong in Atlas.


suite-run.json

Live run state, runner metadata, progress counts, and current status. Updated while the suite is running.

These files make failed and interrupted runs auditable instead of disposable.

Promotion-time wave manifests connect generated files back to repo history, source snapshots, and provenance tier.

suite-run.json

{
  "name": "UK wave 20 bulk seed",
  "runner_backend": "codex",
  "status": "completed",
  "cases_total": 55,
  "cases_completed": 55,
  "started_at": "2026-04-08T00:12:44Z"
}
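A reader of the ledger can derive progress and catch contradictory states directly from the fields in this example. This is a minimal sketch; `check_suite_run` and its consistency rules (counters within range, and a `completed` status implying `cases_completed == cases_total`) are assumptions about how the ledger is meant to be read, not documented harness behaviour.

```python
import json
from datetime import datetime


def check_suite_run(raw: str) -> dict:
    """Parse a suite-run.json ledger and return a small progress summary.

    Raises ValueError when the counters contradict each other or the status.
    """
    run = json.loads(raw)
    total = run["cases_total"]
    done = run["cases_completed"]
    if not 0 <= done <= total:
        raise ValueError(f"cases_completed={done} outside 0..{total}")
    # Assumed rule: a completed run must have finished every case.
    if run["status"] == "completed" and done != total:
        raise ValueError(f"status is completed but only {done}/{total} cases ran")
    # The ledger uses a trailing "Z"; swap it for an explicit offset so
    # datetime.fromisoformat accepts it on Python versions before 3.11.
    started = datetime.fromisoformat(run["started_at"].replace("Z", "+00:00"))
    return {
        "name": run["name"],
        "status": run["status"],
        "progress": done / total if total else 0.0,
        "started_at": started,
    }
```

Applied to the wave 20 ledger above, it reports a progress of 1.0 and passes the completed-status check; an interrupted run with `"status": "running"` would simply report partial progress.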