AEP v1.0.2 — Agentic Performance Report

Generated: 2026-05-17T23:09:24.351679Z

Authority: operator 'wheres the measured performance of our agents with the new aep' 2026-05-17.

Composes with: AGENTIC-CAPABILITIES-V1.0.2-2026-05-17.md (Wave-046 substrate perf); this fills the per-AGENT gap.

TL;DR

10 canonical agents tracked (10 active last 7d; 10 active last 30d)
941 total ledger rows across all agents
last 7d: 286 invocations
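last 30d: 941 invocations (equals the total, since the earliest ledger row is dated 2026-05-05 and so falls inside the 30-day window)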
2 VETO mentions across all agents (1 HIGH-VETO, 1 other)
768 ledger rows with artifact_path set (concrete artifacts produced)

Per-agent performance (ranked by total invocations)

Agent	Total	Last 7d	Last 30d	First	Last	VETOs	HIGH-VETOs
forge	252	51	252	2026-05-06	2026-05-17	1	0
scribe	116	37	116	2026-05-05	2026-05-17	0	0
judge	99	28	99	2026-05-06	2026-05-17	0	0
warden	87	21	87	2026-05-05	2026-05-17	0	0
pathfinder	82	34	82	2026-05-06	2026-05-17	1	1
strategist	77	31	77	2026-05-06	2026-05-17	0	0
curator	76	27	76	2026-05-06	2026-05-17	0	0
adversary	66	22	66	2026-05-06	2026-05-16	0	0
scout	61	22	61	2026-05-06	2026-05-16	0	0
visual-judge	25	13	25	2026-05-07	2026-05-16	0	0
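The "Last 7d" and "Last 30d" columns are windowed counts over ledger rows. A minimal sketch of how such counts could be derived is shown here. The row field names `agent` and `ts` are hypothetical (the ledger schema is not shown in this report), and timestamps are assumed to be naive ISO-8601 strings like those in the per-agent detail.

```python
from collections import Counter
from datetime import datetime, timedelta

def windowed_counts(rows, now, days):
    """Count ledger rows per agent with a timestamp in the last `days` days.

    `rows` is an iterable of dicts with hypothetical keys "agent" and "ts".
    """
    cutoff = now - timedelta(days=days)
    return Counter(
        row["agent"]
        for row in rows
        if cutoff <= datetime.fromisoformat(row["ts"]) <= now
    )

# Illustrative rows only; these are not actual ledger contents.
sample_rows = [
    {"agent": "forge", "ts": "2026-05-17T23:09:24"},
    {"agent": "forge", "ts": "2026-05-06T00:00:00"},
    {"agent": "scout", "ts": "2026-05-16T00:00:00"},
]
report_time = datetime(2026, 5, 17, 23, 9, 24)
last_7d = windowed_counts(sample_rows, report_time, 7)
last_30d = windowed_counts(sample_rows, report_time, 30)
```

With the sample rows, only one of the two forge rows falls inside the 7-day window, while both fall inside the 30-day window.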

Cross-agent aggregate metrics

Truth-tag distribution (aggregate)

STRONGLY PLAUSIBLE: 561 (59.6%)
PROVEN/RELIABLE: 366 (38.9%)
EXPERIMENTAL: 5 (0.5%)
SPECULATIVE FRONTIER: 3 (0.3%)
UNKNOWN: 3 (0.3%)
STRONGLY_PLAUSIBLE: 1 (0.1%; underscore spelling variant of STRONGLY PLAUSIBLE, counted separately)
PROVEN/RELIABLE on quota observation; STRONGLY PLAUSIBLE on LEGION protocol design: 1 (0.1%)
PROVEN/RELIABLE on rule + atomic-rewrite mechanism; STRONGLY PLAUSIBLE on stderr-signature detection (EXPERIMENTAL until first cascade trip): 1 (0.1%)
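The distribution above counts `STRONGLY_PLAUSIBLE` and `STRONGLY PLAUSIBLE` as distinct tags. One possible way to fold spelling variants together before counting is sketched here. The helper names `normalize_tag` and `merge_tag_counts` are hypothetical, and the rule (underscores become spaces, whitespace is collapsed) is an assumption, not the report generator's actual behavior. The example input is the pathfinder truth-tag dictionary from this report.

```python
from collections import Counter

def normalize_tag(tag):
    """Treat underscores as spaces and collapse repeated whitespace."""
    return " ".join(tag.replace("_", " ").split())

def merge_tag_counts(raw_counts):
    """Sum counts of tags that normalize to the same spelling."""
    merged = Counter()
    for tag, n in raw_counts.items():
        merged[normalize_tag(tag)] += n
    return merged

def percentages(counts):
    """Share of each tag as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {tag: round(100 * n / total, 1) for tag, n in counts.items()}

# pathfinder's truth-tag distribution as reported in the per-agent detail
pathfinder_tags = {
    "STRONGLY PLAUSIBLE": 77,
    "PROVEN/RELIABLE": 3,
    "EXPERIMENTAL": 1,
    "STRONGLY_PLAUSIBLE": 1,
}
merged = merge_tag_counts(pathfinder_tags)
merged_pct = percentages(merged)
```

Under this rule the two spellings merge into a single `STRONGLY PLAUSIBLE` count of 78 for pathfinder. Compound tags such as the two long scribe entries are left untouched and would still need a separate policy.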

Outcome distribution (aggregate)

success: 785 (83.4%)
recovered: 147 (15.6%)
failed: 6 (0.6%)
UNKNOWN: 3 (0.3%)
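The aggregate outcome figures can be cross-checked against the per-agent outcome dictionaries listed under "Per-agent detail". The sketch below sums those dictionaries and recomputes the percentages. The function names are illustrative only, and the data is copied from this report.

```python
from collections import Counter

# Outcome distributions copied from the per-agent detail in this report.
PER_AGENT_OUTCOMES = {
    "forge": {"success": 243, "recovered": 9},
    "scribe": {"success": 106, "recovered": 10},
    "judge": {"success": 85, "recovered": 11, "failed": 2, "UNKNOWN": 1},
    "warden": {"success": 57, "recovered": 29, "failed": 1},
    "pathfinder": {"success": 74, "recovered": 6, "UNKNOWN": 2},
    "strategist": {"success": 70, "recovered": 7},
    "curator": {"success": 75, "recovered": 1},
    "adversary": {"recovered": 57, "success": 9},
    "scout": {"success": 59, "recovered": 2},
    "visual-judge": {"recovered": 15, "success": 7, "failed": 3},
}

def aggregate_outcomes(per_agent):
    """Sum outcome counts across all agents."""
    total = Counter()
    for counts in per_agent.values():
        total.update(counts)  # Counter.update adds counts from a mapping
    return total

def outcome_shares(counts):
    """Percentage share of each outcome, rounded to one decimal."""
    n = sum(counts.values())
    return {outcome: round(100 * c / n, 1) for outcome, c in counts.items()}

totals = aggregate_outcomes(PER_AGENT_OUTCOMES)
shares = outcome_shares(totals)
```

The recomputed totals (785 success, 147 recovered, 6 failed, 3 UNKNOWN over 941 rows) match the aggregate listed above.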

Activity ranking (last 7 days)

forge: 51 invocations
scribe: 37 invocations
pathfinder: 34 invocations
strategist: 31 invocations
judge: 28 invocations
curator: 27 invocations
scout: 22 invocations
adversary: 22 invocations
warden: 21 invocations
visual-judge: 13 invocations

Per-agent detail

forge

Total invocations: 252
First: 2026-05-06T00:00:00
Last: 2026-05-17T23:09:24
Activity: 7d=51 / 30d=252
Unique sessions: 117
Artifact rows: 245
VETO mentions: 1 (HIGH-VETO: 0)
Wave participation count: 4
Top 5 waves: wave-007×1, wave-019×1, wave-016×1, wave-014×1
Top 5 cluster tags: lodestone×84, forge×73, crucible×40, implementation×31, forge-generator-role×25
Notes quality proxy (length stats): mean=528 / median=340 / p95=1394
Truth-tag distribution: {'PROVEN/RELIABLE': 140, 'STRONGLY PLAUSIBLE': 110, 'EXPERIMENTAL': 1, 'SPECULATIVE FRONTIER': 1}
Outcome distribution: {'success': 243, 'recovered': 9}

scribe

Total invocations: 116
First: 2026-05-05T00:00:00
Last: 2026-05-17T00:00:00
Activity: 7d=37 / 30d=116
Unique sessions: 63
Artifact rows: 111
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 1
Top 5 waves: wave-026×1
Top 5 cluster tags: lesson×59, scribe×29, compounding-signal-mandatory×22, compounding×11, scribe-upgrade-floor-applied×11
Notes quality proxy (length stats): mean=366 / median=257 / p95=1116
Truth-tag distribution: {'PROVEN/RELIABLE': 60, 'STRONGLY PLAUSIBLE': 53, 'UNKNOWN': 1, 'PROVEN/RELIABLE on quota observation; STRONGLY PLAUSIBLE on LEGION protocol design': 1, 'PROVEN/RELIABLE on rule + atomic-rewrite mechanism; STRONGLY PLAUSIBLE on stderr-signature detection (EXPERIMENTAL until first cascade trip)': 1}
Outcome distribution: {'success': 106, 'recovered': 10}

judge

Total invocations: 99
First: 2026-05-06T00:00:00
Last: 2026-05-17T23:09:24
Activity: 7d=28 / 30d=99
Unique sessions: 47
Artifact rows: 56
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 0
Top 5 cluster tags: verify×49, scope-lock×23, judge×17, two-part-judge-structure×13, reflector-role-ace×9
Notes quality proxy (length stats): mean=262 / median=198 / p95=625
Truth-tag distribution: {'PROVEN/RELIABLE': 60, 'STRONGLY PLAUSIBLE': 37, 'UNKNOWN': 2}
Outcome distribution: {'success': 85, 'recovered': 11, 'failed': 2, 'UNKNOWN': 1}

warden

Total invocations: 87
First: 2026-05-05T00:00:00
Last: 2026-05-17T00:00:00
Activity: 7d=21 / 30d=87
Unique sessions: 61
Artifact rows: 51
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 3
Top 5 waves: wave-006×2, wave-005×1, wave-012×1
Top 5 cluster tags: audit×51, warden×20, pattern:WARN-N-persistence×15, pattern:defect-tag-scan×13, marathon×11
Notes quality proxy (length stats): mean=231 / median=160 / p95=481
Truth-tag distribution: {'PROVEN/RELIABLE': 55, 'STRONGLY PLAUSIBLE': 32}
Outcome distribution: {'success': 57, 'recovered': 29, 'failed': 1}

pathfinder

Total invocations: 82
First: 2026-05-06T00:00:00
Last: 2026-05-17T23:09:24
Activity: 7d=34 / 30d=82
Unique sessions: 55
Artifact rows: 55
VETO mentions: 1 (HIGH-VETO: 1)
Wave participation count: 6
Top 5 waves: wave-019×2, wave-020×2, wave-002×1, wave-007×1, wave-008×1
Top 5 cluster tags: decomposition×39, pathfinder×22, adversary-amendment-fold-matrix×19, scope-lock×14, disconfirmer-first×14
Notes quality proxy (length stats): mean=246 / median=171 / p95=568
Truth-tag distribution: {'STRONGLY PLAUSIBLE': 77, 'PROVEN/RELIABLE': 3, 'EXPERIMENTAL': 1, 'STRONGLY_PLAUSIBLE': 1}
Outcome distribution: {'success': 74, 'recovered': 6, 'UNKNOWN': 2}

strategist

Total invocations: 77
First: 2026-05-06T00:00:00
Last: 2026-05-17T23:09:24
Activity: 7d=31 / 30d=77
Unique sessions: 57
Artifact rows: 50
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 3
Top 5 waves: wave-001×2, wave-003×1, wave-005×1
Top 5 cluster tags: framing×31, strategist×14, 5-weakest-assumption-hooks×10, scope-lock×7, tier-3×6
Notes quality proxy (length stats): mean=234 / median=182 / p95=637
Truth-tag distribution: {'STRONGLY PLAUSIBLE': 72, 'PROVEN/RELIABLE': 3, 'EXPERIMENTAL': 2}
Outcome distribution: {'success': 70, 'recovered': 7}

curator

Total invocations: 76
First: 2026-05-06T00:00:00
Last: 2026-05-17T00:00:00
Activity: 7d=27 / 30d=76
Unique sessions: 48
Artifact rows: 69
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 3
Top 5 waves: wave-001×3, wave-005×2, wave-006×1
Top 5 cluster tags: curator×41, honest-counting×12, anti-source-laundering×10, sibling-78-discipline×10, marathon×9
Notes quality proxy (length stats): mean=217 / median=192 / p95=411
Truth-tag distribution: {'STRONGLY PLAUSIBLE': 51, 'PROVEN/RELIABLE': 25}
Outcome distribution: {'success': 75, 'recovered': 1}

adversary

Total invocations: 66
First: 2026-05-06T00:00:00
Last: 2026-05-16T00:00:00
Activity: 7d=22 / 30d=66
Unique sessions: 49
Artifact rows: 50
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 0
Top 5 cluster tags: pre-mortem×32, 5-weakest-assumption-hooks×17, premortem×14, adversary×13, ace-reflector-role×10
Notes quality proxy (length stats): mean=280 / median=189 / p95=740
Truth-tag distribution: {'STRONGLY PLAUSIBLE': 61, 'PROVEN/RELIABLE': 5}
Outcome distribution: {'recovered': 57, 'success': 9}

scout

Total invocations: 61
First: 2026-05-06T00:00:00
Last: 2026-05-16T00:00:00
Activity: 7d=22 / 30d=61
Unique sessions: 36
Artifact rows: 56
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 0
Top 5 cluster tags: research×23, anti-source-laundering×14, sibling-78-upgrade×11, powershell×7, scout×7
Notes quality proxy (length stats): mean=348 / median=194 / p95=959
Truth-tag distribution: {'STRONGLY PLAUSIBLE': 49, 'PROVEN/RELIABLE': 9, 'SPECULATIVE FRONTIER': 2, 'EXPERIMENTAL': 1}
Outcome distribution: {'success': 59, 'recovered': 2}

visual-judge

Total invocations: 25
First: 2026-05-07T00:00:00
Last: 2026-05-16T00:00:00
Activity: 7d=13 / 30d=25
Unique sessions: 24
Artifact rows: 25
VETO mentions: 0 (HIGH-VETO: 0)
Wave participation count: 0
Top 5 cluster tags: visual-judge×20, operator-pending×16, rubric×13, lodestone×13, forge-follow-up-amendments×10
Notes quality proxy (length stats): mean=275 / median=200 / p95=601
Truth-tag distribution: {'STRONGLY PLAUSIBLE': 19, 'PROVEN/RELIABLE': 6}
Outcome distribution: {'recovered': 15, 'success': 7, 'failed': 3}
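Each per-agent block reports a "Notes quality proxy" as mean, median, and p95 of note lengths. The report does not state the unit or the percentile method, so the following is only a sketch under two assumptions: length is measured in characters, and p95 uses the nearest-rank definition. The function name `length_stats` is hypothetical.

```python
from statistics import mean, median

def length_stats(notes):
    """Return mean, median, and nearest-rank p95 of note lengths (characters)."""
    lengths = sorted(len(note) for note in notes)
    if not lengths:
        raise ValueError("at least one note is required")
    # Nearest-rank percentile: 1-based rank = ceil(0.95 * n), in integer math
    rank = (95 * len(lengths) + 99) // 100
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "p95": lengths[rank - 1],
    }

# Illustrative notes with lengths 1..20; not real ledger notes.
sample_notes = ["x" * n for n in range(1, 21)]
stats = length_stats(sample_notes)
```

A different percentile convention (for example linear interpolation) would give slightly different p95 values, so the method should be pinned down before comparing agents on this proxy.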

Honest framing per §69.5

These metrics are ledger-derived (proxy for agent activity). They do NOT measure:
Per-invocation latency (would need invocation-time-stamping; not currently captured)
Token cost per agent (would need token-usage capture per Anthropic billing API)
Multi-agent convergence rate (would need cross-agent dispatch correlation IDs)
Operator-judged quality per agent output (subjective; requires operator rating loop)
The metrics ARE empirical proxies for: activity volume, recency, VETO honor pattern, artifact production, doctrine ladder usage (truth-tag distribution).

STAGED for next session (per honest disclosure)

Per-invocation latency capture — add invoked_at + completed_at fields to ledger schema
Token cost capture — instrument Agent dispatches to record token usage; tie to ledger via session_id
Multi-agent convergence metric — when N agents fire on same wave, measure % agreement on TOP-1 frame
Operator-rated quality loop — post-wave operator scores each agent's contribution 0-10, recorded in dedicated ledger
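Two of the staged items can be sketched ahead of implementation. The first computes per-invocation latency from the proposed `invoked_at` and `completed_at` fields, assuming they are stored as ISO-8601 strings. The second computes a convergence figure, under the assumption that "% agreement on TOP-1 frame" means the share of agents whose TOP-1 frame equals the most common one. Both function names and the input shapes are hypothetical.

```python
from collections import Counter
from datetime import datetime

def latency_seconds(row):
    """Elapsed seconds between the proposed invoked_at and completed_at fields."""
    started = datetime.fromisoformat(row["invoked_at"])
    finished = datetime.fromisoformat(row["completed_at"])
    return (finished - started).total_seconds()

def top1_agreement(top1_by_agent):
    """Percentage of agents whose TOP-1 frame matches the plurality frame.

    `top1_by_agent` maps agent name -> that agent's TOP-1 frame label.
    """
    if not top1_by_agent:
        raise ValueError("at least one agent is required")
    counts = Counter(top1_by_agent.values())
    _, plurality_size = counts.most_common(1)[0]
    return 100 * plurality_size / len(top1_by_agent)

# Illustrative inputs only; no such rows exist in the ledger yet.
example_row = {
    "invoked_at": "2026-05-17T23:09:00",
    "completed_at": "2026-05-17T23:09:24",
}
example_wave = {
    "strategist": "frame-A",
    "pathfinder": "frame-A",
    "adversary": "frame-B",
    "judge": "frame-A",
}
example_latency = latency_seconds(example_row)
example_agreement = top1_agreement(example_wave)
```

In the example wave three of four agents pick the same frame, giving 75% agreement. Pairwise agreement would be an alternative definition and would yield a different number for the same inputs.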