AEP v1.0.2 — Agentic Performance Report
Generated: 2026-05-17T23:09:24.351679Z
Authority: operator 'wheres the measured performance of our agents with the new aep' 2026-05-17.
Composes with: AGENTIC-CAPABILITIES-V1.0.2-2026-05-17.md (Wave-046 substrate perf); this fills the per-AGENT gap.
TL;DR
- 10 canonical agents tracked (10 active last 7d; 10 active last 30d)
- 941 total ledger rows across all agents
- last 7d: 286 invocations
- last 30d: 941 invocations
- 1 HIGH-VETO mention + 1 other VETO mention across all agents
- 768 ledger rows with artifact_path set (concrete artifacts produced)
Per-agent performance (ranked by total invocations)
| Agent | Total | Last 7d | Last 30d | First | Last | VETOs | HIGH-VETOs |
|---|---|---|---|---|---|---|---|
| forge | 252 | 51 | 252 | 2026-05-06 | 2026-05-17 | 1 | 0 |
| scribe | 116 | 37 | 116 | 2026-05-05 | 2026-05-17 | 0 | 0 |
| judge | 99 | 28 | 99 | 2026-05-06 | 2026-05-17 | 0 | 0 |
| warden | 87 | 21 | 87 | 2026-05-05 | 2026-05-17 | 0 | 0 |
| pathfinder | 82 | 34 | 82 | 2026-05-06 | 2026-05-17 | 1 | 1 |
| strategist | 77 | 31 | 77 | 2026-05-06 | 2026-05-17 | 0 | 0 |
| curator | 76 | 27 | 76 | 2026-05-06 | 2026-05-17 | 0 | 0 |
| adversary | 66 | 22 | 66 | 2026-05-06 | 2026-05-16 | 0 | 0 |
| scout | 61 | 22 | 61 | 2026-05-06 | 2026-05-16 | 0 | 0 |
| visual-judge | 25 | 13 | 25 | 2026-05-07 | 2026-05-16 | 0 | 0 |
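
The columns in the table above could be derived from raw ledger rows roughly as follows. This is a minimal sketch, not the report generator itself: the `(agent, date)` row shape, the `summarize` and `ranked` helpers, and the window boundaries (a row counts toward "last 7d" when it is fewer than 7 days old) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def summarize(rows, today):
    """Roll up (agent, day) pairs into per-agent counts and first/last dates."""
    summary = defaultdict(lambda: {"total": 0, "last_7d": 0, "last_30d": 0,
                                   "first": None, "last": None})
    for agent, day in rows:
        s = summary[agent]
        s["total"] += 1
        age = (today - day).days
        # Assumed window rule: age 0..6 counts as "last 7d", age 0..29 as "last 30d".
        if age < 7:
            s["last_7d"] += 1
        if age < 30:
            s["last_30d"] += 1
        s["first"] = day if s["first"] is None else min(s["first"], day)
        s["last"] = day if s["last"] is None else max(s["last"], day)
    return dict(summary)

def ranked(summary):
    """Order agents by total invocations, highest first, as in the table."""
    return sorted(summary.items(), key=lambda kv: kv[1]["total"], reverse=True)

# Hypothetical rows, not taken from the real ledger.
rows = [
    ("forge", date(2026, 5, 6)),
    ("forge", date(2026, 5, 17)),
    ("scout", date(2026, 5, 16)),
]
report = summarize(rows, today=date(2026, 5, 17))
for agent, stats in ranked(report):
    print(agent, stats["total"], stats["last_7d"], stats["last_30d"])
```

Because every agent's first row falls within the last 30 days, the Total and Last 30d columns are identical throughout the table.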
Cross-agent aggregate metrics
Truth-tag distribution (aggregate)
STRONGLY PLAUSIBLE: 561 (59.6%)
PROVEN/RELIABLE: 366 (38.9%)
EXPERIMENTAL: 5 (0.5%)
SPECULATIVE FRONTIER: 3 (0.3%)
UNKNOWN: 3 (0.3%)
STRONGLY_PLAUSIBLE: 1 (0.1%)
PROVEN/RELIABLE on quota observation; STRONGLY PLAUSIBLE on LEGION protocol design: 1 (0.1%)
PROVEN/RELIABLE on rule + atomic-rewrite mechanism; STRONGLY PLAUSIBLE on stderr-signature detection (EXPERIMENTAL until first cascade trip): 1 (0.1%)
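
The distribution above lists `STRONGLY_PLAUSIBLE` separately from `STRONGLY PLAUSIBLE`, which looks like a spelling variant of the same tag. A minimal sketch of how such variants could be folded together before counting is shown below. The `normalize_tag` helper is hypothetical, and the two compound free-text tags are left out because how to split them is a judgment call.

```python
from collections import Counter

# Aggregate counts copied from the distribution above (compound tags omitted).
raw = Counter({
    "STRONGLY PLAUSIBLE": 561,
    "PROVEN/RELIABLE": 366,
    "EXPERIMENTAL": 5,
    "SPECULATIVE FRONTIER": 3,
    "UNKNOWN": 3,
    "STRONGLY_PLAUSIBLE": 1,
})

def normalize_tag(tag: str) -> str:
    """Collapse underscore, case, and whitespace variants of one tag."""
    return " ".join(tag.replace("_", " ").upper().split())

normalized = Counter()
for tag, count in raw.items():
    normalized[normalize_tag(tag)] += count

print(normalized["STRONGLY PLAUSIBLE"])  # 561 + 1 = 562
```

Applying the same normalization at write time, rather than at report time, would keep the ledger itself consistent.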
Outcome distribution (aggregate)
success: 785 (83.4%)
recovered: 147 (15.6%)
failed: 6 (0.6%)
UNKNOWN: 3 (0.3%)
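
The percentages above can be reproduced from the raw counts. The sketch below assumes each share is the count divided by the 941 total rows, rounded to one decimal place; the `shares` helper is illustrative only.

```python
# Aggregate outcome counts copied from the distribution above.
outcomes = {"success": 785, "recovered": 147, "failed": 6, "UNKNOWN": 3}

def shares(counts):
    """Return each count as a percentage of the total, rounded to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

pct = shares(outcomes)
print(sum(outcomes.values()))  # 941
print(pct)
```

Because each share is rounded independently, the displayed percentages sum to 99.9 rather than 100.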
Activity ranking (last 7 days)
- forge — 51 invocations
- scribe — 37 invocations
- pathfinder — 34 invocations
- strategist — 31 invocations
- judge — 28 invocations
- curator — 27 invocations
- scout — 22 invocations
- adversary — 22 invocations
- warden — 21 invocations
- visual-judge — 13 invocations
Per-agent detail
forge
- Total invocations: 252
- First: 2026-05-06T00:00:00
- Last: 2026-05-17T23:09:24
- Activity: 7d=51 / 30d=252
- Unique sessions: 117
- Artifact rows: 245
- VETO mentions: 1 (HIGH-VETO: 0)
- Wave participation count: 4
- Top 5 waves: wave-007×1, wave-019×1, wave-016×1, wave-014×1
- Top 5 cluster tags: lodestone×84, forge×73, crucible×40, implementation×31, forge-generator-role×25
- Notes quality proxy (length stats): mean=528 / median=340 / p95=1394
- Truth-tag distribution: {'PROVEN/RELIABLE': 140, 'STRONGLY PLAUSIBLE': 110, 'EXPERIMENTAL': 1, 'SPECULATIVE FRONTIER': 1}
- Outcome distribution: {'success': 243, 'recovered': 9}
scribe
- Total invocations: 116
- First: 2026-05-05T00:00:00
- Last: 2026-05-17T00:00:00
- Activity: 7d=37 / 30d=116
- Unique sessions: 63
- Artifact rows: 111
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 1
- Top 5 waves: wave-026×1
- Top 5 cluster tags: lesson×59, scribe×29, compounding-signal-mandatory×22, compounding×11, scribe-upgrade-floor-applied×11
- Notes quality proxy (length stats): mean=366 / median=257 / p95=1116
- Truth-tag distribution: {'PROVEN/RELIABLE': 60, 'STRONGLY PLAUSIBLE': 53, 'UNKNOWN': 1, 'PROVEN/RELIABLE on quota observation; STRONGLY PLAUSIBLE on LEGION protocol design': 1, 'PROVEN/RELIABLE on rule + atomic-rewrite mechanism; STRONGLY PLAUSIBLE on stderr-signature detection (EXPERIMENTAL until first cascade trip)': 1}
- Outcome distribution: {'success': 106, 'recovered': 10}
judge
- Total invocations: 99
- First: 2026-05-06T00:00:00
- Last: 2026-05-17T23:09:24
- Activity: 7d=28 / 30d=99
- Unique sessions: 47
- Artifact rows: 56
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 0
- Top 5 cluster tags: verify×49, scope-lock×23, judge×17, two-part-judge-structure×13, reflector-role-ace×9
- Notes quality proxy (length stats): mean=262 / median=198 / p95=625
- Truth-tag distribution: {'PROVEN/RELIABLE': 60, 'STRONGLY PLAUSIBLE': 37, 'UNKNOWN': 2}
- Outcome distribution: {'success': 85, 'recovered': 11, 'failed': 2, 'UNKNOWN': 1}
warden
- Total invocations: 87
- First: 2026-05-05T00:00:00
- Last: 2026-05-17T00:00:00
- Activity: 7d=21 / 30d=87
- Unique sessions: 61
- Artifact rows: 51
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 3
- Top 5 waves: wave-006×2, wave-005×1, wave-012×1
- Top 5 cluster tags: audit×51, warden×20, pattern:WARN-N-persistence×15, pattern:defect-tag-scan×13, marathon×11
- Notes quality proxy (length stats): mean=231 / median=160 / p95=481
- Truth-tag distribution: {'PROVEN/RELIABLE': 55, 'STRONGLY PLAUSIBLE': 32}
- Outcome distribution: {'success': 57, 'recovered': 29, 'failed': 1}
pathfinder
- Total invocations: 82
- First: 2026-05-06T00:00:00
- Last: 2026-05-17T23:09:24
- Activity: 7d=34 / 30d=82
- Unique sessions: 55
- Artifact rows: 55
- VETO mentions: 1 (HIGH-VETO: 1)
- Wave participation count: 6
- Top 5 waves: wave-019×2, wave-020×2, wave-002×1, wave-007×1, wave-008×1
- Top 5 cluster tags: decomposition×39, pathfinder×22, adversary-amendment-fold-matrix×19, scope-lock×14, disconfirmer-first×14
- Notes quality proxy (length stats): mean=246 / median=171 / p95=568
- Truth-tag distribution: {'STRONGLY PLAUSIBLE': 77, 'PROVEN/RELIABLE': 3, 'EXPERIMENTAL': 1, 'STRONGLY_PLAUSIBLE': 1}
- Outcome distribution: {'success': 74, 'recovered': 6, 'UNKNOWN': 2}
strategist
- Total invocations: 77
- First: 2026-05-06T00:00:00
- Last: 2026-05-17T23:09:24
- Activity: 7d=31 / 30d=77
- Unique sessions: 57
- Artifact rows: 50
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 3
- Top 5 waves: wave-001×2, wave-003×1, wave-005×1
- Top 5 cluster tags: framing×31, strategist×14, 5-weakest-assumption-hooks×10, scope-lock×7, tier-3×6
- Notes quality proxy (length stats): mean=234 / median=182 / p95=637
- Truth-tag distribution: {'STRONGLY PLAUSIBLE': 72, 'PROVEN/RELIABLE': 3, 'EXPERIMENTAL': 2}
- Outcome distribution: {'success': 70, 'recovered': 7}
curator
- Total invocations: 76
- First: 2026-05-06T00:00:00
- Last: 2026-05-17T00:00:00
- Activity: 7d=27 / 30d=76
- Unique sessions: 48
- Artifact rows: 69
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 3
- Top 5 waves: wave-001×3, wave-005×2, wave-006×1
- Top 5 cluster tags: curator×41, honest-counting×12, anti-source-laundering×10, sibling-78-discipline×10, marathon×9
- Notes quality proxy (length stats): mean=217 / median=192 / p95=411
- Truth-tag distribution: {'STRONGLY PLAUSIBLE': 51, 'PROVEN/RELIABLE': 25}
- Outcome distribution: {'success': 75, 'recovered': 1}
adversary
- Total invocations: 66
- First: 2026-05-06T00:00:00
- Last: 2026-05-16T00:00:00
- Activity: 7d=22 / 30d=66
- Unique sessions: 49
- Artifact rows: 50
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 0
- Top 5 cluster tags: pre-mortem×32, 5-weakest-assumption-hooks×17, premortem×14, adversary×13, ace-reflector-role×10
- Notes quality proxy (length stats): mean=280 / median=189 / p95=740
- Truth-tag distribution: {'STRONGLY PLAUSIBLE': 61, 'PROVEN/RELIABLE': 5}
- Outcome distribution: {'recovered': 57, 'success': 9}
scout
- Total invocations: 61
- First: 2026-05-06T00:00:00
- Last: 2026-05-16T00:00:00
- Activity: 7d=22 / 30d=61
- Unique sessions: 36
- Artifact rows: 56
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 0
- Top 5 cluster tags: research×23, anti-source-laundering×14, sibling-78-upgrade×11, powershell×7, scout×7
- Notes quality proxy (length stats): mean=348 / median=194 / p95=959
- Truth-tag distribution: {'STRONGLY PLAUSIBLE': 49, 'PROVEN/RELIABLE': 9, 'SPECULATIVE FRONTIER': 2, 'EXPERIMENTAL': 1}
- Outcome distribution: {'success': 59, 'recovered': 2}
visual-judge
- Total invocations: 25
- First: 2026-05-07T00:00:00
- Last: 2026-05-16T00:00:00
- Activity: 7d=13 / 30d=25
- Unique sessions: 24
- Artifact rows: 25
- VETO mentions: 0 (HIGH-VETO: 0)
- Wave participation count: 0
- Top 5 cluster tags: visual-judge×20, operator-pending×16, rubric×13, lodestone×13, forge-follow-up-amendments×10
- Notes quality proxy (length stats): mean=275 / median=200 / p95=601
- Truth-tag distribution: {'STRONGLY PLAUSIBLE': 19, 'PROVEN/RELIABLE': 6}
- Outcome distribution: {'recovered': 15, 'success': 7, 'failed': 3}
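
The "Notes quality proxy" lines above report mean, median, and p95 of note length. The report does not say how these are computed, so the following is only a sketch under stated assumptions: length is measured in characters, and p95 uses the nearest-rank method. The `length_stats` helper and the sample notes are hypothetical.

```python
import math
import statistics

def length_stats(notes):
    """Mean, median, and nearest-rank p95 of note lengths (in characters)."""
    lengths = sorted(len(n) for n in notes)
    if not lengths:
        return None
    # Nearest-rank p95: smallest value with at least 95% of the data at or below it.
    rank = math.ceil(0.95 * len(lengths))
    return {
        "mean": round(statistics.mean(lengths)),
        "median": round(statistics.median(lengths)),
        "p95": lengths[rank - 1],
    }

# Hypothetical notes: four short entries and one long one.
sample = ["a" * n for n in (10, 20, 30, 40, 500)]
stats = length_stats(sample)
print(stats)
```

A single long note pulls the mean well above the median here, which is the same shape seen in the per-agent figures (for example, forge at mean=528 versus median=340).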
Honest framing per §69.5
- These metrics are ledger-derived (proxy for agent activity). They do NOT measure:
  - Per-invocation latency (would need invocation-time-stamping; not currently captured)
  - Token cost per agent (would need token-usage capture per Anthropic billing API)
  - Multi-agent convergence rate (would need cross-agent dispatch correlation IDs)
  - Operator-judged quality per agent output (subjective; requires operator rating loop)
- The metrics ARE empirical proxies for: activity volume, recency, VETO honor pattern, artifact production, doctrine ladder usage (truth-tag distribution).
STAGED for next session (per honest disclosure)
- Per-invocation latency capture — add invoked_at + completed_at fields to ledger schema
- Token cost capture — instrument Agent dispatches to record token usage; tie to ledger via session_id
- Multi-agent convergence metric — when N agents fire on the same wave, measure % agreement on the TOP-1 frame
- Operator-rated quality loop — post-wave operator scores each agent's contribution 0-10, recorded in dedicated ledger
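
Two of the staged items above lend themselves to a short sketch. The first function derives latency from the proposed `invoked_at` and `completed_at` fields, assuming they are stored as ISO 8601 strings. The second treats "% agreement on TOP-1 frame" as the share of agents whose top frame matches the most common one, which is one possible definition rather than a settled one. The function names, the row shape, and the frame labels are all hypothetical.

```python
from collections import Counter
from datetime import datetime

def latency_seconds(row):
    """Duration of one invocation, from the proposed timestamp fields."""
    start = datetime.fromisoformat(row["invoked_at"])
    end = datetime.fromisoformat(row["completed_at"])
    return (end - start).total_seconds()

def top1_agreement(frames):
    """Share of agents whose TOP-1 frame equals the most common TOP-1 frame."""
    if not frames:
        return 0.0
    _, votes = Counter(frames.values()).most_common(1)[0]
    return votes / len(frames)

# Hypothetical ledger row and wave, for illustration only.
row = {"invoked_at": "2026-05-17T23:00:00",
       "completed_at": "2026-05-17T23:00:42"}
wave = {"strategist": "frame-a", "pathfinder": "frame-a", "adversary": "frame-b"}

print(latency_seconds(row))
print(top1_agreement(wave))
```

Under this definition a wave where every agent picks a different frame scores 1/N rather than 0, so a threshold for "converged" would need to account for the number of agents involved.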