AEP v1.0.2 — Agentic Performance Report

Generated: 2026-05-17T23:09:24.351679Z

Authority: operator request, 2026-05-17: "Where's the measured performance of our agents with the new AEP?"

Composes with: AGENTIC-CAPABILITIES-V1.0.2-2026-05-17.md (Wave-046 substrate perf); this fills the per-AGENT gap.

TL;DR

Per-agent performance (ranked by total invocations)

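| Agent        | Total | Last 7d | Last 30d | First      | Last       | VETOs | HIGH-VETOs |
|--------------|-------|---------|----------|------------|------------|-------|------------|
| forge        | 252   | 51      | 252      | 2026-05-06 | 2026-05-17 | 1     | 0          |
| scribe       | 116   | 37      | 116      | 2026-05-05 | 2026-05-17 | 0     | 0          |
| judge        | 99    | 28      | 99       | 2026-05-06 | 2026-05-17 | 0     | 0          |
| warden       | 87    | 21      | 87       | 2026-05-05 | 2026-05-17 | 0     | 0          |
| pathfinder   | 82    | 34      | 82       | 2026-05-06 | 2026-05-17 | 1     | 1          |
| strategist   | 77    | 31      | 77       | 2026-05-06 | 2026-05-17 | 0     | 0          |
| curator      | 76    | 27      | 76       | 2026-05-06 | 2026-05-17 | 0     | 0          |
| adversary    | 66    | 22      | 66       | 2026-05-06 | 2026-05-16 | 0     | 0          |
| scout        | 61    | 22      | 61       | 2026-05-06 | 2026-05-16 | 0     | 0          |
| visual-judge | 25    | 13      | 25       | 2026-05-07 | 2026-05-16 | 0     | 0          |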
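
The columns above are windowed counts over ledger rows. The following is a minimal sketch of how such a summary could be derived, assuming each ledger row carries an agent name and a timestamp and that both windows are measured back from the report's "Generated" time. The row layout, the sample rows, and the `summarize` helper are hypothetical and are not the report generator's actual code.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Report generation time from the "Generated" header (fractional seconds dropped).
GENERATED = datetime(2026, 5, 17, 23, 9, 24)

# Hypothetical ledger rows as (agent, timestamp); not real ledger data.
rows = [
    ("forge", datetime(2026, 5, 6)),
    ("forge", datetime(2026, 5, 12)),
    ("forge", datetime(2026, 5, 17, 23, 9, 24)),
    ("scout", datetime(2026, 5, 6)),
    ("scout", datetime(2026, 5, 16)),
]

def summarize(rows, now):
    """Return {agent: {total, last_7d, last_30d, first, last}}."""
    stats = defaultdict(lambda: {"total": 0, "last_7d": 0, "last_30d": 0,
                                 "first": None, "last": None})
    for agent, ts in rows:
        s = stats[agent]
        s["total"] += 1
        age = now - ts
        if age <= timedelta(days=7):
            s["last_7d"] += 1
        if age <= timedelta(days=30):
            s["last_30d"] += 1
        s["first"] = ts if s["first"] is None else min(s["first"], ts)
        s["last"] = ts if s["last"] is None else max(s["last"], ts)
    return dict(stats)

summary = summarize(rows, GENERATED)

# Print one line per agent, ranked by total invocations as in the table.
for agent, s in sorted(summary.items(), key=lambda kv: -kv[1]["total"]):
    print(agent, s["total"], s["last_7d"], s["last_30d"],
          s["first"].date(), s["last"].date())
```

Because every agent's first ledger row falls within 30 days of the generation time, the "Total" and "Last 30d" columns coincide for all agents in this report.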

Cross-agent aggregate metrics

Truth-tag distribution (aggregate)
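
The per-agent truth-tag counts reported later in this document contain spelling variants of the same tag (for example 'STRONGLY PLAUSIBLE' and 'STRONGLY_PLAUSIBLE' under pathfinder), so an aggregate needs a normalization step before summing. The sketch below shows one possible approach using only the pathfinder and strategist counts. The normalization rule (underscores treated as spaces, case ignored) is an assumption, not a documented AEP rule.

```python
from collections import Counter

# Truth-tag counts copied from the per-agent detail section (two agents only).
per_agent = {
    "pathfinder": {"STRONGLY PLAUSIBLE": 77, "PROVEN/RELIABLE": 3,
                   "EXPERIMENTAL": 1, "STRONGLY_PLAUSIBLE": 1},
    "strategist": {"STRONGLY PLAUSIBLE": 72, "PROVEN/RELIABLE": 3,
                   "EXPERIMENTAL": 2},
}

def normalize_tag(tag: str) -> str:
    """Assumed rule: underscores equal spaces, case and extra whitespace ignored."""
    return " ".join(tag.replace("_", " ").upper().split())

def aggregate_tags(per_agent):
    """Sum counts across agents after normalizing each tag."""
    totals = Counter()
    for dist in per_agent.values():
        for tag, count in dist.items():
            totals[normalize_tag(tag)] += count
    return totals

tag_totals = aggregate_tags(per_agent)
for tag, count in tag_totals.most_common():
    print(f"{tag}: {count}")
```

Composite tags such as the two long scribe entries would pass through this rule unchanged and remain separate keys; splitting them would need a policy decision that this sketch does not make.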

Outcome distribution (aggregate)
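
The aggregate outcome distribution can be recomputed from the per-agent outcome counts listed in the per-agent detail section. The sketch below sums those ten distributions. The variable names are illustrative only, and the totals it yields are a recomputation from this report's own figures rather than a separately measured result.

```python
from collections import Counter

# Outcome counts copied from the per-agent detail section of this report.
outcomes = {
    "forge": {"success": 243, "recovered": 9},
    "scribe": {"success": 106, "recovered": 10},
    "judge": {"success": 85, "recovered": 11, "failed": 2, "UNKNOWN": 1},
    "warden": {"success": 57, "recovered": 29, "failed": 1},
    "pathfinder": {"success": 74, "recovered": 6, "UNKNOWN": 2},
    "strategist": {"success": 70, "recovered": 7},
    "curator": {"success": 75, "recovered": 1},
    "adversary": {"recovered": 57, "success": 9},
    "scout": {"success": 59, "recovered": 2},
    "visual-judge": {"recovered": 15, "success": 7, "failed": 3},
}

# Counter.update adds counts key by key when given a mapping.
outcome_totals = Counter()
for dist in outcomes.values():
    outcome_totals.update(dist)

grand_total = sum(outcome_totals.values())
for outcome, count in outcome_totals.most_common():
    print(f"{outcome}: {count} ({count / grand_total:.1%})")
```

The grand total of 941 matches the sum of the "Total" column in the per-agent table, which is a quick consistency check on the per-agent figures.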

Activity ranking (last 7 days)

  1. forge: 51 invocations
  2. scribe: 37 invocations
  3. pathfinder: 34 invocations
  4. strategist: 31 invocations
  5. judge: 28 invocations
  6. curator: 27 invocations
  7. scout: 22 invocations
  8. adversary: 22 invocations
  9. warden: 21 invocations
  10. visual-judge: 13 invocations
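
scout and adversary are tied at 22 invocations, so their relative order in the ranking depends on a tie-break that the report does not state. As a sketch of one way to make ties explicit, the snippet below applies competition ranking (tied agents share a rank) to the last-7-day counts from the summary table. The alphabetical tie-break and the `rank_agents` helper are assumptions for illustration.

```python
# Last-7-day invocation counts copied from the per-agent summary table.
last_7d = {
    "forge": 51, "scribe": 37, "judge": 28, "warden": 21, "pathfinder": 34,
    "strategist": 31, "curator": 27, "adversary": 22, "scout": 22,
    "visual-judge": 13,
}

def rank_agents(counts):
    """Competition ranking: tied agents share a rank and the next rank is skipped."""
    # Sort by count descending, then by name (assumed tie-break for stable output).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = []
    for i, (agent, n) in enumerate(ordered):
        if i > 0 and n == ordered[i - 1][1]:
            position = ranked[-1][0]      # same count as previous agent: share its rank
        else:
            position = i + 1
        ranked.append((position, agent, n))
    return ranked

ranking = rank_agents(last_7d)
for position, agent, n in ranking:
    print(f"{position}. {agent}: {n} invocations")
```

With this scheme both tied agents are reported at rank 7 and warden follows at rank 9.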

Per-agent detail

                      forge

                      • Total invocations: 252
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-17T23:09:24
                      • Activity: 7d=51 / 30d=252
                      • Unique sessions: 117
                      • Artifact rows: 245
                      • VETO mentions: 1 (HIGH-VETO: 0)
                      • Wave participation count: 4
                      • Top 5 waves: wave-007×1, wave-019×1, wave-016×1, wave-014×1
                      • Top 5 cluster tags: lodestone×84, forge×73, crucible×40, implementation×31, forge-generator-role×25
                      • Notes quality proxy (length stats): mean=528 / median=340 / p95=1394
                      • Truth-tag distribution: {'PROVEN/RELIABLE': 140, 'STRONGLY PLAUSIBLE': 110, 'EXPERIMENTAL': 1, 'SPECULATIVE FRONTIER': 1}
                      • Outcome distribution: {'success': 243, 'recovered': 9}

                      scribe

                      • Total invocations: 116
                      • First: 2026-05-05T00:00:00
                      • Last: 2026-05-17T00:00:00
                      • Activity: 7d=37 / 30d=116
                      • Unique sessions: 63
                      • Artifact rows: 111
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 1
                      • Top 5 waves: wave-026×1
                      • Top 5 cluster tags: lesson×59, scribe×29, compounding-signal-mandatory×22, compounding×11, scribe-upgrade-floor-applied×11
                      • Notes quality proxy (length stats): mean=366 / median=257 / p95=1116
                      • Truth-tag distribution: {'PROVEN/RELIABLE': 60, 'STRONGLY PLAUSIBLE': 53, 'UNKNOWN': 1, 'PROVEN/RELIABLE on quota observation; STRONGLY PLAUSIBLE on LEGION protocol design': 1, 'PROVEN/RELIABLE on rule + atomic-rewrite mechanism; STRONGLY PLAUSIBLE on stderr-signature detection (EXPERIMENTAL until first cascade trip)': 1}
                      • Outcome distribution: {'success': 106, 'recovered': 10}

                      judge

                      • Total invocations: 99
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-17T23:09:24
                      • Activity: 7d=28 / 30d=99
                      • Unique sessions: 47
                      • Artifact rows: 56
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 0
                      • Top 5 cluster tags: verify×49, scope-lock×23, judge×17, two-part-judge-structure×13, reflector-role-ace×9
                      • Notes quality proxy (length stats): mean=262 / median=198 / p95=625
                      • Truth-tag distribution: {'PROVEN/RELIABLE': 60, 'STRONGLY PLAUSIBLE': 37, 'UNKNOWN': 2}
                      • Outcome distribution: {'success': 85, 'recovered': 11, 'failed': 2, 'UNKNOWN': 1}

                      warden

                      • Total invocations: 87
                      • First: 2026-05-05T00:00:00
                      • Last: 2026-05-17T00:00:00
                      • Activity: 7d=21 / 30d=87
                      • Unique sessions: 61
                      • Artifact rows: 51
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 3
                      • Top 5 waves: wave-006×2, wave-005×1, wave-012×1
                      • Top 5 cluster tags: audit×51, warden×20, pattern:WARN-N-persistence×15, pattern:defect-tag-scan×13, marathon×11
                      • Notes quality proxy (length stats): mean=231 / median=160 / p95=481
                      • Truth-tag distribution: {'PROVEN/RELIABLE': 55, 'STRONGLY PLAUSIBLE': 32}
                      • Outcome distribution: {'success': 57, 'recovered': 29, 'failed': 1}

                      pathfinder

                      • Total invocations: 82
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-17T23:09:24
                      • Activity: 7d=34 / 30d=82
                      • Unique sessions: 55
                      • Artifact rows: 55
                      • VETO mentions: 1 (HIGH-VETO: 1)
                      • Wave participation count: 6
                      • Top 5 waves: wave-019×2, wave-020×2, wave-002×1, wave-007×1, wave-008×1
                      • Top 5 cluster tags: decomposition×39, pathfinder×22, adversary-amendment-fold-matrix×19, scope-lock×14, disconfirmer-first×14
                      • Notes quality proxy (length stats): mean=246 / median=171 / p95=568
                      • Truth-tag distribution: {'STRONGLY PLAUSIBLE': 77, 'PROVEN/RELIABLE': 3, 'EXPERIMENTAL': 1, 'STRONGLY_PLAUSIBLE': 1}
                      • Outcome distribution: {'success': 74, 'recovered': 6, 'UNKNOWN': 2}

                      strategist

                      • Total invocations: 77
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-17T23:09:24
                      • Activity: 7d=31 / 30d=77
                      • Unique sessions: 57
                      • Artifact rows: 50
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 3
                      • Top 5 waves: wave-001×2, wave-003×1, wave-005×1
                      • Top 5 cluster tags: framing×31, strategist×14, 5-weakest-assumption-hooks×10, scope-lock×7, tier-3×6
                      • Notes quality proxy (length stats): mean=234 / median=182 / p95=637
                      • Truth-tag distribution: {'STRONGLY PLAUSIBLE': 72, 'PROVEN/RELIABLE': 3, 'EXPERIMENTAL': 2}
                      • Outcome distribution: {'success': 70, 'recovered': 7}

                      curator

                      • Total invocations: 76
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-17T00:00:00
                      • Activity: 7d=27 / 30d=76
                      • Unique sessions: 48
                      • Artifact rows: 69
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 3
                      • Top 5 waves: wave-001×3, wave-005×2, wave-006×1
                      • Top 5 cluster tags: curator×41, honest-counting×12, anti-source-laundering×10, sibling-78-discipline×10, marathon×9
                      • Notes quality proxy (length stats): mean=217 / median=192 / p95=411
                      • Truth-tag distribution: {'STRONGLY PLAUSIBLE': 51, 'PROVEN/RELIABLE': 25}
                      • Outcome distribution: {'success': 75, 'recovered': 1}

                      adversary

                      • Total invocations: 66
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-16T00:00:00
                      • Activity: 7d=22 / 30d=66
                      • Unique sessions: 49
                      • Artifact rows: 50
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 0
                      • Top 5 cluster tags: pre-mortem×32, 5-weakest-assumption-hooks×17, premortem×14, adversary×13, ace-reflector-role×10
                      • Notes quality proxy (length stats): mean=280 / median=189 / p95=740
                      • Truth-tag distribution: {'STRONGLY PLAUSIBLE': 61, 'PROVEN/RELIABLE': 5}
                      • Outcome distribution: {'recovered': 57, 'success': 9}

                      scout

                      • Total invocations: 61
                      • First: 2026-05-06T00:00:00
                      • Last: 2026-05-16T00:00:00
                      • Activity: 7d=22 / 30d=61
                      • Unique sessions: 36
                      • Artifact rows: 56
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 0
                      • Top 5 cluster tags: research×23, anti-source-laundering×14, sibling-78-upgrade×11, powershell×7, scout×7
                      • Notes quality proxy (length stats): mean=348 / median=194 / p95=959
                      • Truth-tag distribution: {'STRONGLY PLAUSIBLE': 49, 'PROVEN/RELIABLE': 9, 'SPECULATIVE FRONTIER': 2, 'EXPERIMENTAL': 1}
                      • Outcome distribution: {'success': 59, 'recovered': 2}

                      visual-judge

                      • Total invocations: 25
                      • First: 2026-05-07T00:00:00
                      • Last: 2026-05-16T00:00:00
                      • Activity: 7d=13 / 30d=25
                      • Unique sessions: 24
                      • Artifact rows: 25
                      • VETO mentions: 0 (HIGH-VETO: 0)
                      • Wave participation count: 0
                      • Top 5 cluster tags: visual-judge×20, operator-pending×16, rubric×13, lodestone×13, forge-follow-up-amendments×10
                      • Notes quality proxy (length stats): mean=275 / median=200 / p95=601
                      • Truth-tag distribution: {'STRONGLY PLAUSIBLE': 19, 'PROVEN/RELIABLE': 6}
                      • Outcome distribution: {'recovered': 15, 'success': 7, 'failed': 3}
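
Each agent block above reports a "Notes quality proxy" as the mean, median, and p95 of note lengths. The report does not say how p95 is computed or in which unit length is measured, so the following is only a sketch, assuming lengths in characters and a nearest-rank percentile. The sample lengths are hypothetical and are not taken from the ledger.

```python
import statistics

def length_stats(lengths):
    """Mean, median, and nearest-rank 95th percentile of note lengths."""
    ordered = sorted(lengths)
    # Nearest-rank method: 1-based rank = ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": round(statistics.mean(ordered)),
        "median": round(statistics.median(ordered)),
        "p95": ordered[rank - 1],
    }

# Hypothetical note lengths (assumed to be character counts).
sample = [120, 200, 340, 340, 410, 520, 610, 800, 1394, 1500]
stats = length_stats(sample)
print(f"mean={stats['mean']} / median={stats['median']} / p95={stats['p95']}")
```

A mean well above the median, as in most agent blocks here, indicates a small number of very long notes, which is why the p95 is reported alongside the other two figures.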

Honest framing per §69.5

• These metrics are derived from the ledger and serve as a proxy for agent activity. They do NOT measure:
    • Per-invocation latency (requires invocation timestamps, which are not currently captured)
    • Token cost per agent (requires capturing token usage per the Anthropic billing API)
    • Multi-agent convergence rate (requires cross-agent dispatch correlation IDs)
    • Operator-judged quality of each agent's output (subjective; requires an operator rating loop)
• The metrics ARE empirical proxies for activity volume, recency, VETO honor pattern, artifact production, and doctrine ladder usage (truth-tag distribution).
STAGED for next session (per honest disclosure)

• Per-invocation latency capture: add invoked_at and completed_at fields to the ledger schema
• Token cost capture: instrument Agent dispatches to record token usage; tie to the ledger via session_id
• Multi-agent convergence metric: when N agents fire on the same wave, measure the % agreement on the TOP-1 frame
• Operator-rated quality loop: after each wave, the operator scores each agent's contribution 0-10, recorded in a dedicated ledger
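
The staged items above could be prototyped roughly as follows. This is a sketch under stated assumptions: only invoked_at, completed_at, and session_id are named in this report, while the `LedgerEntry` class, the `total_tokens` and `top1_frame` fields, and the `top1_agreement` helper are hypothetical names introduced for illustration. Agreement is taken here to mean the share of agents whose TOP-1 frame equals the most common one on the wave.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LedgerEntry:
    """Hypothetical ledger row extended with the staged fields."""
    agent: str
    session_id: str
    wave: str
    invoked_at: datetime
    completed_at: datetime
    total_tokens: Optional[int] = None   # assumed field for token cost capture
    top1_frame: Optional[str] = None     # assumed field for the agent's TOP-1 frame

    @property
    def latency_seconds(self) -> float:
        """Per-invocation latency derived from the two timestamps."""
        return (self.completed_at - self.invoked_at).total_seconds()

def top1_agreement(entries):
    """Share of agents whose TOP-1 frame matches the most common frame, or None."""
    frames = [e.top1_frame for e in entries if e.top1_frame is not None]
    if not frames:
        return None
    _, count = Counter(frames).most_common(1)[0]
    return count / len(frames)

# Hypothetical entries for a single wave; values are illustrative only.
t0 = datetime(2026, 5, 17, 12, 0, 0)
wave_entries = [
    LedgerEntry("strategist", "s-1", "wave-x", t0,
                datetime(2026, 5, 17, 12, 0, 42), 1800, "frame-a"),
    LedgerEntry("pathfinder", "s-1", "wave-x", t0,
                datetime(2026, 5, 17, 12, 1, 30), 2400, "frame-a"),
    LedgerEntry("adversary", "s-1", "wave-x", t0,
                datetime(2026, 5, 17, 12, 0, 15), 900, "frame-b"),
]

for e in wave_entries:
    print(e.agent, e.latency_seconds, e.total_tokens)
print("TOP-1 agreement:", top1_agreement(wave_entries))
```

Storing both timestamps rather than a precomputed duration keeps the raw data available if the latency definition later changes, for example to exclude queue time.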