
Weekly Digest
W21 · May 18–24, 2026
Week 21 was an infrastructure week. The index now runs four category rankings side by side — model families, tokenized agents, service agents, and developer agents — covering 1,338 indexed agents, of which 138 are evidence-ranked (7 model families, 16 tokenized, 28 service, 87 developer). The shift this week wasn't a leaderboard shake-up; it was making each ranking explain itself.
Two things landed. First, the Agent Payments Stack index went live — a neutral six-layer map of who actually covers what in agent payments, from settlement to application. Coinbase and Stripe tie at five of six layers; Circle sits at four. Second, we shipped a confidence tier on scores: every ranked agent now carries a signal-coverage grade (high / medium / low / provisional), so a score built on five signals reads differently from one built on three. The principle is simple — a number without its sample size is a guess in a suit.
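One way to picture the signal-coverage grade is as a function of how many of the five signals are present for an agent. The sketch below is illustrative only. The high and medium cut-offs follow this week's model-family grades (five of five signals and four of five), while the cut-offs for low and provisional, along with the names `coverage_tier` and `describe`, are assumptions rather than the published rule.

```python
TOTAL_SIGNALS = 5  # "full five-signal coverage" per the digest


def coverage_tier(signals_present: int, total: int = TOTAL_SIGNALS) -> str:
    """Map a count of available signals to a confidence tier.

    "high" and "medium" match the digest (5/5 and 4/5 signals).
    The "low" and "provisional" cut-offs are assumptions for illustration.
    """
    if not 0 <= signals_present <= total:
        raise ValueError("signals_present must be between 0 and total")
    missing = total - signals_present
    if missing == 0:
        return "high"
    if missing == 1:
        return "medium"
    if missing == 2:  # assumed cut-off
        return "low"
    return "provisional"  # assumed: fewer than three of five signals


def describe(name: str, score: int, signals_present: int) -> str:
    """Render a score together with its coverage grade."""
    tier = coverage_tier(signals_present)
    return f"{name} · {score} ({tier}, {signals_present}/{TOTAL_SIGNALS} signals)"
```

Under these assumptions, `describe("Qwen", 83, 5)` yields `Qwen · 83 (high, 5/5 signals)`, so the score and its coverage are read together.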
Where the rankings stand
Model Families
Qwen · 83
7 ranked · v1.4
Tokenized
aixbt · 83
16 ranked · v1.1
Service
a2aproject/A2A · 77
28 ranked · v1.1
Developer
87 evidence-ranked
1,288 tracked · v2.c
Standings as of May 24, 2026. Every figure is live at the public /api/rankings/*/llm-summary endpoints. Scores shift as upstream signals (HuggingFace, LMArena, on-chain) refresh.
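For readers who want to pull the standings programmatically, here is a minimal sketch. The digest gives only the path pattern `/api/rankings/*/llm-summary`, so the host (`BASE_URL` is a placeholder) and the category slugs in `CATEGORIES` are assumptions. The response format is not specified here either, so the code returns the body as raw text instead of parsing it.

```python
from urllib.request import urlopen

BASE_URL = "https://example.invalid"  # placeholder: substitute the real host
# Assumed slugs; the digest states only the pattern /api/rankings/*/llm-summary.
CATEGORIES = ["model-families", "tokenized", "service", "developer"]


def summary_url(category: str, base_url: str = BASE_URL) -> str:
    """Build the llm-summary URL for one ranking category."""
    return f"{base_url.rstrip('/')}/api/rankings/{category}/llm-summary"


def fetch_summary(category: str, base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Fetch one summary and return the response body as text."""
    with urlopen(summary_url(category, base_url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Because scores shift as upstream signals refresh, anything fetched this way is best stored with its retrieval date.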
Signal highlights
Multi-signal scoring inverts single-source rankings. Qwen leads the model-family composite at 83, but no single signal crowns it: HuggingFace downloads, LMArena Elo, citations, and cross-protocol deployment each point to a different leader. The composite is the only honest ranking, and it is a figure only AgentCrush computes.
Confidence tiers shipped. Six of seven model families now grade high (full five-signal coverage); Hermes grades medium (four of five). The score and its certainty now travel together.
Payments-stack coverage is concentrated. Across the 38 projects in the new Agent Payments Stack index, only two — Coinbase and Stripe — span five of the six layers. The rest specialize. Breadth is rare.
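The first highlight's point, that a composite can crown a leader no single signal does, can be shown with toy numbers. The sketch below min-max normalizes each signal and takes an equal-weight mean. The normalization, the weights, the family names, and every value are made up for illustration; none of it is AgentCrush's method or data.

```python
from statistics import mean

# Made-up values for four hypothetical families; not real data.
SIGNALS = {
    "downloads": {"alpha": 100, "beta": 10, "gamma": 20, "delta": 90},
    "elo": {"alpha": 1000, "beta": 1300, "gamma": 1050, "delta": 1270},
    "citations": {"alpha": 5, "beta": 10, "gamma": 50, "delta": 45},
}


def min_max(values: dict[str, float]) -> dict[str, float]:
    """Rescale one signal to [0, 1] so different units become comparable."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def composite(signals: dict[str, dict[str, float]]) -> dict[str, float]:
    """Equal-weight mean of normalized signals (an assumed weighting)."""
    normalized = {name: min_max(vals) for name, vals in signals.items()}
    # Only score families that appear in every signal.
    families = sorted(set.intersection(*(set(v) for v in signals.values())))
    return {f: mean(normalized[s][f] for s in signals) for f in families}


def leader(scores: dict[str, float]) -> str:
    """Return the key with the highest score."""
    return max(scores, key=scores.get)


single_leaders = {name: leader(vals) for name, vals in SIGNALS.items()}
composite_leader = leader(composite(SIGNALS))
```

In this toy data each signal has a different leader (alpha, beta, gamma), while delta, second on all three, leads the composite.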
This week in data
Agents indexed · 1,338
Evidence-ranked · 138
Category rankings · 4
x402 endpoints · 7