Methodology v1 Launch

AgentCrush Labs · Findings · 2026-05-16

What the agent economy looks like across 5+ signals

On May 16, 2026, AgentCrush shipped four category-specific scoring methodologies. Different agent categories leave different evidence trails, so we measure them differently. This page collects the most striking findings from that launch: the cases where multi-signal scoring produced a different answer than any single source would have given.

By the numbers

Total tracked: 1,338 agents indexed
Evidence-ranked: 26 across 4 categories
Categories live: 4 (model_family · tokenized · service · developer)
MCP tools: 7 (machine-readable methodology)

Finding #1 — headline

Single-source rankings invert under multi-signal scoring

The HuggingFace leader isn't the LMArena leader. The LMArena leader isn't the citation leader. The citation leader isn't the deployment leader. Each signal answers a different question, and when we combine them with documented weights, the resulting ranking differs from the ranking any single signal produces on its own.

HuggingFace #1: Alibaba Qwen (score 95)
LMArena #1: Google Gemini (Bradley-Terry rating 1484)
Derivatives #1: Alibaba Qwen (1,046 derivatives)
Citations #1: Meta Llama (51,449 citations)
Deployments #1: Google Gemini (145 deployments)

Composite #1 (highest weighted score across all 5 signals): Alibaba Qwen, at score 83.
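To make the weighting concrete, here is a minimal Python sketch of a weighted composite over five signals normalized to 0-100. The signal keys mirror the five sources above, but the weights, the 0-100 normalization, and the toy agents are assumptions for illustration, not AgentCrush's published formula or data. The toy data is built so that each signal has a different specialist leader while a consistently strong generalist wins the composite.

```python
# Five signals from the launch; the key names are hypothetical for this sketch.
SIGNALS = ("huggingface", "lmarena", "derivatives", "citations", "deployment")

# Hypothetical weights (NOT the published AgentCrush weights). They sum to 1.0,
# so a composite of 0-100 inputs also lands on a 0-100 scale.
WEIGHTS = {
    "huggingface": 0.15,
    "lmarena": 0.25,
    "derivatives": 0.20,
    "citations": 0.20,
    "deployment": 0.20,
}


def composite(scores: dict[str, float]) -> float:
    """Weighted sum of 0-100 normalized signals; a missing signal counts as 0."""
    return sum(WEIGHTS[s] * scores.get(s, 0.0) for s in SIGNALS)


def leader(agents: dict[str, dict[str, float]], score_fn) -> str:
    """Name of the agent with the highest score under score_fn."""
    return max(agents, key=lambda name: score_fn(agents[name]))


# Toy data: each specialist tops exactly one signal (100 vs. 30 elsewhere);
# the generalist tops none but scores 70 on every signal.
agents = {
    f"{s}-specialist": {t: (100.0 if t == s else 30.0) for t in SIGNALS}
    for s in SIGNALS
}
agents["generalist"] = {t: 70.0 for t in SIGNALS}

for s in SIGNALS:
    print(f"{s:<12} leader: {leader(agents, lambda sc, s=s: sc[s])}")
print(f"{'composite':<12} leader: {leader(agents, composite)}")
```

Each specialist's composite is 30 + 70 × (its signal's weight), at most 47.5 here, while the generalist sits at 70. Treating a missing signal as 0 is one design choice among several: it makes coverage gaps pull an agent down rather than being ignored.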

Finding #2 — the admission test

Hermes admitted at #, ranks last — by design

NousResearch Hermes is a beloved community model. It would top a vibe-based ranking. Our methodology admitted it — and ranked it last among model families.

The rule: at least 3 of the 5 signals must be present, AND at least one must be a capability signal (derivatives, LMArena, citations, or cross-protocol deployment). For weeks Hermes had only 2 signals (HuggingFace plus a thin derivatives footprint), so it was not evidence-ready. When we added paper citations and deployment scanning to the methodology, Hermes picked up both and cleared the threshold:

HuggingFace: present
LMArena: absent
Derivatives: present (thin)
Citations: present (new)
Deployment: present (new)

Composite: . Hermes earned admission to the ranking via citations + deployment, but its raw footprint (HF downloads, no LMArena coverage, modest derivatives) keeps it at the back. This is the methodology working as designed: admit on evidence, rank on weight. No manual override.
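The admission rule can be written as a small predicate. This Python sketch encodes the two stated conditions; the function and constant names are hypothetical, and the Hermes signal sets come from the description above (HuggingFace and derivatives before, plus citations and deployment after).

```python
# Signals named in the admission rule; HuggingFace is the only non-capability one.
ALL_SIGNALS = frozenset({"huggingface", "lmarena", "derivatives", "citations", "deployment"})
CAPABILITY_SIGNALS = frozenset({"lmarena", "derivatives", "citations", "deployment"})
MIN_SIGNALS = 3  # "3 of 5 signals must be present"


def is_evidence_ready(present: set[str]) -> bool:
    """Return True if an agent meets the stated admission rule (hypothetical name).

    Condition 1: at least MIN_SIGNALS of the five known signals are present.
    Condition 2: at least one present signal is a capability signal.
    """
    known = present & ALL_SIGNALS  # ignore anything outside the five signals
    return len(known) >= MIN_SIGNALS and bool(known & CAPABILITY_SIGNALS)


hermes_before = {"huggingface", "derivatives"}
hermes_after = hermes_before | {"citations", "deployment"}

print(is_evidence_ready(hermes_before))  # False: only 2 signals
print(is_evidence_ready(hermes_after))   # True: 4 signals, 3 of them capability signals
```

Because HuggingFace is the only one of the five signals outside the capability set, any three present signals already include at least two capability signals. The sketch still checks the second condition explicitly so it stays correct if non-capability signals are added later.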

Finding #3 — the honeypot test

Market cap alone is not a ranking

$TIBBIR has the largest USD market cap in the tokenized index ($101.6M). It does not rank #1.

AIXBT does, at composite 80. The reason: the methodology weights on-chain liquidity (an anti-honeypot check), capital locked in token contracts (TVL, as evidence of real commitment), and holder distribution. A high market cap with thin liquidity gets penalized, not rewarded. AgentCrush surfaced one Virtuals token with a $380M market cap and only $5K of liquidity, exactly the pattern the methodology was built to demote.

Agent               Market cap   Liquidity   TVL      Score
AIXBT ($AIXBT)      $18.4M       $761K       $378K    80
Ribbita ($TIBBIR)   $101.6M      $1,896K     $935K    73
G.A.M.E ($GAME)     $4.1M        $1,814K     $895K    65
Luna ($LUNA)        $4.5M        $1,402K     $693K    65
717ai ($WIRE)       $1.8M        $256K       $126K    62
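The anti-honeypot idea can be illustrated with a single ratio: on-chain liquidity divided by market cap. This Python sketch computes that ratio for the five tokens in the table and for the $380M / $5K Virtuals token mentioned above. The 1% cutoff, the `looks_like_honeypot` name, and the placeholder symbol for the unnamed Virtuals token are hypothetical. The ratio is one input, not the composite, so it does not reproduce the table ordering on its own ($GAME and $LUNA have far higher ratios than $AIXBT yet score lower).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    symbol: str
    market_cap_usd: float
    liquidity_usd: float


# Hypothetical cutoff: flag tokens whose liquidity is under 1% of market cap.
MIN_LIQUIDITY_RATIO = 0.01


def liquidity_ratio(t: Token) -> float:
    """On-chain liquidity as a fraction of market cap."""
    return t.liquidity_usd / t.market_cap_usd


def looks_like_honeypot(t: Token) -> bool:
    """Hypothetical flag: large market cap backed by very little liquidity."""
    return liquidity_ratio(t) < MIN_LIQUIDITY_RATIO


tokens = [
    Token("AIXBT", 18.4e6, 761e3),
    Token("TIBBIR", 101.6e6, 1896e3),
    Token("GAME", 4.1e6, 1814e3),
    Token("LUNA", 4.5e6, 1402e3),
    Token("WIRE", 1.8e6, 256e3),
    # Placeholder symbol for the unnamed Virtuals token: $380M cap, $5K liquidity.
    Token("VIRTUALS-EXAMPLE", 380e6, 5e3),
]

for t in tokens:
    print(f"{t.symbol:<18} liq/MC = {liquidity_ratio(t):.4%}  flagged={looks_like_honeypot(t)}")
```

The Virtuals example sits around 0.0013% liquidity-to-market-cap, three orders of magnitude below any token in the table.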

Finding #4 — forks beat stars

Active engagement beats passive interest

For service agents (callable endpoints: A2A protocol, Agentverse, x402, ERC-8004), we treat forks as a stronger adoption signal than stars. Anyone can star a repo; forking usually means you intend to use or modify it.

A2A leads at composite 77, with 24,535 stars and 2,486 forks: about one fork for every ten stars (10.1%), a ratio that signals real-world deployment, not just bookmarking.
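A fork-weighted adoption signal might look like the following Python sketch. The log scaling, the 3x fork weight, and the function names are assumptions for illustration, not the published service-agent formula; only the A2A star and fork counts come from the text above.

```python
import math


def fork_ratio(stars: int, forks: int) -> float:
    """Forks per star; 0.0 for a repo with no stars."""
    return forks / stars if stars > 0 else 0.0


def adoption_signal(stars: int, forks: int, fork_weight: float = 3.0) -> float:
    """Hypothetical log-scaled adoption score in which a fork counts more than a star.

    log1p dampens very large counts so one viral repo does not dominate;
    fork_weight (an assumed value) encodes "forks beat stars".
    """
    return math.log1p(stars) + fork_weight * math.log1p(forks)


a2a_stars, a2a_forks = 24_535, 2_486
print(f"A2A fork ratio: {fork_ratio(a2a_stars, a2a_forks):.1%}")  # 10.1%

# Same stars, roughly one tenth the forks: a mostly "bookmarked" repo scores lower.
print(adoption_signal(a2a_stars, a2a_forks) > adoption_signal(a2a_stars, 249))  # True
```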

Why this matters

Read the full methodology

Every weight, every formula, every limitation is published.
