The methodology

How AgentCrush ranks the agent economy

AgentCrush is the evidence-ranked index of the agent economy. We don't pick winners — we publish multi-signal evidence with transparent weights. Different agent categories leave different evidence trails, so we run four category-specific methodologies, each with its own signal sources, weights, and evidence-ready rule.

Principles

Multi-signal corroboration. No agent is evidence-ranked on a single signal. Every category requires at least 3 of N signals available, AND at least one of those signals must be a capability signal — not just popularity. Downloads and stars are vanity metrics on their own.

Per-category methodology. A model family leaves HuggingFace downloads and LMArena scores; a tokenized agent leaves on-chain liquidity and holder distribution; a service agent leaves GitHub forks and Agentverse interactions. Running one universal scoring function across all of them would average away the truth.

Methodology travels with data. Every category page publishes its full signal set, weights, formulas, evidence-ready rule, and scope notes. The same methodology is exposed via our MCP server so LLMs querying AgentCrush can correctly explain HOW a ranking was computed — not just what it is.

Honest gaps. Where a signal isn't yet populated for an agent (no LMArena coverage, no citations indexed, etc.), the methodology returns NULL — not 0. That distinction matters: NULL means "unmeasured," 0 means "measured at zero." The composite weights unmeasured signals as missing rather than failing.
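The NULL-versus-0 distinction can be illustrated with a small sketch. It assumes the composite is a weighted mean renormalized over the signals that were actually measured; that formula is not published in this section, so the `composite` function and its renormalization are illustrative assumptions, not AgentCrush's SQL.

```python
from typing import Mapping, Optional


def composite(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float],
) -> Optional[float]:
    """Weighted mean over measured signals only (assumed behaviour).

    None stands in for SQL NULL ("unmeasured"); 0 is a real measurement.
    """
    measured = {name: value for name, value in scores.items() if value is not None}
    active_weight = sum(weights[name] for name in measured)
    if active_weight == 0:
        # Nothing measured at all: the composite itself is unmeasured.
        return None
    weighted_sum = sum(weights[name] * value for name, value in measured.items())
    return weighted_sum / active_weight
```

With equal weights, a score of 80 next to an unmeasured signal stays at 80, while the same 80 next to a measured 0 drops to 40.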

Live coverage

179 total evidence-ranked agents across 5 categories.

Open source

Scoring views — read the SQL

Every ranking comes from a Postgres view. Each one is published verbatim with weights, evidence-ready rules, and a GitHub link.

Proof of Index — on-chain data integrity

Every night, AgentCrush computes a SHA-256 digest over that day's full snapshot export and notarizes it on Base. Once a digest is on-chain, the historical archive behind every ranking and every Ghost Index reading is tamper-evident — you don't have to trust that we didn't rewrite history, you can check. Oracle attestations from /api/oracle/attest reference the latest digest.

latest digest bb5630512f25efff436f14f1e64f7d4a67448067a2ed40e96737cc3714622ed5

covers 1,394 snapshot rows · 2026-06-29

notarized 0xb1b544dfd3fafc579d05b7f3f5dec814105e0acb15cb9bd33a58fab48e7acb1d

Verify: recompute the SHA-256 over the canonical JSON of the day's snapshot rows (recursively key-sorted, rows ordered by id) and compare with the tx calldata. Full history: /api/proof-of-index/v1
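The verification recipe above might look like this in Python. Ordering rows by `id` and sorting keys recursively come from the description; the compact separators, UTF-8 encoding, and the `snapshot_digest` name are assumptions, because the exact canonical JSON serialization is not spelled out here. A mismatch could therefore come from serialization details rather than tampering.

```python
import hashlib
import json
from typing import Any, Dict, List


def snapshot_digest(rows: List[Dict[str, Any]]) -> str:
    """SHA-256 hex digest over a canonical JSON form of the snapshot rows."""
    # Rows ordered by id, as the verification note describes.
    ordered = sorted(rows, key=lambda row: row["id"])
    # sort_keys=True sorts keys at every nesting level.
    # Separators and ensure_ascii are assumptions about the canonical form.
    canonical = json.dumps(
        ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The resulting hex string is what would be compared against the digest carried in the notarization transaction's calldata.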

Ghost Index ingestion coverage

The Ghost Index counts an agent as “alive” when activity_status = 'active' or last_event_at is within 30 days. Whether that signal exists depends on whether a worker writes to it for that category. v1.0 wires activity ingestion for some categories but not others, and we surface that gap honestly instead of reporting 0% for categories we're not actually measuring.

Category         Ingestion source                            Status
Developer        GitHub push events, repo snapshots          LIVE
Model Families   Big-10 seed (manual) + nightly snapshots    LIVE
MCP Servers      Seed + registry crawl (planned weekly)      LIVE
Service          A2A ecosystem crawl (worker pending)        PENDING
Tokenized        Virtuals on-chain (worker pending)          PENDING

For pending categories the index correctly shows the agents as indexed-but-no-activity-signal — surfaced on /ghost-index as pending rather than 0%. The aggregate headline liveness number includes all categories: it's a lower bound, and will rise as ingestion coverage grows.
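As a rough sketch of the liveness rule and the pending-versus-0% distinction described above: the field names `activity_status` and `last_event_at` come from the text, while the function names, the `ingestion_live` flag, and the inclusive 30-day boundary are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


def is_alive(
    activity_status: Optional[str],
    last_event_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Alive when status is 'active' or the last event is within 30 days."""
    if activity_status == "active":
        return True
    if last_event_at is None:
        return False
    # Inclusive boundary is an assumption; the source only says "within 30 days".
    return now - last_event_at <= timedelta(days=30)


def category_liveness(
    agents: List[Dict[str, Any]],
    ingestion_live: bool,
    now: datetime,
) -> Optional[float]:
    """Share of alive agents, or None (pending) when no worker feeds the category."""
    if not ingestion_live or not agents:
        return None
    alive = sum(
        is_alive(agent["activity_status"], agent["last_event_at"], now)
        for agent in agents
    )
    return alive / len(agents)
```

Returning `None` for a pending category keeps "not measured" separate from a measured liveness of 0%.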

Model Families

v1.4-with-deployment

Scores model families (Hermes, Llama, Mistral, Qwen, DeepSeek, etc.) on adoption, capability, downstream usage, research impact, and cross-protocol agent-economy deployment.

Signals

HuggingFace
30% weight · Downloads, likes, recency, breadth, top-model, aggregated by author. · Weighted basket of 5 sub-scores
LMArena
25% weight · Bradley-Terry capability score from chat.lmarena.ai. · LEAST(100, ROUND((MAX(arena_score) − 700) / 8))
HF Derivatives
20% weight · Fine-tunes / downstream models per base, counted from tags. · LEAST(100, ROUND(LOG10(SUM(derivatives_count)) × 25))
Paper Citations
15% weight · Semantic Scholar citation counts on canonical lab papers. · LEAST(100, ROUND(LOG10(SUM(citation_count)) × 16))
Deployment
10% weight · Cross-protocol agent-economy mentions across 6 source tables. The moat signal. · LEAST(100, ROUND(LOG10(SUM(deployment_count)) × 30))
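The log-scaled formulas above can be tried out with a short Python sketch. The multipliers and the 700 / 8 LMArena scaling are taken from the signal list; the helper names, the half-up rounding, and returning `None` for a zero total (where LOG10 is undefined) are assumptions, not the published SQL.

```python
import math
from typing import Optional


def _round_half_up(value: float) -> int:
    # Assumption: the view's ROUND rounds halves up.
    return math.floor(value + 0.5)


def log_scaled(total: float, multiplier: float) -> Optional[int]:
    """LEAST(100, ROUND(LOG10(total) * multiplier)); None when total <= 0."""
    if total <= 0:
        return None  # assumed mapping to NULL, i.e. unmeasured
    return min(100, _round_half_up(math.log10(total) * multiplier))


def lmarena_score(max_arena_score: float) -> int:
    """LEAST(100, ROUND((MAX(arena_score) - 700) / 8))."""
    return min(100, _round_half_up((max_arena_score - 700) / 8))


def derivatives_score(total: float) -> Optional[int]:
    return log_scaled(total, 25)


def citations_score(total: float) -> Optional[int]:
    return log_scaled(total, 16)


def deployment_score(total: float) -> Optional[int]:
    return log_scaled(total, 30)
```

Under these formulas an arena score of 1300 maps to 75, and the derivatives signal saturates at 100 once a family reaches 10,000 derivatives, since LOG10(10,000) × 25 = 100.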

Evidence-ready rule

3 of 5 signals AND ≥1 capability signal (derivatives, LMArena, citations, or deployment).
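A minimal sketch of this kind of gate, assuming a signal counts as "available" when its score is not NULL. The signal keys and the `evidence_ready` name are hypothetical; the same shape would fit the other categories' rules by swapping the required set.

```python
from typing import AbstractSet, Mapping, Optional

# Capability signals named in the rule above (keys are illustrative).
MODEL_FAMILY_CAPABILITY = frozenset(
    {"hf_derivatives", "lmarena", "citations", "deployment"}
)


def evidence_ready(
    scores: Mapping[str, Optional[float]],
    required_any: AbstractSet[str],
    min_signals: int = 3,
) -> bool:
    """True when enough signals are measured and at least one is in required_any."""
    available = {name for name, value in scores.items() if value is not None}
    return len(available) >= min_signals and bool(available & required_any)
```

For model families, HuggingFace is the only non-capability signal, so any three available signals already include a capability signal; the second clause matters more in categories with several non-qualifying signals.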

Scope & coverage — what this version measures, and what's next

  • Covers the big-10 model families (OpenAI, Claude, Gemini, Llama, Qwen, DeepSeek, Mistral, Grok, Cohere, Hermes). New families enter the view as soon as they leave public signals.
  • Paper citations follow Semantic Scholar indexing — recently published papers can take weeks to register.
  • Deployment counts measure family-level adoption breadth, not deployment of one specific variant. Read them as reach, not precision.

Changelog

v1.4-with-deployment · May 2026 · Added deployment signal (10%) sourcing 6 agent-economy tables. HuggingFace 35% → 30%, LMArena 30% → 25%.
v1.3-with-citations · Apr 2026 · Added paper citations via Semantic Scholar (15%). HF Derivatives 25% → 20%.
v1.2 · Apr 2026 · Added HuggingFace derivatives signal (downstream model count by base). Full 5-signal structure established.
v1.1 · Mar 2026 · Added LMArena Bradley-Terry capability scores. First external quality signal.
v1.0 · Feb 2026 · Initial model-family methodology. HuggingFace adoption only. Established log-scaled scoring pattern.
See the model families ranking →

Tokenized Agents

v1.1-tokenized-tvl

Scores tokenized AI agents (Virtuals Protocol, etc.) economics-first: market cap, on-chain liquidity, holder distribution, capital locked, plus social visibility.

Signals

Market Cap
25% weight · USD market cap, log-scaled. · LEAST(100, ROUND(LOG10(market_cap_usd) × 12))
Liquidity + Volume
20% weight · On-chain liquidity (65%) + 24h volume (35%). Anti-honeypot weighting. · liquidity_score × 0.65 + volume_score × 0.35
Holders
15% weight · Holder count (55%) + inverse top-10 concentration (45%). · holders_count_score × 0.55 + (100 − top10_pct) × 0.45
Price Momentum 24h
10% weight · Bounded around neutral 50. Extreme volatility (>±100%) treated neutral. · GREATEST(0, LEAST(100, 50 + price_change_pct))
TVL
15% weight · Total value locked in token contracts. Capital commitment beyond market cap. · LEAST(100, ROUND(LOG10(tvl_usd) × 14))
Social Visibility
15% weight · v1.1: binary curated flag. v1.2 will integrate X follower count + Farcaster engagement. · socially_visible ? 100 : 0
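The tokenized sub-scores above can be sketched as follows. The coefficients come from the signal list; the function names and half-up rounding are assumptions. The momentum formula as written only clamps to 0..100, so the "extreme volatility treated neutral" step is added here as an explicit assumption about how that sentence is applied.

```python
import math
from typing import Optional


def log_score(usd: float, multiplier: float) -> Optional[int]:
    """LEAST(100, ROUND(LOG10(usd) * multiplier)); None when usd <= 0 (assumed)."""
    if usd <= 0:
        return None
    return min(100, math.floor(math.log10(usd) * multiplier + 0.5))


def liquidity_volume(liquidity_score: float, volume_score: float) -> float:
    # Liquidity dominates so thin-liquidity tokens cannot score on volume alone.
    return liquidity_score * 0.65 + volume_score * 0.35


def holders(holders_count_score: float, top10_pct: float) -> float:
    # Lower top-10 concentration raises the score.
    return holders_count_score * 0.55 + (100 - top10_pct) * 0.45


def momentum(price_change_pct: float) -> float:
    if abs(price_change_pct) > 100:
        return 50.0  # assumption: moves beyond +/-100% fall back to neutral
    return max(0.0, min(100.0, 50.0 + price_change_pct))


def social(socially_visible: bool) -> int:
    return 100 if socially_visible else 0
```

For example, a token with a holder-count score of 80 and 40% of supply in its top 10 wallets lands at roughly 71 on the Holders signal, and a flat 24h price gives the neutral momentum score of 50.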

Evidence-ready rule

3 of 6 signals AND ≥1 economic signal (market cap, liquidity, holders, or TVL > 0).

Scope & coverage — what this version measures, and what's next

  • Coverage today: Virtuals Protocol (16 evidence-ranked). Additional tokenized ecosystems are on the integration roadmap.
  • Social signal is a curated flag in v1.1; v1.2 integrates X follower volume + Farcaster engagement.
  • Cross-protocol presence is tracked but unweighted until the signal has enough ecosystem coverage to be meaningful.

Changelog

v1.1-tokenized-tvl · May 2026 · Added TVL signal (15%). Liquidity + Volume weight 25% → 20%. Agents with capital locked beyond market cap gained.
v1.0-tokenized-v0 · May 2026 · Initial tokenized methodology. 6-signal economics-first scoring. Social placeholder binary flag.
See the tokenized agents ranking →

Service Agents

v1.1-service-forks

Scores service agents (A2A protocol, Agentverse, x402, ERC-8004) on adoption, source quality, activity recency, protocol breadth, fork engagement.

Signals

Adoption
25% weight · GitHub stars (A2A) OR Agentverse interactions. Log-scaled. Higher of the two wins. · GREATEST(stars_log×18, interactions_log×22)
Source Quality
20% weight · A2A signal_strength (0-100) OR Agentverse rating × 20. · GREATEST(a2a_signal_strength, ROUND(av_rating × 20))
Activity Recency
15% weight · Age-decay since most recent push or last-seen. Recent = high score. · Time-bucketed: 7d→100, 30d→80, 90d→60, 180d→40, 365d→20
Protocol Breadth
15% weight · Count of declared protocols/topics × 25. · LEAST(100, COUNT(protocols) × 25)
Forks
15% weight · GitHub forks, log-scaled. Forks measure active engagement vs passive starring. · LEAST(100, ROUND(LOG10(forks) × 22))
Discourse / Social
10% weight · v1.2 will integrate X + Farcaster mention volume for service agents. · currently NULL (placeholder)
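A few of the service-agent signals lend themselves to a short sketch. The bucket values and multipliers are from the list above; the function names, the inclusive bucket boundaries, and the score of 0 for anything older than 365 days are assumptions, since the text does not say what happens past the last bucket.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (max age in days, score) pairs from the Activity Recency signal.
RECENCY_BUCKETS: List[Tuple[int, int]] = [
    (7, 100), (30, 80), (90, 60), (180, 40), (365, 20),
]


def recency_score(days_since_activity: float) -> int:
    for max_days, score in RECENCY_BUCKETS:
        if days_since_activity <= max_days:  # inclusive boundary is assumed
            return score
    return 0  # assumption: older than a year scores zero


def protocol_breadth(protocols: Sequence[str]) -> int:
    """LEAST(100, COUNT(protocols) * 25): four protocols reach the cap."""
    return min(100, len(protocols) * 25)


def source_quality(
    a2a_signal_strength: Optional[float],
    av_rating: Optional[float],
) -> Optional[float]:
    """Higher of A2A signal strength and Agentverse rating * 20; None if neither."""
    candidates = []
    if a2a_signal_strength is not None:
        candidates.append(a2a_signal_strength)
    if av_rating is not None:
        candidates.append(math.floor(av_rating * 20 + 0.5))
    return max(candidates) if candidates else None
```

An agent last pushed 8 days ago scores 80 on recency, and one declaring two protocols scores 50 on breadth.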

Evidence-ready rule

3 of 6 signals AND ≥1 adoption signal (stars > 0, interactions > 0, or forks > 0).

Scope & coverage — what this version measures, and what's next

  • Sources today: A2A protocol crawl (28 agents). Agentverse ingestion is wired and awaiting fresh crawl data.
  • Roadmap v1.2: ERC-8004 registry (29K agents) and Bazaar x402 endpoints (46K) join as additional service surfaces.
  • Cross-protocol presence is tracked but unweighted in the v1.1 composite.

Changelog

v1.1-service-forks · May 2026 · Added Forks signal (15%). Measures active engagement vs passive starring. Adoption signal recalibrated.
v1.0-service-v0 · May 2026 · Initial service-agent methodology. 6-signal composite. Discourse signal placeholder.
See the service agents ranking →

Developer Agents

v2.c-public

Scores developer-tool agents (frameworks, runtimes, dev tools) on GitHub activity, package usage, dependency adoption, ecosystem links, docs, discourse, and trust signals. The universal ranking surface.

Signals

GitHub Activity
dynamic weight · Stars, commits, contributors, recency. · weighted by active_weight_total
Package Usage
dynamic weight · npm / PyPI download volume. · log-scaled per ecosystem
Dependency Adoption
dynamic weight · Reverse dependencies: how many other projects depend on this. · log-scaled count
Docs Quality
dynamic weight · README depth, API docs, examples coverage. · composite heuristic 0-100
Ecosystem Relationships
dynamic weight · Cross-referenced with other indexed agents. · graph-distance score
Discourse (HN)
dynamic weight · Hacker News story / comment activity. · log-scaled
Trust Signals
dynamic weight · Registry context, verified claims, identity attestation. · composite 0-100

Evidence-ready rule

Multi-signal coverage threshold OR top-100 ranked OR single signal ≥ 90 with ≥ 2 corroborating signals > 50.
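One way to read this three-branch rule is sketched below. It assumes the "multi-signal coverage threshold" is the 0.40 weight-coverage gate named in the v2.c-public changelog entry, and that coverage is the share of base weight carried by measured signals. The base weights, signal keys, and function name are hypothetical.

```python
from typing import Mapping, Optional


def developer_evidence_ready(
    scores: Mapping[str, Optional[float]],
    base_weights: Mapping[str, float],
    rank: Optional[int],
    coverage_gate: float = 0.40,
) -> bool:
    measured = {name: value for name, value in scores.items() if value is not None}

    # Branch 1: enough of the total weight is backed by measured signals.
    coverage = sum(base_weights[name] for name in measured) / sum(base_weights.values())
    if coverage >= coverage_gate:
        return True

    # Branch 2: ranked in the top 100.
    if rank is not None and rank <= 100:
        return True

    # Branch 3: one signal >= 90 with at least two other signals > 50.
    for name, value in measured.items():
        if value >= 90:
            corroborating = sum(
                1 for other, v in measured.items() if other != name and v > 50
            )
            if corroborating >= 2:
                return True
    return False
```

The third branch lets a lightly covered agent qualify on one very strong signal, provided two other measured signals corroborate it.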

Scope & coverage — what this version measures, and what's next

  • Weights are computed per agent from available signal coverage (active_weight_total) rather than fixed percentages — agents are scored on what can actually be measured about them.
  • The public ranking lists the evidence-ranked subset; the universal ranking scores the full index behind it.

Changelog

v2.c-public · May 2026 · Public-rank normalization via active_weight_total. evidence_ready_for_public_rank gate at 0.40 weight coverage.
v2.b · Apr 2026 · Added docs_quality signal. HN discourse signal. Trust signals from registry context.
v2.a · Apr 2026 · GitHub + package + dependency basket. Dependency-weighted strength scoring.
See the developer agents ranking →

Agent Payments Stack

v1.0-aps

A 6-layer map of the agent payments infrastructure — settlement, wallets, routing, protocol, governance, application. Scores projects (companies, protocols, standards) by stack depth. Inspired by Keyrock "Who Pays the Agent?" (May 2026), kept live and methodology-disclosed.

Signals

L0 Settlement
Weight 4 · Settlement chain or network (Base, Solana, XRPL, Tempo, VisaNet). · presence × 4
L1 Wallets
Weight 3 · Agent key management + policy-gated signing (AgentKit, Safe, Privy). · presence × 3
L2 Routing
Weight 2 · Cross-chain bridging + abstraction (CCTP, deBridge, LayerZero). · presence × 2
L3 Protocol
Weight 4 · Payment protocol definition (x402, MPP, AP2, ACP, Nanopayments). · presence × 4
L4 Governance
Weight 5 · Authorisation, compliance, identity. Highest weight, as the hardest layer to replicate. · presence × 5
L5 Application
Weight 3 · Frameworks, marketplaces, autonomous services (Virtuals, ElizaOS, Olas). · presence × 3

Evidence-ready rule

All tracked projects are displayed. No evidence-ready gate — layer coverage is the qualification.

Scope & coverage — what this version measures, and what's next

  • This is a project taxonomy, not an agent ranking — entries are companies, protocols, and standards.
  • Layer coverage is determined by documented public evidence only; self-reporting is not accepted.
  • Scores here are not comparable across categories — the maximum is 21 (all 6 layers covered).
  • Source: Keyrock "Who Pays the Agent?" (May 2026), extended and kept live by AgentCrush.
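The stack-depth score reduces to a sum of layer weights, as this small sketch shows. The weights are the ones listed under Signals; the layer keys and the function name are illustrative.

```python
from typing import Iterable

# Layer weights from the Signals list (presence x weight).
LAYER_WEIGHTS = {"L0": 4, "L1": 3, "L2": 2, "L3": 4, "L4": 5, "L5": 3}


def stack_depth_score(layers_covered: Iterable[str]) -> int:
    """Sum the weights of every distinct layer a project covers."""
    return sum(LAYER_WEIGHTS[layer] for layer in set(layers_covered))
```

A project covering all six layers scores 4 + 3 + 2 + 4 + 5 + 3 = 21, the stated maximum; one covering only settlement and protocol scores 8.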

Changelog

v1.0-aps · May 2026 · Initial Agent Payments Stack methodology. 6-layer map, 38 projects. Layer weights: L4 Governance highest (5) as hardest to replicate.
See the agent payments stack ranking →

For machine consumers

The same methodology is exposed via our MCP server. LLMs (Claude Desktop, Cursor, custom agents) can query AgentCrush as a live data layer and explain ranking decisions accurately.

Endpoint

POST https://agentcrush.xyz/api/mcp/v1

Discovery

GET https://agentcrush.xyz/.well-known/mcp.json
Full MCP docs →
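For a custom client, a request to the endpoint above could be prepared along these lines. MCP is built on JSON-RPC 2.0 and `tools/list` is a standard MCP method, but this section does not say which tools the server exposes or whether it accepts a bare call without an `initialize` handshake first, so treat the sketch as an assumption-laden starting point. The function name and headers are illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request

MCP_ENDPOINT = "https://agentcrush.xyz/api/mcp/v1"  # from the Endpoint line above


def build_tools_list_request(request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 tools/list request."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed: HTTP-based MCP servers commonly accept JSON or SSE replies.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the discovery document at /.well-known/mcp.json is the place to confirm what the server actually supports.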

Version history

Each version bump changes signal weights, adds new signals, or adjusts the evidence-ready rule. Agents that were borderline evidence-ranked may move when a methodology version changes.

v1.4-with-deployment · Model Families · May 2026
+ added: Deployment signal (10%), cross-protocol agent-economy mentions across 6 source tables.
~ adjusted: HuggingFace weight reduced from 35% → 30% to accommodate deployment signal.
~ adjusted: LMArena weight reduced from 30% → 25%.
→ impact: Agents with broad agent-economy integrations moved up; pure-capability models without deployment footprint held or dipped.

v1.3-with-citations · Model Families · April 2026
+ added: Paper Citations signal (15%), Semantic Scholar citation counts on canonical lab papers.
~ adjusted: HF Derivatives weight reduced from 25% → 20%.
→ impact: Research-heavy model families (DeepSeek, Llama) gained; models without academic papers were unaffected.

v1.1-tokenized-tvl · Tokenized Agents · May 2026
+ added: TVL signal (15%), total value locked in token contracts.
~ adjusted: Liquidity + Volume weight reduced from 25% → 20%.
→ impact: Agents with capital locked beyond market cap moved up. Pure market-cap agents lost relative position.

v1.1-service-forks · Service Agents · May 2026
+ added: Forks signal (15%), GitHub forks log-scaled. Measures active engagement vs. passive starring.
~ adjusted: Protocol Breadth weight unchanged. Adoption signal recalibrated.
→ impact: Agents with high fork engagement moved up relative to starred-but-unforked repos.