Model Family Rankings

The agents we track here are model families: Hermes, Llama, Mistral, Qwen, DeepSeek, and other base/foundation model lineages. They're scored with a different methodology than developer agents because they leave different evidence trails: HuggingFace downloads instead of npm, fine-tune derivatives instead of dependency graphs, LMArena rank instead of Hacker News discussion.

v1.0, launching · methodology version: v1.2-hf+lmarena+derivatives

Rankings populate as evidence accumulates, not on a fixed date. Currently 5 model families tracked, 4 evidence-ranked. The strict evidence rule requires multi-signal corroboration — see methodology below. As adapters ship for LMArena, HF derivatives, and citations over the next 2-3 weeks, well-known model families will become evidence-ready automatically.

Methodology

The model family composite score is a weighted blend of five signal sources. Every sub-score is published, every weight is documented.

Signal · Weight · Status · What it measures
HuggingFace · 30% · LIVE · Downloads, likes, recency, breadth, top-model; aggregated by author
LMArena · 25% · NEXT · Bradley-Terry score from chat.lmarena.ai; strongest defensible capability signal
HF Derivatives · 20% · PLANNED · Count of fine-tunes and downstream models per base; downstream adoption signal
Paper Citations · 15% · PLANNED · Semantic Scholar / OpenReview citation counts where available
Discourse / Social · 10% · PLANNED · X + Farcaster + Reddit + HN mentions; lowest weight, last priority
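The weighted blend can be sketched in a few lines of Python. This is an illustration, not the site's actual implementation: the function name, the dictionary keys, the treatment of a missing signal as zero, and rounding to the nearest integer are all assumptions. Those assumptions do reproduce the composite scores published in the tracked-families table (for example, Qwen's 69).

```python
from typing import Mapping, Optional

# Weights in percent, copied from the methodology table.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "hf": 30,
    "lmarena": 25,
    "derivatives": 20,
    "citations": 15,
    "social": 10,
}


def composite_score(sub_scores: Mapping[str, Optional[float]]) -> int:
    """Blend 0-100 sub-scores by weight.

    Assumption: a signal that is absent or None contributes 0 rather than
    being excluded and the remaining weights renormalized.
    """
    total = sum(
        weight * (sub_scores.get(name) or 0)
        for name, weight in WEIGHTS.items()
    )
    return round(total / 100)


# Sub-scores as published for Qwen (citations and social still pending).
qwen = {"hf": 100, "derivatives": 75, "lmarena": 96, "citations": None, "social": None}
print(composite_score(qwen))  # 69
```

Because missing signals count as zero under this reading, a family's composite is capped by the total weight of the signals it actually has. Two signals worth 50% combined can never score above 50.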

Evidence-ready rule

A model family is evidence-ranked when at least 3 of 5 signals are present AND at least one of those is a capability-or-adoption signal (LMArena, citations, or derivatives, not just downloads). Downloads alone are a vanity metric for models, the same way GitHub stars are for developer agents.
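The rule reads naturally as a small predicate. The sketch below is one possible encoding, assuming a signal counts as "present" when it has a non-None sub-score; the function and key names are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical key names for the five signals in the methodology.
ALL_SIGNALS = ("hf", "lmarena", "derivatives", "citations", "social")
CAPABILITY_OR_ADOPTION = {"lmarena", "citations", "derivatives"}


def is_evidence_ranked(sub_scores: Mapping[str, Optional[float]]) -> bool:
    """Return True when the evidence-ready rule is satisfied."""
    # Assumption: a signal is "present" when its sub-score is not None.
    present = {name for name in ALL_SIGNALS if sub_scores.get(name) is not None}
    has_enough_signals = len(present) >= 3
    has_capability_or_adoption = bool(present & CAPABILITY_OR_ADOPTION)
    return has_enough_signals and has_capability_or_adoption


# Three signals, including LMArena and derivatives: ranked.
print(is_evidence_ranked({"hf": 100, "derivatives": 75, "lmarena": 96}))  # True
# Only two signals present: indexed but not ranked.
print(is_evidence_ranked({"hf": 69, "derivatives": 33}))  # False
```

With the five signals as currently listed, only HuggingFace and discourse fall outside the capability-or-adoption group, so any three present signals necessarily include one from that group. The second clause is kept explicit here so the check stays correct if the signal set changes.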

Tracked model families (5)

Current coverage. Sub-scores are visible per agent — methodology shows its work.

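Columns: HF (HuggingFace) · DER (HF Derivatives) · LMA (LMArena) · CIT (Paper Citations) · SOC (Discourse / Social) · Score. ⏳ marks a signal that is still pending.

Qwen · evidence-ranked
HF: Qwen · 278 models · 464,328,470 downloads
HF 100 · DER 75 · LMA 96 · CIT ⏳ · SOC ⏳ · Score 69

Gemini · evidence-ranked
HF: google · 193 models · 128,927,656 downloads
HF 99 · DER 63 · LMA 98 · CIT ⏳ · SOC ⏳ · Score 67

DeepSeek · evidence-ranked
HF: deepseek-ai · 51 models · 29,298,419 downloads
HF 93 · DER 50 · LMA 94 · CIT ⏳ · SOC ⏳ · Score 61

Llama · evidence-ranked
HF: meta-llama · 38 models · 35,327,620 downloads
HF 78 · DER 58 · LMA 73 · CIT ⏳ · SOC ⏳ · Score 53

hermes-agent · indexed (not ranked)
HF: NousResearch · 25 models · 945,683 downloads
HF 69 · DER 33 · LMA ⏳ · CIT ⏳ · SOC ⏳ · Score 27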

Adapter roadmap

What we're building next. Signal coverage expands category-by-category over the next 2-3 weeks.

NEXT · LMArena adapter: Bradley-Terry capability rankings from chat.lmarena.ai. Strongest defensible capability signal in the agent ecosystem. Highest priority because no other source measures actual model quality.
+1 · HF derivatives adapter: fine-tune and downstream-model counts per base model. Adoption signal: how much the ecosystem builds on top of this model.
+2 · Citations adapter: Semantic Scholar / OpenReview where the model has a paper. Academic-credibility signal. Partial coverage is acceptable, since not every model has a paper.
+3 · Discourse / social adapter: X + Farcaster + Reddit + HN mentions. Lowest weight, last priority. Not required for the evidence-ready threshold.
