Model Family Rankings

The agents we track here are model families — Hermes, Llama, Mistral, Qwen, DeepSeek, and other base/foundation model lineages. They're scored with a different methodology than developer agents because they leave different evidence trails: HuggingFace downloads instead of npm, fine-tune derivatives instead of dependency graphs, LMArena rank instead of Hacker News discussion.

v1.0 — launching. Methodology version: v1.2-hf+lmarena+derivatives

Rankings populate as evidence accumulates, not on a fixed date. Currently 5 model families are tracked, 4 of them evidence-ranked. The strict evidence rule requires multi-signal corroboration — see the methodology below. As adapters ship for LMArena, HF derivatives, and citations over the next 2-3 weeks, well-known model families will become evidence-ready automatically.

Methodology

The model family composite score is a weighted blend of five signal sources. Every sub-score is published, and every weight is documented.

Signal              Weight  Status   Description
HuggingFace         30%     LIVE     Downloads, likes, recency, breadth, top-model — aggregated by author
LMArena             25%     NEXT     Bradley-Terry score from chat.lmarena.ai — strongest defensible capability signal
HF Derivatives      20%     PLANNED  Count of fine-tunes and downstream models per base — downstream adoption signal
Paper Citations     15%     PLANNED  Semantic Scholar / OpenReview citation counts where available
Discourse / Social  10%     PLANNED  X + Farcaster + Reddit + HN mentions — lowest weight, last priority
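The weighted blend above can be sketched in a few lines. This is an illustrative sketch, not the site's actual implementation: the signal names and weights come from the table, but the proportional redistribution of weight from missing signals is an assumption.

```python
# Hypothetical sketch of the five-signal composite. Weights match the
# published table; renormalizing over present signals is an assumption.

WEIGHTS = {
    "huggingface": 0.30,
    "lmarena": 0.25,
    "hf_derivatives": 0.20,
    "citations": 0.15,
    "discourse": 0.10,
}

def composite_score(sub_scores: dict[str, float]) -> float:
    """Blend available sub-scores (each 0-100) using the published weights.

    Weights of missing signals are redistributed proportionally across
    the signals that are present (assumption, not documented behavior).
    """
    present = {k: v for k, v in sub_scores.items() if k in WEIGHTS}
    if not present:
        return 0.0
    total_weight = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight

# Example: only the HuggingFace and LMArena signals are available.
print(round(composite_score({"huggingface": 80.0, "lmarena": 60.0}), 2))  # 70.91
```

Note the renormalization choice: without it, a family missing three PLANNED signals could never score above 55 even with perfect sub-scores.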

Evidence-ready rule

A model family is evidence-ranked when at least 3 of 5 signals are present AND at least one of those is a capability-or-adoption signal (LMArena, citations, or derivatives — not just downloads). Downloads alone are a vanity metric for models, the same way GitHub stars are for developer agents.
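The rule above is mechanical enough to state as code. A minimal sketch, assuming the signal names used here (the actual identifiers in the pipeline may differ):

```python
# Evidence-ready rule: >= 3 of 5 signals present AND at least one
# capability-or-adoption signal among them. Signal names are illustrative.

CAPABILITY_OR_ADOPTION = {"lmarena", "citations", "hf_derivatives"}
ALL_SIGNALS = CAPABILITY_OR_ADOPTION | {"huggingface", "discourse"}

def is_evidence_ranked(present_signals: set[str]) -> bool:
    present = present_signals & ALL_SIGNALS  # ignore unknown signal names
    return len(present) >= 3 and bool(present & CAPABILITY_OR_ADOPTION)

# Downloads plus discourse alone is not enough:
print(is_evidence_ranked({"huggingface", "discourse"}))                  # False
# Downloads + derivatives + LMArena qualifies:
print(is_evidence_ranked({"huggingface", "hf_derivatives", "lmarena"}))  # True
```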

Tracked model families (5)

Current coverage. Sub-scores are visible per agent — methodology shows its work.

Adapter roadmap

What we're building next. Signal coverage expands category-by-category over the next 2-3 weeks.

  1. [NEXT] LMArena adapter — Bradley-Terry capability rankings from chat.lmarena.ai. Strongest defensible capability signal in the agent ecosystem. Highest priority because no other source measures actual model quality.
  2. [+1] HF derivatives adapter — fine-tune and downstream-model counts per base model. Adoption signal: how much the ecosystem builds on top of this model.
  3. [+2] Citations adapter — Semantic Scholar / OpenReview where the model has a paper. Academic-credibility signal. Partial coverage is acceptable — not every model has a paper.
  4. [+3] Discourse / social adapter — X + Farcaster + Reddit + HN mentions. Lowest weight, last priority. Not required for the evidence-ready threshold.
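The Bradley-Terry score the LMArena adapter ingests comes from pairwise "which answer is better" battles. As a rough intuition for what that fit looks like — this is a textbook iterative (MM) sketch, not LMArena's actual pipeline, and the win matrix is made up:

```python
# Minimal Bradley-Terry fit via the classic iterative (MM) update.
# wins[i][j] = head-to-head wins of model i over model j.
# Illustrative sketch only; LMArena's real fitting procedure is theirs.

def bradley_terry(wins: list[list[int]], iters: int = 200) -> list[float]:
    n = len(wins)
    p = [1.0] * n  # strength parameter per model
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom if denom else p[i])
        total = sum(new_p)
        p = [x * n / total for x in new_p]  # normalize (scale is arbitrary)
    return p

# Hypothetical data: model 0 beats model 1 in 7 of 10 battles.
scores = bradley_terry([[0, 7], [3, 0]])
print(scores[0] > scores[1])  # True
```

The fitted strength ratio recovers the observed odds (7:3 here), which is why a Bradley-Terry rank is a more defensible capability signal than raw win counts over mismatched numbers of battles.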