llm-speed

LLM speed cheatsheet

The best decode tokens-per-second for each (model × hardware) pair measured by llm-speed under suite-v1. Numbers are wall-clock and use batch size 1 unless the workload specifies otherwise. Each row links to the canonical run page; cite as llm-speed.com/r/<id>.

11 (model × hardware) combinations · sorted by decode tok/s · suite-v1

How to read this table

Each row is the highest decode tokens-per-second measured for a unique (model, hardware) pair. When more than one workload was run for a pair, the row shows the workload with the highest decode tok/s. Slower runs of the same pair are not shown; click through to the run page for the full per-workload breakdown.
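The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not llm-speed's actual code: the `Run` record, its field names, the workload names, and every value in `example_runs` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run. All field names here are hypothetical."""
    run_id: str
    model: str
    hardware: str
    workload: str
    decode_tps: float


def best_per_pair(runs):
    """Keep the fastest run per (model, hardware) pair, sorted fastest first."""
    best = {}
    for run in runs:
        key = (run.model, run.hardware)
        # Replace the stored run only when this one has higher decode tok/s.
        if key not in best or run.decode_tps > best[key].decode_tps:
            best[key] = run
    return sorted(best.values(), key=lambda r: r.decode_tps, reverse=True)


# Made-up placeholder data, not real measurements.
example_runs = [
    Run("r1", "model-a", "gpu-x", "chat-short", 40.0),
    Run("r2", "model-a", "gpu-x", "chat-long", 55.0),
    Run("r3", "model-b", "gpu-x", "chat-short", 90.0),
    Run("r4", "model-a", "laptop-y", "chat-short", 12.5),
]

table = best_per_pair(example_runs)
for row in table:
    print(row.model, row.hardware, row.workload, row.decode_tps)
```

In this sketch the slower `chat-short` run for `model-a` on `gpu-x` is dropped from the table, which mirrors why such runs appear only on the run page.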

decode tok/s is wall-clock streaming throughput during generation, which is typically memory-bandwidth-bound at batch size 1. prefill tok/s is prompt ingestion throughput, which is typically compute-bound. TTFT is time-to-first-token in milliseconds. See /glossary for definitions and /methodology for the workload spec.
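As a rough illustration of how these three numbers relate, the sketch below derives them from wall-clock timestamps of a single streamed response. It assumes one common convention: TTFT runs from request start to the first token, prefill tok/s is approximated as prompt tokens divided by TTFT, and decode tok/s counts the tokens after the first over the time between first and last token. The exact definitions used by llm-speed are in /glossary and /methodology and may differ; the function name and parameters are hypothetical.

```python
def summarize_stream(prompt_tokens, request_start, token_times):
    """Derive TTFT, prefill tok/s and decode tok/s from timestamps in seconds.

    Assumed convention (not confirmed by the source):
      - TTFT = first token time - request start
      - prefill tok/s ~= prompt_tokens / TTFT
      - decode tok/s = (tokens after the first) / (last token - first token)
    """
    if len(token_times) < 2:
        raise ValueError("need at least two output tokens to measure decode")
    ttft_s = token_times[0] - request_start
    decode_s = token_times[-1] - token_times[0]
    if ttft_s <= 0 or decode_s <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return {
        "ttft_ms": ttft_s * 1000.0,
        "prefill_tps": prompt_tokens / ttft_s,
        "decode_tps": (len(token_times) - 1) / decode_s,
    }


# Made-up timestamps: a 100-token prompt, first token at 0.5 s,
# then one token every 0.25 s.
metrics = summarize_stream(100, 0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics)
```

Excluding the first token from the decode count keeps prefill time out of the decode figure; other tools divide total output tokens by total time, which gives a lower number for short generations.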