About llm-speed
llm-speed is a community benchmark for LLM inference speed. One CLI, one methodology, real numbers across hosted APIs, consumer GPUs, Apple Silicon, and prosumer rigs.
How it works
- Install the CLI. It detects your hardware and the inference backends available on your machine.
- Run the workload suite. In about a minute you get throughput (tok/s), time to first token (TTFT), and end-to-end latency numbers.
- Each result is signed with an Ed25519 key generated on your machine and posted to a public, permalinked run page.
- Read the methodology for the full workload spec and how disputes work.
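The headline metrics above can be derived from per-token arrival timestamps. A minimal sketch of that arithmetic; the function and variable names are illustrative, not the llm-speed CLI's actual internals:

```python
def summarize(token_times, start):
    """Derive TTFT and decode speed from per-token arrival timestamps.

    `token_times` holds the wall-clock time each streamed token arrived;
    `start` is when the request was sent. Illustrative only, not the
    llm-speed CLI's real code.
    """
    ttft = token_times[0] - start  # time to first token (prefill + queueing)
    if len(token_times) < 2:
        return ttft, 0.0
    # Decode speed covers only steady-state generation, so the wait
    # before the first token is excluded from the denominator.
    decode_time = token_times[-1] - token_times[0]
    tok_per_s = (len(token_times) - 1) / decode_time
    return ttft, tok_per_s

# Example: first token 200 ms in, then one token every 25 ms.
times = [0.200 + 0.025 * i for i in range(10)]
ttft, tps = summarize(times, 0.0)
print(f"TTFT: {ttft * 1000:.0f} ms, decode: {tps:.0f} tok/s")
```

Separating TTFT from decode speed matters because a backend can have a fast steady-state decode but a slow prefill, and averaging the two into one number hides that.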
Related work
These are the resources we've learned from and continue to point people toward:
- Artificial Analysis — hosted-API and datacenter accelerator benchmarks.
- LocalScore — open local-LLM benchmarks (Mozilla Builders).
- r/LocalLLaMA — the community where most of this knowledge lives.
Open source
The CLI, the methodology, and this site are open source under Apache-2.0. Submit issues or ideas via GitHub. Results belong to the people who submitted them.