About llm-speed

llm-speed is a community benchmark for LLM inference speed. One CLI, one methodology, real numbers across hosted APIs, consumer GPUs, Apple Silicon, and prosumer rigs.

How it works

Install the CLI. It detects your hardware and the inference backends you have available.
Run the workload suite. In about a minute you get throughput in tokens per second (tok/s), time to first token (TTFT), and end-to-end latency.
Each result is signed with the private key of an Ed25519 keypair held on your machine, then posted to a public, permalinked run page.
Read the methodology for the full workload spec and how disputes work.
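
To make the reported numbers concrete, here is a minimal sketch of how TTFT, total latency, and decode throughput could be derived from per-token timestamps. It is an illustration only: the names `RunMetrics` and `summarize_run` are hypothetical, and the formulas are common conventions rather than the CLI's actual definitions, which are set out in the methodology.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunMetrics:
    ttft_s: float            # time to first token, in seconds
    total_latency_s: float   # request start to last token, in seconds
    decode_tok_per_s: float  # tokens per second after the first token


def summarize_run(request_start: float, token_times: List[float]) -> RunMetrics:
    """Summarize one generation from the arrival time of each token.

    Assumption: throughput is measured over the decode phase only, i.e.
    tokens after the first divided by the time elapsed since the first.
    The real methodology may define it differently.
    """
    if not token_times:
        raise ValueError("no tokens were generated")

    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start
    decode_window = token_times[-1] - token_times[0]

    if len(token_times) > 1 and decode_window > 0:
        rate = (len(token_times) - 1) / decode_window
    else:
        # A single token has no decode phase to measure.
        rate = 0.0

    return RunMetrics(ttft_s=ttft, total_latency_s=total, decode_tok_per_s=rate)


# Five tokens: the first arrives 0.5 s after the request, then one every 0.25 s.
metrics = summarize_run(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics)
```

Separating TTFT from decode throughput keeps prompt-processing time from distorting the tok/s figure, which matters when comparing hosted APIs (network and queueing overhead) against local hardware.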

Related work

These are the resources we've learned from and continue to point people toward:

Artificial Analysis — hosted-API and datacenter accelerator benchmarks.
LocalScore — open local-LLM benchmarks (Mozilla Builders).
r/LocalLLaMA — the community where most of this knowledge lives.

Open source

The CLI, the methodology, and this site are open source under Apache-2.0. Submit issues or ideas via GitHub. Results belong to the people who submitted them.