About llm-speed
llm-speed is a community benchmark for LLM inference speed. One CLI, one methodology, real numbers across hosted APIs, consumer GPUs, Apple Silicon, and prosumer rigs.
Why this exists
Artificial Analysis covers hosted APIs and datacenter accelerators; MLPerf covers enterprise rigs; r/LocalLLaMA is folklore in comment threads. Nobody owns the union: consumer-local plus hosted-API numbers under one protocol, with a permalink per result. That's what this is.
Who runs it
One maintainer, publishing under the pseudonym meadow-kun. Solo project, open source, no company behind it. Issues and DMs welcome via GitHub. What you get for trusting me with a submission: every run is signed with an Ed25519 keypair on your machine, the canonical bytes are auditable from the run page, the suite version is pinned, and the entire pipeline (CLI, ingest, web) is Apache-2.0 in one public repo. Disputes happen in public GitHub issues against the run id. Numbers belong to their submitters, not to me.
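To make "canonical bytes" concrete, here is a minimal sketch of why a signature needs a deterministic serialization: two records with the same content must produce identical bytes, or the signature cannot be re-checked by a third party. Everything here is an assumption for illustration. This page does not specify the actual canonicalization llm-speed uses, and the field names and the `canonical_bytes` and `fingerprint` helpers are hypothetical. The Ed25519 signing step itself is left out; it would operate on the bytes produced below.

```python
import hashlib
import json


def canonical_bytes(run: dict) -> bytes:
    """Serialize a record deterministically: sorted keys, no extra whitespace, UTF-8.

    Assumption: this is one common canonicalization, not necessarily llm-speed's.
    """
    text = json.dumps(run, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fingerprint(run: dict) -> str:
    """SHA-256 hex digest of the canonical bytes, handy for eyeballing equality."""
    return hashlib.sha256(canonical_bytes(run)).hexdigest()


# Hypothetical run records: same content, different key order.
run_a = {"suite_version": "example", "backend": "example-backend", "tok_per_s": 42.5}
run_b = {"tok_per_s": 42.5, "backend": "example-backend", "suite_version": "example"}

# Key order does not change the canonical form, so both would verify
# against the same signature.
same = canonical_bytes(run_a) == canonical_bytes(run_b)
```

An auditor who recomputes these bytes from the published fields and checks them against the published signature does not have to trust the site, only the submitter's public key.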
How it works
- Install the CLI. It detects your hardware and the inference backends you have available.
- Run the workload suite. You get tokens per second (tok/s), time to first token (TTFT), and latency numbers in about a minute.
- Each result is signed with an Ed25519 keypair on your machine and posted to a public, permalinked run page.
- Read the methodology for the full workload spec and how disputes work.
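As a rough illustration of the numbers mentioned in the steps above, the sketch below derives TTFT, decode throughput, and end-to-end latency from three timestamps and a token count. The definitions are common conventions and an assumption on my part; the methodology is the authority on how the suite actually measures them. The `Timing` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    request_sent: float   # seconds, from a monotonic clock
    first_token: float    # when the first output token arrived
    last_token: float     # when the final output token arrived
    output_tokens: int    # number of generated tokens


def ttft(t: Timing) -> float:
    """Time to first token: delay between sending the request and the first token."""
    return t.first_token - t.request_sent


def decode_tok_per_s(t: Timing) -> float:
    """Decode throughput: tokens after the first, divided by the time spent producing them."""
    elapsed = t.last_token - t.first_token
    if t.output_tokens < 2 or elapsed <= 0:
        raise ValueError("need at least two tokens and a positive decode interval")
    return (t.output_tokens - 1) / elapsed


def total_latency(t: Timing) -> float:
    """End-to-end latency: request sent to last token received."""
    return t.last_token - t.request_sent


# Hypothetical measurement: 101 tokens, first at 0.25 s, last at 2.25 s.
sample = Timing(request_sent=0.0, first_token=0.25, last_token=2.25, output_tokens=101)
print(ttft(sample), decode_tok_per_s(sample), total_latency(sample))
# prints: 0.25 50.0 2.25
```

Separating TTFT from decode throughput matters because prompt processing and token generation stress hardware differently; a single "tokens divided by total time" figure would blur the two.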
Related work
These are the resources we've learned from and continue to point people toward:
- Artificial Analysis — hosted-API and datacenter accelerator benchmarks.
- LocalScore — open local-LLM benchmarks (Mozilla Builders).
- r/LocalLLaMA — the community where most of this knowledge lives.
Open source
The CLI, the methodology, and this site are open source under Apache-2.0. Submit issues or ideas via GitHub. Results belong to the people who submitted them.