Blog

Local LLM speed, measured

Articles grounded in signed, reproducible benchmark runs — every number links to the submission that produced it.

Local LLM inference speed in 2026: what we've measured
2026-07-02
Signed, reproducible decode tok/s across consumer GPUs and Apple Silicon — the fastest configs we've measured, why small-MoE coders punch above their weight, and what counts as fast enough.
The fastest local coding models in 2026 (measured on an RTX 5090)
2026-07-02
Signed, reproducible decode tok/s for the top local coding models on a single RTX 5090 — and why small-MoE coders decode ~4x faster than a dense 32B.