State of the local LLM — May 2026
The canonical answer to “what’s the fastest local LLM right now” as of May 2026, measured under llm-speed suite-v1. Numbers are wall-clock decode tok/s on the highest-decode workload that successfully ran. Every cell links to the run that produced it.
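To make the headline unit concrete: decode tok/s is tokens generated divided by wall-clock seconds spent in the decode loop, with prefill excluded. A minimal sketch of that arithmetic in Python (the `generate_one_token` callable is a placeholder for whatever backend emits the next token, not the suite's actual code path):

```python
import time

def decode_tok_per_s(generate_one_token, n_tokens: int) -> float:
    """Wall-clock decode throughput: tokens generated / elapsed seconds.

    `generate_one_token` stands in for whatever backend call produces the
    next token (a llama.cpp binding, an MLX generate step, ...). Prefill is
    assumed to have already finished, so only decode time is on the clock.
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_one_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```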
Headline cells
Top 10 decode tok/s — May 2026
| # | Model | Hardware | Backend | Decode tok/s | Run |
|---|---|---|---|---|---|
| 1 | Qwen3.6-27B-Q4_K_M.gguf | RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB RAM | llama.cpp | 69.9 | r_bqsunbd6xa8 |
Editor’s notes
Inaugural issue. The numbers below are the canonical answer to “what’s the fastest local LLM right now” measured under suite-v1, with the dual-domain trust chain in place (CDN sha256 + GitHub Releases mirror). The coding-agent and 70B-class headlines come from MLX on an M3 Ultra and llama.cpp on an RTX 5090, respectively. Future issues will add a delta column showing how each headline moved month-over-month.
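The dual-domain trust chain is simple in spirit: the wheel's sha256 published on the CDN has to agree with the digest on the GitHub Releases mirror, and the downloaded file has to match both, before anything is installed. A rough sketch of that check in Python (the digest URLs and release tag below are illustrative guesses, not the project's real layout; the `llm-speed verify` step further down presumably covers the same ground):

```python
import hashlib
import urllib.request

WHEEL = "llm_speed-0.0.1-py3-none-any.whl"
# Illustrative locations only; the real suite publishes its own digest paths.
CDN_SHA_URL = "https://llm-speed.com/dist/" + WHEEL + ".sha256"
MIRROR_SHA_URL = ("https://github.com/meadow-kun/llm-speed/releases/download/"
                  "v0.0.1/" + WHEEL + ".sha256")

def fetch_digest(url: str) -> str:
    # Digest files are assumed to be "<hex>  <filename>"; take the hex part.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode().split()[0].lower()

def local_digest(path: str) -> str:
    # Hash the downloaded wheel in 1 MiB chunks.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    cdn, mirror = fetch_digest(CDN_SHA_URL), fetch_digest(MIRROR_SHA_URL)
    assert cdn == mirror, "CDN and GitHub Releases digests disagree"
    assert local_digest(WHEEL) == cdn, "downloaded wheel does not match published sha256"
    print("sha256 verified against both domains")
```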
Five runs landed in May 2026. To reproduce any number on this page, install the CLI and run the suite on the same model + hardware:
pipx install https://llm-speed.com/dist/llm_speed-0.0.1-py3-none-any.whl
llm-speed verify
llm-speed bench

Methodology: /methodology · Privacy: /privacy · Source: github.com/meadow-kun/llm-speed