State of the local LLM — May 2026
The canonical answer to “what’s the fastest local LLM right now” as of May 2026, measured under llm-speed suite-v1. Numbers are wall-clock decode tok/s on the highest-decode workload that successfully ran. Every cell links to the run that produced it.
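To make the headline unit concrete: decode tok/s is tokens generated divided by wall-clock seconds spent in the decode loop, with prefill excluded. A minimal sketch of that arithmetic in Python (the `generate_one_token` callable is a placeholder for whatever backend emits the next token, not the suite's actual code path):

```python
import time

def decode_tok_per_s(generate_one_token, n_tokens: int) -> float:
    """Wall-clock decode throughput: tokens generated / elapsed seconds.

    `generate_one_token` stands in for whatever backend call produces the
    next token (a llama.cpp binding, an MLX generate step, ...). Prefill is
    assumed to have already finished, so only decode time is on the clock.
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_one_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```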
Headline cells
Top 10 decode tok/s — May 2026
| # | Model | Hardware | Backend | Decode tok/s | Run |
|---|---|---|---|---|---|
| 1 | Qwen3.6-27B-Q4_K_M.gguf | RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB RAM | llama.cpp | 69.9 | r_bqsunbd6xa8 |
Editor’s notes
Inaugural issue. The numbers below are the canonical answer to “what’s the fastest local LLM right now” measured under suite-v1, with the dual-domain trust chain in place (CDN sha256 + GitHub Releases mirror). The coding-agent and 70B-class headlines come from MLX on an M3 Ultra and llama.cpp on an RTX 5090, respectively. Future issues will add a delta column showing how each headline moved month-over-month.
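The dual-domain trust chain is simple in spirit: the wheel's sha256 published on the CDN has to agree with the digest on the GitHub Releases mirror, and the downloaded file has to match both, before anything is installed. A rough sketch of that check in Python (the digest URLs and release tag below are illustrative guesses, not the project's real layout; the `llm-speed verify` step further down presumably covers the same ground):

```python
import hashlib
import urllib.request

WHEEL = "llm_speed-0.0.1-py3-none-any.whl"
# Illustrative locations only; the real suite publishes its own digest paths.
CDN_SHA_URL = "https://llm-speed.com/dist/" + WHEEL + ".sha256"
MIRROR_SHA_URL = ("https://github.com/meadow-kun/llm-speed/releases/download/"
                  "v0.0.1/" + WHEEL + ".sha256")

def fetch_digest(url: str) -> str:
    # Digest files are assumed to be "<hex>  <filename>"; take the hex part.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode().split()[0].lower()

def local_digest(path: str) -> str:
    # Hash the downloaded wheel in 1 MiB chunks.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    cdn, mirror = fetch_digest(CDN_SHA_URL), fetch_digest(MIRROR_SHA_URL)
    assert cdn == mirror, "CDN and GitHub Releases digests disagree"
    assert local_digest(WHEEL) == cdn, "downloaded wheel does not match published sha256"
    print("sha256 verified against both domains")
```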
Five runs landed in May 2026. To reproduce any number on this page, install the CLI and run the suite on the same model + hardware:
pipx install https://llm-speed.com/dist/llm_speed-0.0.1-py3-none-any.whl
llm-speed verify
llm-speed bench

Methodology: /methodology · Privacy: /privacy · Source: github.com/meadow-kun/llm-speed