Codestral vs Qwen3-Coder: which local coding model is faster?
Published 2026-07-02
Both are strong open coding models, but one decodes far faster. We measured Codestral-22B and Qwen3-Coder-30B-A3B on the same hardware. The runs are signed and reproducible, and every number links to its run.
Decode speed, same hardware
On an RTX 5090, Qwen3-Coder decodes at 260 tok/s versus 100 tok/s for Codestral. On an M3 Ultra the gap holds: 112 tok/s versus 47 tok/s. That makes Qwen3-Coder about 2.6x faster on the RTX 5090 and about 2.4x faster on the M3 Ultra, or roughly 2.5x on both.
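The ratios follow directly from the four figures quoted above. A minimal sketch of the arithmetic (the dictionary layout and the function name are illustrative, not part of any tool):

```python
# Decode speeds quoted in this article, in tokens per second.
speeds = {
    "RTX 5090": {"Qwen3-Coder": 260, "Codestral": 100},
    "M3 Ultra": {"Qwen3-Coder": 112, "Codestral": 47},
}

def speedup(rig):
    """Ratio of Qwen3-Coder to Codestral decode speed on one rig."""
    s = speeds[rig]
    return s["Qwen3-Coder"] / s["Codestral"]

for rig in speeds:
    print(f"{rig}: {speedup(rig):.2f}x")
# RTX 5090: 2.60x
# M3 Ultra: 2.38x
```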
Why the larger model wins
Counterintuitively, Qwen3-Coder is the bigger model by total parameters (30B versus 22B), yet it is far faster. It is a mixture-of-experts (MoE) model that activates only about 3B parameters per token, while Codestral is a dense 22B model that activates all of them. Decode speed tracks the memory traffic of the active parameters, not the total size, so the MoE with the small active set wins. The same effect shows up across the fastest local coders.
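To make the memory-traffic argument concrete, here is a rough sketch of a bandwidth-only ceiling on decode speed. It assumes every active weight is read from memory once per token and ignores everything else. The bytes-per-parameter and bandwidth values are placeholder assumptions, not measurements, and the function name is hypothetical.

```python
def decode_ceiling_tok_s(active_params_b, bytes_per_param, bandwidth_gb_s):
    """Upper bound on tok/s if each active weight is read once per token."""
    # Billions of parameters times bytes per parameter gives GB moved per token.
    gb_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_per_token

BYTES_PER_PARAM = 0.5    # assumption: roughly 4-bit quantized weights
BANDWIDTH_GB_S = 1000.0  # assumption: placeholder bandwidth, not a real card

qwen = decode_ceiling_tok_s(3, BYTES_PER_PARAM, BANDWIDTH_GB_S)        # ~3B active
codestral = decode_ceiling_tok_s(22, BYTES_PER_PARAM, BANDWIDTH_GB_S)  # dense 22B

# The ratio depends only on active parameter counts, not on the placeholders.
print(f"ceiling ratio: {qwen / codestral:.1f}x")  # ceiling ratio: 7.3x
```

Under these assumptions the ceiling ratio (about 7.3x) is well above the measured gap of about 2.5x, which suggests that costs outside the weight reads, such as attention over the KV cache and per-token overhead, also contribute. The direction of the effect is the same: fewer active parameters means less traffic per token.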
Which should you pick?
For raw speed, pick Qwen3-Coder-30B-A3B; it fits any 24 GB card or a maxed-out Apple Silicon Mac. Codestral stays a strong dense model that some prefer for its output style, just at roughly 40% of the tok/s. Both are measured across every rig on the cheatsheet; check the fit for your card with the VRAM-fit checker, or see Qwen3-Coder across hardware.
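A back-of-the-envelope fit check can be sketched as follows. This is not how the VRAM-fit checker works; it is a simple estimate that assumes roughly 4-bit weights and a flat, made-up allowance for KV cache and runtime overhead. Note that an MoE model still has to hold all of its parameters in memory, so the total count (30B) is what matters here, not the active 3B.

```python
def fits_in_vram(total_params_b, bytes_per_param, vram_gb, overhead_gb=2.0):
    """Return (fits, needed_gb) for a rough weights-plus-overhead estimate."""
    # overhead_gb is an assumed flat allowance for KV cache and runtime buffers.
    needed_gb = total_params_b * bytes_per_param + overhead_gb
    return needed_gb <= vram_gb, needed_gb

BYTES_PER_PARAM = 0.5  # assumption: roughly 4-bit quantized weights

for name, total_b in [("Qwen3-Coder-30B-A3B", 30), ("Codestral-22B", 22)]:
    fits, needed = fits_in_vram(total_b, BYTES_PER_PARAM, vram_gb=24)
    print(f"{name}: ~{needed:.0f} GB needed, fits in 24 GB: {fits}")
```

With a heavier quantization or a long context the estimate grows, so treat the result as a first filter rather than a guarantee.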
Reproduce it, or add your rig
The same one-line install measures either model:
$ pipx install llm-speed && llm-speed bench
Numbers as of July 2026; the linked runs and the cheatsheet always reflect current data.