llm-speed

Qwen2.5-Coder-32B-Instruct vs Codestral-22B-v0.1

Side-by-side decode throughput (tok/s), prefill throughput (tok/s), and time to first token (TTFT) for Qwen2.5-Coder-32B-Instruct and Codestral-22B-v0.1, sourced from community-submitted runs of the llm-speed suite. Every number on this page links back to the run it came from.

Model: Qwen2.5-Coder-32B-Instruct (Qwen · 32B)
Model: Codestral-22B-v0.1 (Mistral · 22B)

Verdict

On the M3 Ultra (60-core GPU), Codestral-22B-v0.1 decodes at 47.49 tok/s versus 34.48 tok/s for Qwen2.5-Coder-32B-Instruct, which makes it about 1.4× as fast. That is the only hardware with runs for both models so far; the row below links to both source runs. Submit a run on other hardware to widen the comparison.

Hardware with data on both models

Hardware | Qwen2.5-Coder-32B-Instruct decode (tok/s) | Codestral-22B-v0.1 decode (tok/s) | Δ (tok/s) | Source runs
M3 Ultra (60-core GPU) | 34.48 | 47.49 | -13.0 | r_721b4bls_oq · r_79dvtag5fd_
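
The Δ and the 1.4× figure can be reproduced from the two decode numbers in the row above. The sketch below is illustrative only: the function name `compare_decode` is hypothetical, and it assumes Δ is the first model's decode rate minus the second's, a sign convention inferred from the row rather than stated on the page.

```python
def compare_decode(a_tok_s: float, b_tok_s: float) -> tuple[float, float]:
    """Return (delta, speedup) for two decode rates in tok/s.

    Assumed conventions (inferred from the table row, not documented):
      delta   = a - b   (negative when model B is faster)
      speedup = b / a   (how many times faster B is than A)
    """
    if a_tok_s <= 0 or b_tok_s <= 0:
        raise ValueError("decode throughput must be positive")
    return a_tok_s - b_tok_s, b_tok_s / a_tok_s


# Decode rates from the M3 Ultra (60-core GPU) row.
qwen_decode = 34.48       # Qwen2.5-Coder-32B-Instruct, tok/s
codestral_decode = 47.49  # Codestral-22B-v0.1, tok/s

delta, speedup = compare_decode(qwen_decode, codestral_decode)
print(f"delta = {delta:.1f} tok/s, speedup = {speedup:.1f}x")
# prints "delta = -13.0 tok/s, speedup = 1.4x"
```

The unrounded ratio is roughly 1.38, so "1.4×" is a one-decimal rounding of a single pair of runs rather than a precise multiplier.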

See also: Qwen2.5-Coder-32B-Instruct benchmarks · Codestral-22B-v0.1 benchmarks · All hardware · All models · Methodology