llm-speed

qwen3-coder on M4 Max (40-core GPU) + 128GB unified memory

Suite: suite-v1
CLI: 0.0.3
Signed: ya0OvDcfH4…
Submitted: Jun 27, 2026

Workload results

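| Workload | Backend | Model | Decode (tok/s) | Prefill (tok/s) | TTFT | p50 | p95 |
|---|---|---|---|---|---|---|---|
| chat-short | ollama@0.30.11 | qwen3-coder Q4_K_M | 55.45 | 675.2 | 163 ms | 15.7 ms | 37.3 ms |
| chat-long | ollama@0.30.11 | qwen3-coder Q4_K_M | 57.86 | 28,422.4 | 111 ms | 17.0 ms | 19.8 ms |
| concurrent-decode | ollama@0.30.11 | qwen3-coder Q4_K_M | 67.01 | | | 14.0 ms | 19.7 ms |
| agent-trace | ollama@0.30.11 | qwen3-coder Q4_K_M | error | | | | |

The agent-trace workload ended in an error and reported no metrics.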
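The table does not define its columns. As a minimal sketch of how such streaming metrics are commonly computed, assuming TTFT is the time from sending the request to the first token, decode tok/s counts the tokens after the first over the time they took, and p50/p95 are percentiles of the gap between consecutive tokens, the following works from per-token arrival timestamps. These definitions are assumptions, not llm-speed's documented method, and `decode_metrics` is a hypothetical helper.

```python
from statistics import quantiles


def decode_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Compute streaming metrics from per-token arrival times, in seconds.

    Assumed definitions (not llm-speed's documented method):
    - TTFT: first token arrival minus request start
    - decode tok/s: tokens after the first, divided by the time they took
    - p50/p95: percentiles of the gaps between consecutive tokens
    """
    if len(token_times) < 3:
        raise ValueError("need at least three tokens to get two inter-token gaps")
    ttft = token_times[0] - request_start
    # Gap between each token and the one before it.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    decode_tps = len(gaps) / (token_times[-1] - token_times[0])
    # n=100 yields the 1st..99th percentile cut points; index 49 is p50, 94 is p95.
    pct = quantiles(gaps, n=100, method="inclusive")
    return {
        "ttft_ms": ttft * 1000,
        "decode_tok_s": decode_tps,
        "p50_ms": pct[49] * 1000,
        "p95_ms": pct[94] * 1000,
    }
```

Under these definitions a steady 20 ms gap between tokens corresponds to 50 tok/s; a p95 well above p50 signals occasional slow tokens rather than a uniformly slower stream.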

Reproduce on your machine

Run the same workload with the same model, signed on your own machine. This is the exact command that produced this run:

$ pipx install llm-speed && llm-speed bench --model 'qwen3-coder' --workload 'concurrent-decode'

The run takes about a minute. Your result is posted to the leaderboard, signed and linkable.
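The command above covers only the concurrent-decode workload. A minimal sketch for building the matching command for every workload in the results table, assuming `llm-speed bench` accepts each of those names for `--workload` (this page only confirms `concurrent-decode`) and using no flags beyond the two shown. `bench_argv` and `run_all` are hypothetical helpers, and `llm-speed` must already be installed with pipx.

```python
import shlex
import shutil
import subprocess

# Workload names as listed in the results table; whether `llm-speed bench`
# accepts all of them is an assumption.
WORKLOADS = ["chat-short", "chat-long", "concurrent-decode", "agent-trace"]


def bench_argv(model: str, workload: str) -> list[str]:
    """Build the argument list for one run, using only the flags shown on this page."""
    return ["llm-speed", "bench", "--model", model, "--workload", workload]


def run_all(model: str = "qwen3-coder") -> None:
    """Run every workload in sequence; raises on the first non-zero exit code."""
    if shutil.which("llm-speed") is None:
        raise FileNotFoundError("llm-speed is not on PATH; install it with pipx first")
    for workload in WORKLOADS:
        subprocess.run(bench_argv(model, workload), check=True)


if __name__ == "__main__":
    # Print copy-pasteable shell commands instead of running them.
    for w in WORKLOADS:
        print(shlex.join(bench_argv("qwen3-coder", w)))
```

Printing rather than running by default keeps a multi-minute benchmark from starting by accident; call `run_all()` explicitly to execute the runs.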

Embed this run

Drop the badge into a README, blog post, or signature. Each rendered badge links back to the signed result.

[![llm-speed: 67.0 tok/s on M4 Max (40-core GPU) (qwen3-coder)](https://llm-speed.com/badge/r_1o8q4lhgj88.svg)](https://llm-speed.com/r/r_1o8q4lhgj88)
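The snippet above follows a visible pattern: the image is served from `/badge/<run ID>.svg` and the link points to `/r/<run ID>`. A minimal sketch that rebuilds the snippet for any run ID, assuming that pattern holds for other runs (it is inferred from this single example); `badge_markdown` is a hypothetical helper.

```python
BASE = "https://llm-speed.com"  # host taken from the snippet above


def badge_markdown(run_id: str, alt: str) -> str:
    """Build a Markdown badge image that links to a run page.

    The /badge/<id>.svg and /r/<id> paths are inferred from one example
    on this page and may not hold for every run.
    """
    # Escape square brackets so the alt text cannot close the Markdown image early.
    safe_alt = alt.replace("[", r"\[").replace("]", r"\]")
    image = f"{BASE}/badge/{run_id}.svg"
    page = f"{BASE}/r/{run_id}"
    return f"[![{safe_alt}]({image})]({page})"
```

Called with run ID `r_1o8q4lhgj88` and the alt text shown above, this reproduces the snippet character for character.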


Provenance

Run ID: r_1o8q4lhgj88
Fingerprint hash: 275ecb2b79296aab
Public key: ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE=
Received: 2026-06-27 15:38:31
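A quick local sanity check on the provenance fields: the public key is base64, and the truncated value after "signed" in the run header should be a prefix of it. The minimal sketch below decodes the key and checks that prefix using only the standard library. The key decodes to 32 bytes, which matches the raw size of an Ed25519 public key, but the page does not name the signature scheme, so treat Ed25519 as an assumption. Verifying the signature itself would also need the signed payload and signature bytes, which this page does not show.

```python
import base64

PUBLIC_KEY_B64 = "ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE="
# Truncated key shown after "signed" in the run header (without the ellipsis).
SIGNED_PREFIX = "ya0OvDcfH4"


def decode_public_key(b64: str) -> bytes:
    """Decode a base64 public key; validate=True rejects characters outside the alphabet."""
    return base64.b64decode(b64, validate=True)


key = decode_public_key(PUBLIC_KEY_B64)
# 32 bytes is the raw Ed25519 public-key size (scheme assumed, not stated on the page).
print(len(key))
# The header's truncated key should be a prefix of the full key.
print(PUBLIC_KEY_B64.startswith(SIGNED_PREFIX))
```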