llm-speed

qwen3-coder on M4 Max (40-core GPU) + 128 GB unified memory

Suite: suite-v1
CLI: 0.0.3
Signed: ya0OvDcfH4…
Submitted: Jun 27, 2026

Workload results

| Workload | Backend | Model | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | p50 (ms) | p95 (ms) |
|---|---|---|---|---|---|---|---|
| chat-short | ollama@0.30.11 | qwen3-coder (Q4_K_M) | 94.51 | 65.06 | 1,691 | 8.9 | 18.0 |
| chat-long | ollama@0.30.11 | qwen3-coder (Q4_K_M) | 97.35 | 1,397.3 | 2,252 | 10.1 | 10.6 |
| concurrent-decode | ollama@0.30.11 | qwen3-coder (Q4_K_M) | 109.9 | — | — | 9.1 | 9.5 |
| agent-trace | ollama@0.30.11 | qwen3-coder (Q4_K_M) | error | — | — | — | — |
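This page does not define the columns above. As a rough guide, the sketch below shows one common way to derive such numbers from a single streamed response, assuming TTFT is the time from sending the request to the first token, decode tok/s is the number of tokens after the first divided by the time from first to last token, and p50/p95 are percentiles of the gap between consecutive tokens. These definitions and the names `stream_metrics`, `StreamMetrics`, and `percentile` are hypothetical illustrations, not llm-speed's documented method.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_ms: float       # assumed: request sent -> first token
    decode_tok_s: float  # assumed: tokens after the first / (last - first token time)
    p50_ms: float        # assumed: median gap between consecutive tokens
    p95_ms: float        # assumed: 95th-percentile gap between consecutive tokens


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def stream_metrics(request_start, token_times):
    """Hypothetical metric definitions; all timestamps are in seconds."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure decode speed")
    ttft = token_times[0] - request_start
    # Inter-token gaps: one per token after the first.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    decode_window = token_times[-1] - token_times[0]
    return StreamMetrics(
        ttft_ms=ttft * 1000,
        decode_tok_s=len(gaps) / decode_window,
        p50_ms=percentile(gaps, 50) * 1000,
        p95_ms=percentile(gaps, 95) * 1000,
    )


if __name__ == "__main__":
    # Synthetic stream: first token after 1.5 s, then one token every 10 ms.
    times = [1.5 + 0.01 * i for i in range(101)]
    print(stream_metrics(0.0, times))
```

For the synthetic stream in the `__main__` block, this reports about 1,500 ms TTFT, about 100 tok/s, and roughly 10 ms at both p50 and p95. As a sanity check against the table, 1000 / 9.1 ms ≈ 109.9, which matches the concurrent-decode decode rate; under the assumed definitions the p50 column would then be a median per-token latency, although the page itself does not say so.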

Reproduce on your machine

Run the same workload with the same model, and the result is signed on your own machine. The exact command that produced this run:

$ pipx install llm-speed && llm-speed bench --model 'qwen3-coder' --workload 'concurrent-decode'

The benchmark runs in about a minute. Your result is posted to the leaderboard, signed and linkable.

Embed this run

Drop the badge into a README, blog post, or signature. Each rendered badge links back to the signed result.

[![llm-speed: 110 tok/s on M4 Max (40-core GPU) (qwen3-coder)](https://llm-speed.com/badge/r_r0di2hkku1h.svg)](https://llm-speed.com/r/r_r0di2hkku1h)


Provenance

Run ID: r_r0di2hkku1h
Fingerprint hash: 275ecb2b79296aab
Public key: ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE=
Received: 2026-06-27 15:49:16