llm-speed

qwen3-coder on M4 Max (40-core GPU) + 128GB unified

M4 Max (40-core GPU) + 128GB unified
suite: suite-v1
cli: 0.0.3
signed: ya0OvDcfH4…
submitted: Jun 27, 2026

Workload results

| Workload | Backend | Model | decode tok/s | prefill tok/s | TTFT | p50 | p95 |
|---|---|---|---|---|---|---|---|
| chat-short | ollama@0.30.11 | qwen3-coder Q4_K_M | 92.70 | 719.0 | 153 ms | 8.9 ms | 17.9 ms |
| chat-long | ollama@0.30.11 | qwen3-coder Q4_K_M | 94.61 | 1,396.5 | 2,253 ms | 10.3 ms | 10.9 ms |
| concurrent-decode | ollama@0.30.11 | qwen3-coder Q4_K_M | 109.1 | | | 9.2 ms | 9.5 ms |
| agent-trace (error) | ollama@0.30.11 | qwen3-coder Q4_K_M | | | | | |
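
As a rough sanity check on the figures above, the decode rate can be turned into an average time per generated token, and the prefill rate and TTFT can be combined into an approximate prompt length. This is a minimal sketch, not how llm-speed computes anything: the helper names are hypothetical, and the prompt-length estimate assumes TTFT is dominated by prefill, which this page does not state.

```python
# Figures copied from the workload results on this page.
runs = {
    "chat-short": {"decode_tok_s": 92.70, "prefill_tok_s": 719.0, "ttft_ms": 153.0},
    "chat-long": {"decode_tok_s": 94.61, "prefill_tok_s": 1396.5, "ttft_ms": 2253.0},
    "concurrent-decode": {"decode_tok_s": 109.1},
}


def mean_ms_per_token(decode_tok_s: float) -> float:
    """Average milliseconds per generated token implied by a decode rate."""
    return 1000.0 / decode_tok_s


def implied_prompt_tokens(prefill_tok_s: float, ttft_ms: float) -> float:
    """ASSUMPTION: TTFT is mostly prefill time, so tokens ~= rate * time."""
    return prefill_tok_s * ttft_ms / 1000.0


for name, r in runs.items():
    line = f"{name}: {mean_ms_per_token(r['decode_tok_s']):.1f} ms/token"
    if "ttft_ms" in r:
        tokens = implied_prompt_tokens(r["prefill_tok_s"], r["ttft_ms"])
        line += f", ~{tokens:.0f} prompt tokens"
    print(line)
```

Under that assumption, chat-short works out to roughly 110 prompt tokens and chat-long to roughly 3,146. The real prompt sizes are defined by the suite and may differ.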

Reproduce on your machine

Same workload, same model, signed on your rig. The exact command that produced this run:

$ pipx install llm-speed && llm-speed bench --model 'qwen3-coder' --workload 'concurrent-decode'

Runs in about a minute. Your number lands on the leaderboard signed and linkable.

Embed this run

Drop the badge into a README, blog post, or signature. Each render is a backlink to the signed result.

llm-speed: 109 tok/s on M4 Max (40-core GPU) (qwen3-coder)
[![llm-speed: 109 tok/s on M4 Max (40-core GPU) (qwen3-coder)](https://llm-speed.com/badge/r_40h1dznuznk.svg)](https://llm-speed.com/r/r_40h1dznuznk)
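
If the badge needs to be generated in a script (for example, to update a README after a new run), the snippet above can be rebuilt from the run ID. This is a sketch only: the function name `badge_markdown` is hypothetical, and the URL pattern is inferred from this single snippet, so it is an assumption that other run IDs follow the same layout.

```python
BASE_URL = "https://llm-speed.com"


def badge_markdown(run_id: str, label: str) -> str:
    """Build the Markdown badge: an SVG image that links to the signed result."""
    badge_url = f"{BASE_URL}/badge/{run_id}.svg"  # assumed pattern, taken from this page
    result_url = f"{BASE_URL}/r/{run_id}"         # assumed pattern, taken from this page
    return f"[![{label}]({badge_url})]({result_url})"


snippet = badge_markdown(
    "r_40h1dznuznk",
    "llm-speed: 109 tok/s on M4 Max (40-core GPU) (qwen3-coder)",
)
print(snippet)
```

For this run, the output is identical to the snippet shown on this page.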

Provenance

Run ID: r_40h1dznuznk
Fingerprint hash: 275ecb2b79296aab
Public key: ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE=
Received: 2026-06-27 15:35:59
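
The provenance fields can be checked for basic well-formedness before relying on them. The sketch below only decodes the values shown on this page; it does not verify any signature, and the page does not say which signature scheme or hash function is used. The variable names are illustrative.

```python
import base64

# Values copied from the provenance fields on this page.
public_key_b64 = "ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE="
fingerprint_hex = "275ecb2b79296aab"
signer_label = "ya0OvDcfH4"  # visible part of the truncated "signed" label

# validate=True rejects any character outside the base64 alphabet.
key_bytes = base64.b64decode(public_key_b64, validate=True)
fingerprint_bytes = bytes.fromhex(fingerprint_hex)

print(len(key_bytes))                           # 32
print(len(fingerprint_bytes))                   # 8
print(public_key_b64.startswith(signer_label))  # True
```

A 32-byte key is the size used by Ed25519 public keys, among others, but that is only a size match, not confirmation of the scheme. The truncated "signed" label at the top of the page matches the start of this public key.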