llm-speed

qwen3-coder on M4 Max (40-core GPU) + 128GB unified

M4 Max (40-core GPU) + 128GB unified
suite suite-v1
cli 0.0.3
signed ya0OvDcfH4…
submitted Jun 27, 2026

Workload results

| Workload | Backend | Model | decode tok/s | prefill tok/s | TTFT | p50 | p95 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| chat-short | ollama@0.30.11 | qwen3-coder Q4_K_M | 93.17 | 13.30 | 8,270 ms | 8.8 ms | 26.5 ms |
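
As a rough way to read this row, end-to-end response time can be approximated as TTFT plus the output length divided by the decode rate. The sketch below assumes that simple two-phase model. The function name and the 256-token output length are illustrative assumptions and not part of llm-speed.

```python
def estimate_response_seconds(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Approximate wall-clock time: time to first token plus steady-state decoding."""
    if decode_tok_s <= 0:
        raise ValueError("decode_tok_s must be positive")
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s


# TTFT and decode rate come from the chat-short row above.
# The 256-token output length is a hypothetical value chosen for illustration.
estimate = estimate_response_seconds(ttft_ms=8270, decode_tok_s=93.17, output_tokens=256)
print(f"{estimate:.2f} s")
```

With these figures TTFT dominates the estimate, so for short replies the prefill phase matters more than the decode rate.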

Reproduce on your machine

Same workload, same model, signed on your rig. The exact command that produced this run:

$ pipx install llm-speed && llm-speed bench --model 'qwen3-coder' --workload 'chat-short'

Runs in about a minute. Your number lands on the leaderboard signed and linkable. How it's measured.

Embed this run

Drop the badge into a README, blog post, or signature. Each render is a backlink to the signed result.

[![llm-speed: 93.2 tok/s on M4 Max (40-core GPU) (qwen3-coder)](https://llm-speed.com/badge/r_txvkjgoyyxs.svg)](https://llm-speed.com/r/r_txvkjgoyyxs)
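
If you generate READMEs or reports from a script, the snippet above can be assembled from a run ID. This is a minimal sketch that assumes the two URL patterns visible in the snippet (`/badge/<run-id>.svg` and `/r/<run-id>`) hold for other runs. The helper name `badge_markdown` is hypothetical and not part of the llm-speed CLI.

```python
BASE_URL = "https://llm-speed.com"


def badge_markdown(run_id: str, alt_text: str) -> str:
    """Build a markdown image link: the badge SVG wrapped in a link to the result page."""
    badge_url = f"{BASE_URL}/badge/{run_id}.svg"   # pattern assumed from the snippet above
    result_url = f"{BASE_URL}/r/{run_id}"          # pattern assumed from the snippet above
    return f"[![{alt_text}]({badge_url})]({result_url})"


snippet = badge_markdown(
    "r_txvkjgoyyxs",
    "llm-speed: 93.2 tok/s on M4 Max (40-core GPU) (qwen3-coder)",
)
print(snippet)
```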

Provenance

Run ID: r_txvkjgoyyxs
Fingerprint hash: 275ecb2b79296aab
Public key: ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE=
Received: 2026-06-27 15:34:09
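
The public key is printed as base64. A quick sanity check, sketched below, decodes it and compares it with the truncated `signed` label near the top of the page. The page does not state the signature scheme, so the code only inspects the key's length and prefix. Treating the `signed` label as a prefix of this key is an assumption.

```python
import base64

PUBLIC_KEY_B64 = "ya0OvDcfH4La0mEhEhM8iESwvi9/MZz9uibPMNfpovE="
SIGNED_LABEL = "ya0OvDcfH4…"  # truncated label as shown on the page

# validate=True rejects characters outside the base64 alphabet.
key_bytes = base64.b64decode(PUBLIC_KEY_B64, validate=True)
key_length = len(key_bytes)

# Assumption: the label is the public key cut off after its first characters.
label_matches = PUBLIC_KEY_B64.startswith(SIGNED_LABEL.rstrip("…"))

print(key_length, label_matches)
```

A 32-byte key is consistent with several common signature schemes, so the length alone does not identify which one is in use.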