qwen3-coder-bench-32k
4 workload results across 1 hardware configuration.
Fastest local config
113.3 decode tok/s
on M4 Max (40-core GPU) + 128GB unified via ollama (Q4_K_M)
Local runs (4 runs)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.
M4 Max (40-core GPU) + 128GB unified
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.30.11 | Q4_K_M | 113.3 | 84.12 | 1,308 | r_roktphpc--8 |
| chat-long | ollama@0.30.11 | Q4_K_M | 97.40 | 1,342.8 | 2,344 | r_roktphpc--8 |
| concurrent-decode | ollama@0.30.11 | Q4_K_M | 109.7 | no data | no data | r_roktphpc--8 |
| agent-trace | ollama@0.30.11 | Q4_K_M | 103.2 | 3,371.6 | 477 | r_roktphpc--8 |
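
The "Fastest local config" figure can be cross-checked against the table. The sketch below assumes the headline is simply the highest decode rate among the four workload rows; the page does not state how it is derived, so treat that as an assumption. The dictionary and the `fastest` helper are illustrative names, with values copied from the decode column.

```python
# Decode throughput (tok/s) per workload, copied from the table above.
decode_tok_s = {
    "chat-short": 113.3,
    "chat-long": 97.40,
    "concurrent-decode": 109.7,
    "agent-trace": 103.2,
}


def fastest(results):
    """Return (workload, rate) for the highest decode rate.

    Assumption: the headline number is the plain maximum over workloads.
    """
    workload = max(results, key=results.get)
    return workload, results[workload]


best_workload, best_rate = fastest(decode_tok_s)
print(f"{best_workload}: {best_rate} decode tok/s")
# prints "chat-short: 113.3 decode tok/s"
```

Under that assumption the result matches the 113.3 decode tok/s shown in the summary, which comes from the chat-short workload.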