llm-speed
qwen3-coder-bench-32k

4 workload results across 1 hardware configuration.

Fastest local config

113.3 decode tok/s

on M4 Max (40-core GPU) + 128GB unified via ollama (Q4_K_M)

Local runs (4 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.

M4 Max (40-core GPU) + 128GB unified

Workload           Backend         Quant   decode tok/s  prefill tok/s  TTFT (ms)  Run
chat-short         ollama@0.30.11  Q4_K_M  113.3         84.12          1,308      r_roktphpc--8
chat-long          ollama@0.30.11  Q4_K_M  97.40         1,342.8        2,344      r_roktphpc--8
concurrent-decode  ollama@0.30.11  Q4_K_M  109.7         no data        no data    r_roktphpc--8
agent-trace        ollama@0.30.11  Q4_K_M  103.2         3,371.6        477        r_roktphpc--8

qwen3-coder-bench-32k on hardware