qwen3-coder-bench-32k

Name: qwen3-coder-bench-32k — community LLM benchmarks
Creator: llm-speed
License: https://www.apache.org/licenses/LICENSE-2.0
Keywords: qwen3-coder-bench-32k, LLM benchmark, tokens per second, decode tok/s, prefill, TTFT

4 workload results across 1 hardware configuration.

Fastest local config

113.3 decode tok/s

on M4 Max (40-core GPU) + 128GB unified via ollama (Q4_K_M)

Local runs (4 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

M4 Max (40-core GPU) + 128GB unified

Workload	Backend	Quant	Decode (tok/s)	Prefill (tok/s)	TTFT (ms)	Run
chat-short	ollama@0.30.11	Q4_K_M	113.3	84.12	1,308	r_roktphpc--8
chat-long	ollama@0.30.11	Q4_K_M	97.40	1,342.8	2,344	r_roktphpc--8
concurrent-decode	ollama@0.30.11	Q4_K_M	109.7	no data	no data	r_roktphpc--8
agent-trace	ollama@0.30.11	Q4_K_M	103.2	3,371.6	477	r_roktphpc--8
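
The "fastest local config" figure can be reproduced from these rows by taking the maximum decode rate. The following is a minimal sketch, not the leaderboard's own code: the values are transcribed from the table above, while the field names (`decode_tok_s`, `ttft_ms`, and so on) and the helpers `parse_metric` and `parse_rows` are hypothetical names chosen for illustration. The parser accepts cells with or without a unit suffix and maps "no data" to `None`.

```python
from typing import Optional

# Values transcribed from the results table; tab-separated, one run per line.
ROWS = (
    "chat-short\tollama@0.30.11\tQ4_K_M\t113.3tok/s\t84.12tok/s\t1,308ms\tr_roktphpc--8\n"
    "chat-long\tollama@0.30.11\tQ4_K_M\t97.40tok/s\t1,342.8tok/s\t2,344ms\tr_roktphpc--8\n"
    "concurrent-decode\tollama@0.30.11\tQ4_K_M\t109.7tok/s\tno data\tno data\tr_roktphpc--8\n"
    "agent-trace\tollama@0.30.11\tQ4_K_M\t103.2tok/s\t3,371.6tok/s\t477ms\tr_roktphpc--8\n"
)


def parse_metric(cell: str) -> Optional[float]:
    """Convert a cell such as '1,342.8tok/s' or '477ms' to a float.

    Returns None for 'no data'. Unit suffixes and thousands separators
    are stripped before conversion.
    """
    cell = cell.strip()
    if cell == "no data":
        return None
    for unit in ("tok/s", "ms"):
        if cell.endswith(unit):
            cell = cell[: -len(unit)]
            break
    return float(cell.replace(",", "").strip())


def parse_rows(text: str) -> list:
    """Split tab-separated rows into dicts (field names are this sketch's own)."""
    results = []
    for line in text.strip().splitlines():
        workload, backend, quant, decode, prefill, ttft, run = line.split("\t")
        results.append({
            "workload": workload,
            "backend": backend,
            "quant": quant,
            "decode_tok_s": parse_metric(decode),
            "prefill_tok_s": parse_metric(prefill),
            "ttft_ms": parse_metric(ttft),
            "run": run,
        })
    return results


results = parse_rows(ROWS)

# Ignore rows without a decode measurement, then take the highest decode rate.
measured = [r for r in results if r["decode_tok_s"] is not None]
fastest = max(measured, key=lambda r: r["decode_tok_s"])
print(fastest["workload"], fastest["decode_tok_s"])  # prints: chat-short 113.3
```

Keeping missing measurements as `None` rather than `0` avoids treating the concurrent-decode row as having zero prefill throughput or a zero TTFT in any later aggregation.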
