
Qwen3-32B-Instruct.Q4_K_M

1 workload result across 1 hardware configuration.

Fastest local config

10.0 decode tok/s

on smoke-host via llama.cpp (Q4_K_M)

Local runs (1 run)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

smoke-host

Workload     Backend        Quant    decode tok/s   prefill tok/s   TTFT      Run
chat-short   llama.cpp@b1   Q4_K_M   10.00 tok/s    100.0 tok/s     50.0 ms   r_r7fc52oxuvq
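A minimal sketch of how the three metrics in the table relate to raw timings. This is illustrative only, not the site's actual measurement code; it assumes TTFT is approximated by the prefill duration, and the token counts and timings below are made up to reproduce the smoke-host row.

```python
def run_metrics(prefill_tokens, prefill_seconds, decode_tokens, decode_seconds):
    """Derive prefill tok/s, decode tok/s, and TTFT (ms) from raw timings.

    Hypothetical helper for illustration; assumes TTFT ~= prefill time,
    which holds when the first token is emitted right after prefill.
    """
    prefill_tps = prefill_tokens / prefill_seconds
    decode_tps = decode_tokens / decode_seconds
    ttft_ms = prefill_seconds * 1000.0
    return prefill_tps, decode_tps, ttft_ms

# Invented numbers matching the row above: 5 prompt tokens prefilled
# in 50 ms, then 100 tokens decoded over 10 s.
print(run_metrics(5, 0.05, 100, 10.0))  # → (100.0, 10.0, 50.0)
```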

Qwen3-32B-Instruct.Q4_K_M on hardware