
Qwen3-32B-Instruct.Q4_K_M

1 workload result across 1 hardware configuration.

Fastest local config

10.0 decode tok/s

on smoke-host via llama.cpp (Q4_K_M)

Local runs (1 run)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

smoke-host

Workload     Backend        Quant    decode tok/s   prefill tok/s   TTFT      Run
chat-short   llama.cpp@b1   Q4_K_M   10.00 tok/s    100.0 tok/s     50.0 ms   r_r7fc52oxuvq
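A minimal sketch of how the three metrics in the table relate to raw timings. This is illustrative only, not the site's actual measurement code; it assumes TTFT is approximated by the prefill duration, and the token counts and timings below are made up to reproduce the smoke-host row.

```python
def run_metrics(prefill_tokens, prefill_seconds, decode_tokens, decode_seconds):
    """Derive prefill tok/s, decode tok/s, and TTFT (ms) from raw timings.

    Hypothetical helper for illustration; assumes TTFT ~= prefill time,
    which holds when the first token is emitted right after prefill.
    """
    prefill_tps = prefill_tokens / prefill_seconds
    decode_tps = decode_tokens / decode_seconds
    ttft_ms = prefill_seconds * 1000.0
    return prefill_tps, decode_tps, ttft_ms

# Invented numbers matching the row above: 5 prompt tokens prefilled
# in 50 ms, then 100 tokens decoded over 10 s.
print(run_metrics(5, 0.05, 100, 10.0))  # → (100.0, 10.0, 50.0)
```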

Qwen3-32B-Instruct.Q4_K_M on hardware