smoke-host — LLM benchmarks
1 workload result across 1 model.
Fastest known config on smoke-host
10.0 decode tok/s
Qwen3-32B-Instruct.Q4_K_M via llama.cpp; see the full run (r_r7fc52oxuvq)
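The headline figure is simply the best decode throughput observed across all recorded runs on this host. A minimal sketch of that selection, assuming runs are kept as plain records (the `Run` dataclass and its field names are illustrative, not the site's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Run:
    workload: str
    backend: str
    quant: str
    decode_tps: float   # decode throughput, tok/s
    prefill_tps: float  # prefill throughput, tok/s
    ttft_ms: float      # time to first token, ms
    run_id: str

# Runs recorded on this host (values taken from the table below).
runs = [
    Run("chat-short", "llama.cpp@b1", "Q4_K_M", 10.0, 100.0, 50.0, "r_r7fc52oxuvq"),
]

# "Fastest known config" = the run with the highest decode throughput.
fastest = max(runs, key=lambda r: r.decode_tps)
print(f"{fastest.decode_tps} decode tok/s via {fastest.backend} ({fastest.quant})")
```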
Qwen3-32B-Instruct.Q4_K_M
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b1 | Q4_K_M | 10.0 | 100.0 | 50.0 | r_r7fc52oxuvq |
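For reference, the three metric columns are simple ratios over a run's raw timings. A minimal sketch of that arithmetic, with made-up prompt/response sizes and timestamps chosen only so the outputs line up with the row above (the actual token counts behind r_r7fc52oxuvq are not shown on this page):

```python
# Illustrative raw measurements for a single run (not real logged values).
prompt_tokens = 5            # tokens in the prompt
generated_tokens = 128       # tokens produced during decode
t_start = 0.000              # request submitted (s)
t_first_token = 0.050        # first generated token emitted (s)
t_last_token = 12.850        # last generated token emitted (s)

ttft_ms = (t_first_token - t_start) * 1000.0                    # time to first token
prefill_tps = prompt_tokens / (t_first_token - t_start)         # prompt processing rate
decode_tps = generated_tokens / (t_last_token - t_first_token)  # generation rate

print(f"TTFT:    {ttft_ms:.1f} ms")      # 50.0 ms
print(f"prefill: {prefill_tps:.1f} tok/s")  # 100.0 tok/s
print(f"decode:  {decode_tps:.1f} tok/s")   # 10.0 tok/s
```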
Models measured on smoke-host

- Qwen3-32B-Instruct.Q4_K_M (llama.cpp, Q4_K_M)
Common questions about smoke-host
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.