llm-speed

smoke-host — LLM benchmarks

1 workload result across 1 model.

Fastest known config on smoke-host

10.0 decode tok/s

Qwen3-32B-Instruct.Q4_K_M via llama.cpp (Q4_K_M) (see full run)

Qwen3-32B-Instruct.Q4_K_M

Workload    Backend        Quant    decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b1   Q4_K_M   10.00         100.0          50.0 ms  r_r7fc52oxuvq
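As a rough guide to what these numbers mean in practice, the sketch below estimates end-to-end response time from the measured TTFT and decode throughput. The inputs come from the run above; the formula (TTFT plus output tokens divided by decode rate) is a standard first-order approximation, not a figure published by this site.

```python
# Latency estimate for the chat-short run on smoke-host.
# Assumption: generation time is dominated by TTFT + steady-state decoding.

TTFT_S = 0.050        # 50.0 ms time-to-first-token, from the table
DECODE_TOK_S = 10.0   # measured decode throughput, from the table

def estimated_latency_s(output_tokens: int) -> float:
    """Approximate wall-clock seconds to produce `output_tokens` tokens."""
    return TTFT_S + output_tokens / DECODE_TOK_S

print(f"{estimated_latency_s(100):.2f} s")  # ~10.05 s for a 100-token reply
```

At 10 tok/s, decode time dwarfs the 50 ms TTFT for any non-trivial reply, so for this config decode throughput is the metric that matters.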

Models measured on smoke-host

Common questions about smoke-host

Direct Q&A drawn from the runs above: the fastest LLM, supported model classes, backend rankings, and quantization guidance.

Read the smoke-host FAQ →