smoke-host — LLM benchmarks
1 workload result across 1 model.
Fastest known config on smoke-host
10.0 decode tok/s
Qwen3-32B-Instruct.Q4_K_M via llama.cpp; see the full run (r_r7fc52oxuvq)
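The headline figure is simply the best decode throughput observed across all recorded runs on this host. A minimal sketch of that selection, assuming runs are kept as plain records (the `Run` dataclass and its field names are illustrative, not the site's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Run:
    workload: str
    backend: str
    quant: str
    decode_tps: float   # decode throughput, tok/s
    prefill_tps: float  # prefill throughput, tok/s
    ttft_ms: float      # time to first token, ms
    run_id: str

# Runs recorded on this host (values taken from the table below).
runs = [
    Run("chat-short", "llama.cpp@b1", "Q4_K_M", 10.0, 100.0, 50.0, "r_r7fc52oxuvq"),
]

# "Fastest known config" = the run with the highest decode throughput.
fastest = max(runs, key=lambda r: r.decode_tps)
print(f"{fastest.decode_tps} decode tok/s via {fastest.backend} ({fastest.quant})")
```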
Qwen3-32B-Instruct.Q4_K_M
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b1 | Q4_K_M | 10.0 | 100.0 | 50.0 | r_r7fc52oxuvq |
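For reference, the three metric columns are simple ratios over a run's raw timings. A minimal sketch of that arithmetic, with made-up prompt/response sizes and timestamps chosen only so the outputs line up with the row above (the actual token counts behind r_r7fc52oxuvq are not shown on this page):

```python
# Illustrative raw measurements for a single run (not real logged values).
prompt_tokens = 5            # tokens in the prompt
generated_tokens = 128       # tokens produced during decode
t_start = 0.000              # request submitted (s)
t_first_token = 0.050        # first generated token emitted (s)
t_last_token = 12.850        # last generated token emitted (s)

ttft_ms = (t_first_token - t_start) * 1000.0                    # time to first token
prefill_tps = prompt_tokens / (t_first_token - t_start)         # prompt processing rate
decode_tps = generated_tokens / (t_last_token - t_first_token)  # generation rate

print(f"TTFT:    {ttft_ms:.1f} ms")      # 50.0 ms
print(f"prefill: {prefill_tps:.1f} tok/s")  # 100.0 tok/s
print(f"decode:  {decode_tps:.1f} tok/s")   # 10.0 tok/s
```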
Models measured on smoke-host

- Qwen3-32B-Instruct.Q4_K_M (llama.cpp, Q4_K_M)
Common questions about smoke-host
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.