Qwen3.6-27B-Q4_K_M.gguf
2 workload results across 1 hardware configuration.
Fastest known config
66.3 decode tok/s
on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp (run r_79bwm4mq_4l)
RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | — | 66.28 | no data | 353 ms | r_79bwm4mq_4l |
| chat-short | llama.cpp | — | no data | no data | no data | r_p_fnr1056yl |