Skip to content
llm-speed
Leaderboard/model/qwen3-6-27b-q4-k-m-gguf

Qwen3.6-27B-Q4_K_M.gguf

2 workload results across 1 hardware configuration.

Fastest known config

66.3 decode tok/s

on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp see full run

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GBRTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB

WorkloadBackendQuantdecode tok/sprefill tok/sTTFTRun
chat-shortllama.cpp66.28tok/sno data353msr_79bwm4mq_4l
chat-shortllama.cppno datano datano datar_p_fnr1056yl

Qwen3.6-27B-Q4_K_M.gguf on hardware