llm-speed

qwen3.6

4 workload results across 1 hardware configuration.

Fastest local config

44.2 decode tok/s

on RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB via ollama (Q4_K_M)

Local runs (4 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.

RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB

Workload           Backend        Quant   decode tok/s  prefill tok/s  TTFT      Run
chat-short         ollama@0.31.1  Q4_K_M  no data       no data        no data   r_h_659oy695r
chat-long          ollama@0.31.1  Q4_K_M  no data       no data        no data   r_h_659oy695r
concurrent-decode  ollama@0.31.1  Q4_K_M  no data       no data        no data   r_h_659oy695r
agent-trace        ollama@0.31.1  Q4_K_M  44.18         1,910.5        1,808 ms  r_h_659oy695r
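
For readers who want to sanity-check decode and prefill figures like these on their own machine, here is a minimal sketch of one way to derive them from an Ollama response. It assumes the timing fields that Ollama's generate endpoint reports in its final response (`eval_count`, `eval_duration`, `prompt_eval_count`, `prompt_eval_duration`, with durations in nanoseconds). The function name `rates_from_ollama_response` and the sample values are hypothetical and are not taken from run r_h_659oy695r. This page may compute its numbers differently, in particular TTFT, which this sketch only approximates by the prefill time.

```python
def rates_from_ollama_response(resp: dict) -> dict:
    """Convert Ollama timing fields (durations in nanoseconds) to rates.

    Hypothetical helper: the benchmark harness behind this page may
    measure throughput and TTFT in a different way.
    """
    ns_per_s = 1e9
    eval_s = resp["eval_duration"] / ns_per_s
    prefill_s = resp["prompt_eval_duration"] / ns_per_s

    # Guard against zero durations (e.g. a fully cached prompt).
    decode_tps = resp["eval_count"] / eval_s if eval_s > 0 else None
    prefill_tps = resp["prompt_eval_count"] / prefill_s if prefill_s > 0 else None

    return {
        "decode_tok_s": decode_tps,
        "prefill_tok_s": prefill_tps,
        # Prefill time in ms: a rough lower bound on TTFT, not TTFT itself.
        "prefill_ms": resp["prompt_eval_duration"] / 1e6,
    }


# Illustrative, made-up values (not measurements from this page).
sample = {
    "eval_count": 200,                     # generated tokens
    "eval_duration": 5_000_000_000,        # 5 s spent decoding
    "prompt_eval_count": 1000,             # prompt tokens
    "prompt_eval_duration": 500_000_000,   # 0.5 s spent on prefill
}
result = rates_from_ollama_response(sample)
print(result)
```

The helper is kept as a pure function over the response fields so it can be applied to saved responses without a running server.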

qwen3.6 on hardware