qwen3.6
4 workload results across 1 hardware configuration.
Fastest local config: 44.2 decode tok/s on RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB via ollama (Q4_K_M).
Local runs (4 runs)
These runs come from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.
RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_h_659oy695r |
| chat-long | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_h_659oy695r |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_h_659oy695r |
| agent-trace | ollama@0.31.1 | Q4_K_M | 44.18 tok/s | 1,910.5 tok/s | 1,808 ms | r_h_659oy695r |
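
For readers who want to compare their own ollama runs against the decode and prefill columns above, here is a minimal sketch of how such rates can be derived from the timing fields Ollama returns in a non-streaming `/api/generate` response (`eval_count`, `eval_duration`, `prompt_eval_count`, `prompt_eval_duration`, with durations in nanoseconds). How this page's harness actually computes its numbers is not stated, so treat this as an assumption. The function name `ollama_throughput` and the sample values are hypothetical and are not taken from run r_h_659oy695r.

```python
NS_PER_SECOND = 1e9  # Ollama reports durations in nanoseconds


def ollama_throughput(resp: dict) -> dict:
    """Derive tokens-per-second rates from an Ollama response dict.

    Assumes the response carries Ollama's timing fields; this is an
    illustration, not necessarily the leaderboard's own method.
    """
    decode_seconds = resp["eval_duration"] / NS_PER_SECOND
    prefill_seconds = resp["prompt_eval_duration"] / NS_PER_SECOND
    return {
        # Generated tokens divided by time spent generating them
        "decode_tok_s": resp["eval_count"] / decode_seconds,
        # Prompt tokens divided by time spent evaluating the prompt
        "prefill_tok_s": resp["prompt_eval_count"] / prefill_seconds,
    }


# Made-up sample values, chosen only to make the arithmetic easy to follow
sample = {
    "prompt_eval_count": 2000,
    "prompt_eval_duration": 1_000_000_000,  # 1 s
    "eval_count": 100,
    "eval_duration": 2_000_000_000,  # 2 s
}
metrics = ollama_throughput(sample)
print(metrics)
```

TTFT is left out of the sketch because it is usually measured client-side, as the wall-clock delay until the first streamed token arrives, rather than read from these fields.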