qwen3.6

Name: qwen3.6 — community LLM benchmarks
Creator: llm-speed
License: https://creativecommons.org/licenses/by/4.0/
Keywords: qwen3.6, LLM benchmark, tokens per second, decode tok/s, prefill, TTFT

4 workload results across 1 hardware configuration; only the agent-trace workload reports measurements.

Fastest local config

44.2 decode tok/s

on RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB via ollama (Q4_K_M), run r_h_659oy695r

Local runs (4 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB

Workload	Backend	Quant	decode tok/s	prefill tok/s	TTFT (ms)	Run
chat-short	ollama@0.31.1	Q4_K_M	no data	no data	no data	r_h_659oy695r
chat-long	ollama@0.31.1	Q4_K_M	no data	no data	no data	r_h_659oy695r
concurrent-decode	ollama@0.31.1	Q4_K_M	no data	no data	no data	r_h_659oy695r
agent-trace	ollama@0.31.1	Q4_K_M	44.18	1,910.5	1,808	r_h_659oy695r
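
The "fastest local config" figure can be reproduced from the table above by skipping rows marked "no data" and taking the highest decode rate. The following is a minimal sketch, not the site's own method: the helper names parse_metric and fastest_decode are hypothetical, and the rows are transcribed by hand from the table.

```python
import csv
import io

# Rows transcribed from the table above (tab-separated).
# "no data" marks a workload with no recorded measurement.
TABLE = (
    "Workload\tBackend\tQuant\tdecode tok/s\tprefill tok/s\tTTFT\tRun\n"
    "chat-short\tollama@0.31.1\tQ4_K_M\tno data\tno data\tno data\tr_h_659oy695r\n"
    "chat-long\tollama@0.31.1\tQ4_K_M\tno data\tno data\tno data\tr_h_659oy695r\n"
    "concurrent-decode\tollama@0.31.1\tQ4_K_M\tno data\tno data\tno data\tr_h_659oy695r\n"
    "agent-trace\tollama@0.31.1\tQ4_K_M\t44.18tok/s\t1,910.5tok/s\t1,808ms\tr_h_659oy695r\n"
)


def parse_metric(cell):
    """Return the number in a cell such as '1,910.5tok/s', or None for 'no data'."""
    cell = cell.strip().replace(",", "")
    if cell == "no data":
        return None
    # Keep only digits and the decimal point, which drops units like 'tok/s' or 'ms'.
    number = "".join(ch for ch in cell if ch.isdigit() or ch == ".")
    return float(number) if number else None


def fastest_decode(table_text):
    """Return the row with the highest decode tok/s, ignoring rows without data."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    measured = []
    for row in reader:
        value = parse_metric(row["decode tok/s"])
        if value is not None:
            measured.append((value, row))
    if not measured:
        return None
    return max(measured, key=lambda pair: pair[0])[1]


best = fastest_decode(TABLE)
print(best["Workload"], round(parse_metric(best["decode tok/s"]), 1))
```

With a single measured row the result is trivially agent-trace, but the same filter-then-max approach applies once more workloads or hardware configurations report data.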

qwen3.6 on hardware

RTX 4090 (48GB) LLM benchmarks

RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB
