llm-speed

qwen2.5-coder

16 workload results across 3 hardware configurations.

Fastest local config

161.1 decode tok/s

on RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB, via ollama (Q4_K_M)

Local runs (16 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.

RTX 4090 (48GB) + AMD EPYC 7763 64-Core Processor (128c) + 1008GB

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | ollama@0.31.1 | Q4_K_M | 89.45 | 466.6 | 281 | r_73tnfdueq2h |
| chat-long | ollama@0.31.1 | Q4_K_M | 83.84 | 3,572.3 | 887 | r_73tnfdueq2h |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 88.63 | no data | no data | r_73tnfdueq2h |
| agent-trace | ollama@0.31.1 | Q4_K_M | 86.21 | 5,298.4 | 395 | r_73tnfdueq2h |
| chat-short | ollama@0.31.1 | Q4_K_M | 159.4 | 452.0 | 290 | r_mv8n8k9wu1e |
| chat-long | ollama@0.31.1 | Q4_K_M | 154.7 | 5,998.7 | 528 | r_mv8n8k9wu1e |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 161.1 | no data | no data | r_mv8n8k9wu1e |
| agent-trace | ollama@0.31.1 | Q4_K_M | 158.1 | 6,421.1 | 320 | r_mv8n8k9wu1e |

RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | ollama@0.31.1 | Q4_K_M | 69.21 | 402.0 | 326 | r_6ahy-dq0f_0 |
| chat-long | ollama@0.31.1 | Q4_K_M | 65.83 | 2,070.6 | 1,530 | r_6ahy-dq0f_0 |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 67.64 | no data | no data | r_6ahy-dq0f_0 |
| agent-trace | ollama@0.31.1 | Q4_K_M | 63.63 | 3,518.8 | 483 | r_6ahy-dq0f_0 |

RTX 3090 (24GB) + AMD EPYC 7663 56-Core Processor (56c) + 252GB

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | ollama@0.31.1 | Q4_K_M | 39.86 | 2.83 | 46,337 | r_pb0jpnbujji |
| chat-long | ollama@0.31.1 | Q4_K_M | 135.0 | 3,319.4 | 954 | r_pb0jpnbujji |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 139.2 | no data | no data | r_pb0jpnbujji |
| agent-trace | ollama@0.31.1 | Q4_K_M | 134.2 | 4,111.1 | 487 | r_pb0jpnbujji |
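
The "Fastest local config" headline appears to be the single highest decode tok/s across all local rows. The sketch below reproduces that figure from the decode column of the tables above. The `ROWS` layout, the shortened hardware labels, and the helper names `fastest_decode` and `best_per_hardware` are illustrative assumptions, not the site's actual schema or ranking code.

```python
# Decode throughput rows copied from the tables above.
# Tuple layout (hardware, run_id, workload, decode_tok_s) is a hypothetical
# structure chosen for this example; hardware labels are shortened.
ROWS = [
    ("RTX 4090 + EPYC 7763", "r_73tnfdueq2h", "chat-short", 89.45),
    ("RTX 4090 + EPYC 7763", "r_73tnfdueq2h", "chat-long", 83.84),
    ("RTX 4090 + EPYC 7763", "r_73tnfdueq2h", "concurrent-decode", 88.63),
    ("RTX 4090 + EPYC 7763", "r_73tnfdueq2h", "agent-trace", 86.21),
    ("RTX 4090 + EPYC 7763", "r_mv8n8k9wu1e", "chat-short", 159.4),
    ("RTX 4090 + EPYC 7763", "r_mv8n8k9wu1e", "chat-long", 154.7),
    ("RTX 4090 + EPYC 7763", "r_mv8n8k9wu1e", "concurrent-decode", 161.1),
    ("RTX 4090 + EPYC 7763", "r_mv8n8k9wu1e", "agent-trace", 158.1),
    ("RTX 3090 + EPYC 7702P", "r_6ahy-dq0f_0", "chat-short", 69.21),
    ("RTX 3090 + EPYC 7702P", "r_6ahy-dq0f_0", "chat-long", 65.83),
    ("RTX 3090 + EPYC 7702P", "r_6ahy-dq0f_0", "concurrent-decode", 67.64),
    ("RTX 3090 + EPYC 7702P", "r_6ahy-dq0f_0", "agent-trace", 63.63),
    ("RTX 3090 + EPYC 7663", "r_pb0jpnbujji", "chat-short", 39.86),
    ("RTX 3090 + EPYC 7663", "r_pb0jpnbujji", "chat-long", 135.0),
    ("RTX 3090 + EPYC 7663", "r_pb0jpnbujji", "concurrent-decode", 139.2),
    ("RTX 3090 + EPYC 7663", "r_pb0jpnbujji", "agent-trace", 134.2),
]


def fastest_decode(rows):
    """Return the row with the highest decode tok/s."""
    return max(rows, key=lambda row: row[3])


def best_per_hardware(rows):
    """Map each hardware label to its best (run_id, workload, decode_tok_s)."""
    best = {}
    for hardware, run_id, workload, decode in rows:
        if hardware not in best or decode > best[hardware][2]:
            best[hardware] = (run_id, workload, decode)
    return best


if __name__ == "__main__":
    print(fastest_decode(ROWS))
    for hardware, result in best_per_hardware(ROWS).items():
        print(hardware, result)
```

Under these assumptions the top row is the concurrent-decode workload of run r_mv8n8k9wu1e at 161.1 tok/s, which matches the headline figure.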

qwen2.5-coder on hardware