llm-speed

coder-v2

4 workload results across 1 hardware configuration.

Fastest local config

189.5 decode tok/s

on RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB via ollama (Q4_0)

Local runs (4 runs)

Runs submitted from contributors' own machines using MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB

| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | Q4_0 | 189.5 | 159.2 | 735 | r_o2-1w665rtq |
| chat-long | ollama@0.31.1 | Q4_0 | 77.56 | 3,907.5 | 838 | r_o2-1w665rtq |
| concurrent-decode | ollama@0.31.1 | Q4_0 | 155.6 | no data | no data | r_o2-1w665rtq |
| agent-trace | ollama@0.31.1 | Q4_0 | 101.2 | 6,933.1 | 352 | r_o2-1w665rtq |
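The page does not state how these metrics are measured. The sketch below assumes they are derived from the timing fields ollama returns in the final response of its `/api/generate` endpoint (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The helper names, the sample timings, and the latency formula (TTFT plus the remaining tokens at the steady-state decode rate) are illustrative assumptions, not the site's own method.

```python
from dataclasses import dataclass

NS_PER_S = 1_000_000_000  # ollama reports durations in nanoseconds


@dataclass
class OllamaTimings:
    """Subset of the timing fields in an ollama /api/generate final response."""
    prompt_eval_count: int     # prompt tokens processed during prefill
    prompt_eval_duration: int  # nanoseconds spent on prefill
    eval_count: int            # tokens generated during decode
    eval_duration: int         # nanoseconds spent on decode


def prefill_tok_s(t: OllamaTimings) -> float:
    """Prefill throughput: prompt tokens per second of prompt evaluation."""
    return t.prompt_eval_count / (t.prompt_eval_duration / NS_PER_S)


def decode_tok_s(t: OllamaTimings) -> float:
    """Decode throughput: generated tokens per second of generation."""
    return t.eval_count / (t.eval_duration / NS_PER_S)


def estimate_latency_s(ttft_ms: float, decode_rate: float, output_tokens: int) -> float:
    """Rough wall-clock estimate (hypothetical): TTFT covers the first token,
    the remaining tokens arrive at the steady-state decode rate."""
    remaining = max(output_tokens - 1, 0)
    return ttft_ms / 1000 + remaining / decode_rate


if __name__ == "__main__":
    # Hypothetical timings, chosen only to illustrate the arithmetic.
    t = OllamaTimings(prompt_eval_count=100, prompt_eval_duration=500_000_000,
                      eval_count=379, eval_duration=2_000_000_000)
    print(prefill_tok_s(t), decode_tok_s(t))
    # Using the chat-short row: 735 ms TTFT, 189.5 tok/s decode, 500-token reply.
    print(round(estimate_latency_s(735, 189.5, 500), 2))
```

Under these assumptions, a 500-token chat-short reply would take roughly 3.4 s end to end, which shows why decode rate dominates for long outputs while TTFT dominates for short ones.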

coder-v2 on hardware