coder-v2

Name: coder-v2 — community LLM benchmarks
Creator: llm-speed
License: https://creativecommons.org/licenses/by/4.0/
Keywords: coder-v2, LLM benchmark, tokens per second, decode tok/s, prefill, TTFT

4 workload results across 1 hardware configuration.

Fastest local config

189.5 decode tok/s

on RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB via ollama (Q4_0)

Local runs (4 runs)

Runs submitted from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB

Workload	Backend	Quant	decode tok/s	prefill tok/s	TTFT (ms)	Run
chat-short	ollama@0.31.1	Q4_0	189.5	159.2	735	r_o2-1w665rtq
chat-long	ollama@0.31.1	Q4_0	77.56	3,907.5	838	r_o2-1w665rtq
concurrent-decode	ollama@0.31.1	Q4_0	155.6	no data	no data	r_o2-1w665rtq
agent-trace	ollama@0.31.1	Q4_0	101.2	6,933.1	352	r_o2-1w665rtq
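
The "fastest local config" figure appears to be the highest decode rate among the four rows above. As a minimal sketch of how that could be checked, the Python below parses the rows and picks the maximum. The function names (`parse_number`, `parse_rows`) and dictionary keys are hypothetical, chosen for illustration; they are not part of any llm-speed tooling.

```python
from typing import Optional

# Values taken from the table above. The parser tolerates cells with or
# without a unit suffix, thousands separators, and "no data".
ROWS = """\
chat-short\tollama@0.31.1\tQ4_0\t189.5tok/s\t159.2tok/s\t735ms\tr_o2-1w665rtq
chat-long\tollama@0.31.1\tQ4_0\t77.56tok/s\t3,907.5tok/s\t838ms\tr_o2-1w665rtq
concurrent-decode\tollama@0.31.1\tQ4_0\t155.6tok/s\tno data\tno data\tr_o2-1w665rtq
agent-trace\tollama@0.31.1\tQ4_0\t101.2tok/s\t6,933.1tok/s\t352ms\tr_o2-1w665rtq
"""


def parse_number(cell: str, unit: str) -> Optional[float]:
    """Convert a cell such as '3,907.5tok/s' to 3907.5; 'no data' becomes None."""
    cell = cell.strip()
    if cell == "no data":
        return None
    if cell.endswith(unit):
        cell = cell[: -len(unit)]
    return float(cell.replace(",", "").strip())


def parse_rows(text: str) -> list:
    """Split tab-separated rows into dictionaries (hypothetical key names)."""
    runs = []
    for line in text.strip().splitlines():
        workload, backend, quant, decode, prefill, ttft, run_id = line.split("\t")
        runs.append({
            "workload": workload,
            "backend": backend,
            "quant": quant,
            "decode_tok_s": parse_number(decode, "tok/s"),
            "prefill_tok_s": parse_number(prefill, "tok/s"),
            "ttft_ms": parse_number(ttft, "ms"),
            "run": run_id,
        })
    return runs


runs = parse_rows(ROWS)
# Every row here has a decode value, so max() needs no None handling.
fastest = max(runs, key=lambda r: r["decode_tok_s"])
print(fastest["workload"], fastest["decode_tok_s"])
```

Missing values are kept as `None` rather than 0 so that a row without prefill or TTFT data is not mistaken for a very slow one.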

coder-v2 on hardware

RTX 3090 (24GB) LLM benchmarks

RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB
