coder-v2
4 workload results across 1 hardware configuration.
Fastest local config
189.5 decode tok/s
on RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB via ollama (Q4_0) — see full run
Local runs (4 runs)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.
RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | Q4_0 | 189.5tok/s | 159.2tok/s | 735ms | r_o2-1w665rtq |
| chat-long | ollama@0.31.1 | Q4_0 | 77.56tok/s | 3,907.5tok/s | 838ms | r_o2-1w665rtq |
| concurrent-decode | ollama@0.31.1 | Q4_0 | 155.6tok/s | no data | no data | r_o2-1w665rtq |
| agent-trace | ollama@0.31.1 | Q4_0 | 101.2tok/s | 6,933.1tok/s | 352ms | r_o2-1w665rtq |