r1
16 workload results across 1 hardware configuration.
Fastest local config: 133.8 decode tok/s on RTX 4090 (24GB) + AMD EPYC 75F3 32-Core Processor (64c) + 504GB via ollama (Q4_K_M), agent-trace workload in run r_fg77v2hhohb.
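A minimal sketch of how a "fastest local config" headline could be picked from the results table in this section. It assumes the rule is simply the highest decode tok/s across all workloads and runs; the page does not state its actual selection rule. Rows reported as "no data" are left out.

```python
# Rows with decode data, copied from the results table in this section:
# (workload, run_id, decode_tok_s). Rows marked "no data" are omitted.
rows = [
    ("chat-long", "r_zrjj-pj93q8", 81.09),
    ("chat-long", "r_fg77v2hhohb", 131.9),
    ("agent-trace", "r_fg77v2hhohb", 133.8),
    ("chat-long", "r_l2ck67ls40i", 3.76),
    ("agent-trace", "r_l2ck67ls40i", 3.79),
    ("chat-long", "r_2r1p3w9ps7c", 131.0),
    ("agent-trace", "r_2r1p3w9ps7c", 132.4),
]

# Assumed rule: the headline is the single highest decode rate over all rows.
best = max(rows, key=lambda row: row[2])
print(f"{best[2]} decode tok/s ({best[0]}, {best[1]})")
```

Under this assumption the agent-trace row of r_fg77v2hhohb wins, which matches the headline figure.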
Local runs (16 workload results from 4 runs)
Runs from contributors' own machines, using MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.
RTX 4090 (24GB) + AMD EPYC 75F3 32-Core Processor (64c) + 504GB
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_zrjj-pj93q8 |
| chat-long | ollama@0.31.1 | Q4_K_M | 81.09 | 240.4 | 13,070 | r_zrjj-pj93q8 |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_zrjj-pj93q8 |
| agent-trace | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_zrjj-pj93q8 |
| chat-short | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_fg77v2hhohb |
| chat-long | ollama@0.31.1 | Q4_K_M | 131.9 | 686.3 | 4,578 | r_fg77v2hhohb |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_fg77v2hhohb |
| agent-trace | ollama@0.31.1 | Q4_K_M | 133.8 | 8,852.4 | 366 | r_fg77v2hhohb |
| chat-short | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_l2ck67ls40i |
| chat-long | ollama@0.31.1 | Q4_K_M | 3.76 | 14.28 | 219,986 | r_l2ck67ls40i |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_l2ck67ls40i |
| agent-trace | ollama@0.31.1 | Q4_K_M | 3.79 | 3,515.1 | 971 | r_l2ck67ls40i |
| chat-short | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_2r1p3w9ps7c |
| chat-long | ollama@0.31.1 | Q4_K_M | 131.0 | 599.9 | 5,238 | r_2r1p3w9ps7c |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | no data | no data | no data | r_2r1p3w9ps7c |
| agent-trace | ollama@0.31.1 | Q4_K_M | 132.4 | 9,612.8 | 337 | r_2r1p3w9ps7c |
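One quick consistency check on the table: if time to first token is spent almost entirely on prompt processing, then prefill rate × TTFT approximates the prompt length in tokens. That prefill-bound assumption is ours, not something the page states, and the helper name `implied_prompt_tokens` is hypothetical. The sketch below applies it to every row that reports both prefill and TTFT.

```python
# (workload, run_id, prefill_tok_s, ttft_ms) for rows reporting both values,
# copied from the results table above.
rows = [
    ("chat-long", "r_zrjj-pj93q8", 240.4, 13_070),
    ("chat-long", "r_fg77v2hhohb", 686.3, 4_578),
    ("chat-long", "r_l2ck67ls40i", 14.28, 219_986),
    ("chat-long", "r_2r1p3w9ps7c", 599.9, 5_238),
    ("agent-trace", "r_fg77v2hhohb", 8_852.4, 366),
    ("agent-trace", "r_l2ck67ls40i", 3_515.1, 971),
    ("agent-trace", "r_2r1p3w9ps7c", 9_612.8, 337),
]


def implied_prompt_tokens(prefill_tok_s: float, ttft_ms: float) -> float:
    """Hypothetical helper. Assumes TTFT is prefill-bound: tokens = rate * seconds."""
    return prefill_tok_s * ttft_ms / 1000.0


# Map (workload, run_id) -> implied prompt length in tokens.
implied = {(w, r): implied_prompt_tokens(p, t) for w, r, p, t in rows}
for (w, r), tokens in implied.items():
    print(f"{w:12s} {r:15s} ~{tokens:,.0f} prompt tokens")
```

All four chat-long rows land within about one token of 3,142, including the very slow r_l2ck67ls40i run, so the prefill and TTFT columns are mutually consistent for that workload. For agent-trace, two runs imply about 3,240 tokens, while r_l2ck67ls40i implies about 3,413; the table alone does not explain the gap, so treat its TTFT with some caution.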