llm-speed

coder-v2 on RTX 3090 (24GB) + AMD EPYC 7702P 64-Core Processor (64c) + 252GB RAM

Workload results

| Workload | Backend | Model | decode tok/s | prefill tok/s | TTFT | p50 | p95 |
|---|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | deepseek-coder-v2 (Q4_0) | 189.5 | 159.2 | 735 ms | 5.2 ms | 5.6 ms |
| chat-long | ollama@0.31.1 | deepseek-coder-v2 (Q4_0) | 77.56 | 3,907.5 | 838 ms | 10.7 ms | 11.5 ms |
| concurrent-decode | ollama@0.31.1 | deepseek-coder-v2 (Q4_0) | 155.6 | | | 6.3 ms | 6.7 ms |
| agent-trace | ollama@0.31.1 | deepseek-coder-v2 (Q4_0) | 101.2 | 6,933.1 | 352 ms | 11.3 ms | 12.2 ms |
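The page does not define these columns. The following is a minimal sketch that assumes TTFT is the delay from sending the request to receiving the first token, decode tok/s is the number of tokens after the first divided by the time they took, prefill tok/s is roughly prompt tokens divided by TTFT, and p50/p95 are percentiles of the gap between consecutive streamed tokens. `stream_metrics` and `percentile` are hypothetical helpers, not part of llm-speed, and llm-speed's actual methodology may differ.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamMetrics:
    ttft_ms: float                  # request start -> first token
    decode_tok_s: float             # tokens after the first / time they took
    prefill_tok_s: Optional[float]  # prompt tokens / TTFT (rough approximation)
    p50_ms: float                   # median gap between consecutive tokens
    p95_ms: float                   # 95th-percentile gap between consecutive tokens


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100], of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def stream_metrics(request_start: float, token_times: Sequence[float],
                   prompt_tokens: Optional[int] = None) -> StreamMetrics:
    """All times in seconds from one monotonic clock, e.g. time.perf_counter()."""
    if len(token_times) < 2:
        raise ValueError("need at least two generated tokens")
    ttft = token_times[0] - request_start
    # Inter-token gaps: one per token generated after the first.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    decode_window = token_times[-1] - token_times[0]
    return StreamMetrics(
        ttft_ms=ttft * 1000,
        decode_tok_s=len(gaps) / decode_window,
        prefill_tok_s=prompt_tokens / ttft if prompt_tokens else None,
        p50_ms=percentile(gaps, 50) * 1000,
        p95_ms=percentile(gaps, 95) * 1000,
    )
```

For example, `stream_metrics(0.0, [0.5, 0.51, 0.52, 0.53, 0.54], prompt_tokens=100)` reports a 500 ms TTFT, about 100 tok/s decode, 200 tok/s prefill, and p50/p95 gaps of about 10 ms. Nearest-rank percentiles are used here because they always return an observed gap. Interpolating definitions give slightly different p95 values on short runs.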

Reproduce on your machine

Same workload, same model, signed on your own rig. The exact command that produced this run:

$ pipx install llm-speed && llm-speed bench --model 'deepseek-coder-v2' --workload 'chat-short'

Runs in about a minute. Your result is posted to the leaderboard, signed and linkable.
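The command above benchmarks only the `chat-short` workload. A small Python wrapper can loop over every workload in the results table instead. This sketch assumes `llm-speed bench` accepts each table workload name for `--workload` exactly as written. It uses only the `--model` and `--workload` flags shown above. `bench_command` and `run_all` are hypothetical helpers. By default the wrapper does a dry run that only prints the commands.

```python
import shlex
import subprocess

MODEL = "deepseek-coder-v2"
# Workload names copied from the results table; CLI support for each is assumed.
WORKLOADS = ["chat-short", "chat-long", "concurrent-decode", "agent-trace"]


def bench_command(model: str, workload: str) -> list[str]:
    """Build the argv for one benchmark run, mirroring the command above."""
    return ["llm-speed", "bench", "--model", model, "--workload", workload]


def run_all(model: str = MODEL, workloads: list[str] = WORKLOADS,
            dry_run: bool = True) -> list[list[str]]:
    """Print (and, unless dry_run, execute) one bench command per workload."""
    commands = [bench_command(model, w) for w in workloads]
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop if a run exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` to actually execute the runs one after another. Running them sequentially keeps each workload from competing with the others for the GPU.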

Embed this run

Drop the badge into a README, blog post, or signature. Each embed links back to the signed result.

[![llm-speed: 189 tok/s on RTX 3090 (24GB) (coder-v2)](https://llm-speed.com/badge/r_o2-1w665rtq.svg)](https://llm-speed.com/r/r_o2-1w665rtq)
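The badge snippet appears to follow a simple pattern keyed by the run ID: `/badge/<run_id>.svg` for the image and `/r/<run_id>` for the result page. That pattern is inferred from this single example, so treat it as an assumption. The hypothetical helper below rebuilds the snippet from its parts. The page shows 189 for a 189.5 tok/s decode rate, so this sketch truncates to an integer. Whether the site truncates or rounds in general is a guess.

```python
BASE_URL = "https://llm-speed.com"


def badge_markdown(run_id: str, tok_s: float, hardware: str, model_label: str) -> str:
    """Hypothetical: rebuild the Markdown badge snippet for a run ID."""
    # int() truncates toward zero, matching 189.5 -> 189 on this page.
    alt = f"llm-speed: {int(tok_s)} tok/s on {hardware} ({model_label})"
    image_url = f"{BASE_URL}/badge/{run_id}.svg"
    result_url = f"{BASE_URL}/r/{run_id}"
    # Image wrapped in a link, so every render points at the result page.
    return f"[![{alt}]({image_url})]({result_url})"


if __name__ == "__main__":
    print(badge_markdown("r_o2-1w665rtq", 189.5, "RTX 3090 (24GB)", "coder-v2"))
```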


Provenance

Run ID: r_o2-1w665rtq
Fingerprint hash:
Public key: rJBt+3KzrzeTAA+1BzYrp0h4ONLfSDQFr+RTyzH9Zk0=
Received: 2026-07-02 05:14:55
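The public key is a 44-character Base64 string that decodes to 32 bytes. That is the size of a raw Ed25519 public key. The page does not name the signature scheme or show the signature or the signed payload, so Ed25519 is an assumption here. The sketch decodes the key with the standard library. If you obtain the signature and the exact signed bytes, `verify_ed25519` checks them using the third-party `cryptography` package. All three helper names are hypothetical.

```python
import base64

# Public key copied from the Provenance section.
PUBLIC_KEY_B64 = "rJBt+3KzrzeTAA+1BzYrp0h4ONLfSDQFr+RTyzH9Zk0="


def decode_public_key(b64: str) -> bytes:
    """Strictly decode a Base64 key; raises binascii.Error (a ValueError) on bad input."""
    return base64.b64decode(b64, validate=True)


def looks_like_raw_ed25519_key(raw: bytes) -> bool:
    # Raw Ed25519 public keys are exactly 32 bytes; length alone cannot prove the scheme.
    return len(raw) == 32


def verify_ed25519(raw_key: bytes, signature: bytes, message: bytes) -> bool:
    """Assumes Ed25519. Requires `pip install cryptography` and the exact signed bytes."""
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    key = Ed25519PublicKey.from_public_bytes(raw_key)
    try:
        key.verify(signature, message)  # raises InvalidSignature on mismatch
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    raw = decode_public_key(PUBLIC_KEY_B64)
    print(len(raw), looks_like_raw_ed25519_key(raw))
```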