Coder-V2-Lite-Instruct
5 workload results across 1 hardware configuration.
Fastest local config
309.5 decode tok/s
on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp — see full run
Local runs (5 runs)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.
RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | — | 293.1tok/s | no data | 297ms | r_bfpto9so2o1 |
| chat-long | llama.cpp | — | 181.2tok/s | no data | 361ms | r_bfpto9so2o1 |
| concurrent-decode | llama.cpp | — | 268.4tok/s | no data | no data | r_bfpto9so2o1 |
| agent-trace | llama.cpp | — | 204.5tok/s | 24,978.7tok/s | 80.7ms | r_bfpto9so2o1 |
| chat-short | llama.cpp | — | 309.5tok/s | no data | 268ms | r_0_gs1rgl2fl |