llm-speed

Coder-V2-Lite-Instruct

5 workload results across 1 hardware configuration.

Fastest local config

309.5 decode tok/s

on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp

Local runs (5 runs)

Runs submitted from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB

| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | llama.cpp | | 293.1 | no data | 297 ms | r_bfpto9so2o1 |
| chat-long | llama.cpp | | 181.2 | no data | 361 ms | r_bfpto9so2o1 |
| concurrent-decode | llama.cpp | | 268.4 | no data | no data | r_bfpto9so2o1 |
| agent-trace | llama.cpp | | 204.5 | 24,978.7 | 80.7 ms | r_bfpto9so2o1 |
| chat-short | llama.cpp | | 309.5 | no data | 268 ms | r_0_gs1rgl2fl |
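
The "Fastest local config" figure matches the highest decode rate among the five runs listed here. A minimal sketch of that selection follows, assuming the headline is simply the maximum over all runs (the page does not state how it is computed). The `runs` list is transcribed by hand from the rows above, and `fastest_run` is a hypothetical helper, not part of any llm-speed tooling.

```python
# Rows transcribed from the results table: (workload, decode tok/s, run id).
runs = [
    ("chat-short", 293.1, "r_bfpto9so2o1"),
    ("chat-long", 181.2, "r_bfpto9so2o1"),
    ("concurrent-decode", 268.4, "r_bfpto9so2o1"),
    ("agent-trace", 204.5, "r_bfpto9so2o1"),
    ("chat-short", 309.5, "r_0_gs1rgl2fl"),
]


def fastest_run(rows):
    """Return the row with the highest decode rate (hypothetical helper)."""
    # Assumption: "fastest" means the maximum decode tok/s across all workloads.
    return max(rows, key=lambda row: row[1])


best = fastest_run(runs)
print(f"{best[1]} decode tok/s ({best[0]}, run {best[2]})")
```

Under this assumption the selected row is the second chat-short run, which is the only one with a different run ID from the other four.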

Coder-V2-Lite-Instruct on hardware