
Coder-V2-Lite-Instruct

Name: Coder-V2-Lite-Instruct — community LLM benchmarks
Creator: llm-speed
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Coder-V2-Lite-Instruct, LLM benchmark, tokens per second, decode tok/s, prefill, TTFT

5 workload results across 1 hardware configuration.

Fastest local config

309.5 decode tok/s

on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp (chat-short workload, run r_0_gs1rgl2fl).
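The headline figure matches the single highest decode throughput among the local results listed below. The page does not state its selection rule, so the sketch assumes "fastest" means maximum decode tok/s across all rows. The row tuples are copied from the results table, and `RUNS` and `fastest` are hypothetical names used only for illustration:

```python
# Rows copied from the local-runs table on this page:
# (workload, backend, decode tok/s, run id)
RUNS = [
    ("chat-short", "llama.cpp", 293.1, "r_bfpto9so2o1"),
    ("chat-long", "llama.cpp", 181.2, "r_bfpto9so2o1"),
    ("concurrent-decode", "llama.cpp", 268.4, "r_bfpto9so2o1"),
    ("agent-trace", "llama.cpp", 204.5, "r_bfpto9so2o1"),
    ("chat-short", "llama.cpp", 309.5, "r_0_gs1rgl2fl"),
]


def fastest(runs):
    """Return the row with the highest decode throughput (assumed ranking rule)."""
    return max(runs, key=lambda row: row[2])


workload, backend, decode, run_id = fastest(RUNS)
print(f"{decode} decode tok/s ({workload}, {backend}, run {run_id})")
```

Under that assumption the script prints `309.5 decode tok/s (chat-short, llama.cpp, run r_0_gs1rgl2fl)`. A real leaderboard might instead rank per workload or per run, which could yield a different headline.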

Local runs (5 results)

Runs come from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Results are signed on the submitter's hardware.

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB

Workload	Backend	Quant	Decode (tok/s)	Prefill (tok/s)	TTFT (ms)	Run
chat-short	llama.cpp	no data	293.1	no data	297	r_bfpto9so2o1
chat-long	llama.cpp	no data	181.2	no data	361	r_bfpto9so2o1
concurrent-decode	llama.cpp	no data	268.4	no data	no data	r_bfpto9so2o1
agent-trace	llama.cpp	no data	204.5	24,978.7	80.7	r_bfpto9so2o1
chat-short	llama.cpp	no data	309.5	no data	268	r_0_gs1rgl2fl
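The page does not document how its harness derives the decode, prefill, and TTFT columns. The sketch below shows one common set of definitions, computed from a single request's per-token arrival timestamps. The `Timing` record and the three helper functions are hypothetical, and the assumption that prefill spans the interval from request start to first token is ours, not the leaderboard's:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Timing:
    request_start: float      # seconds: when the request was sent (hypothetical field)
    token_times: List[float]  # seconds: arrival time of each generated token
    prompt_tokens: int        # number of prompt tokens processed before generation


def ttft_ms(t: Timing) -> float:
    """Time to first token, in milliseconds."""
    return (t.token_times[0] - t.request_start) * 1000.0


def prefill_tok_s(t: Timing) -> float:
    """Prompt throughput, assuming prefill occupies the whole TTFT interval."""
    return t.prompt_tokens / (t.token_times[0] - t.request_start)


def decode_tok_s(t: Timing) -> float:
    """Generation throughput: tokens after the first, over first-to-last token time."""
    generated_after_first = len(t.token_times) - 1
    span = t.token_times[-1] - t.token_times[0]
    return generated_after_first / span


# Synthetic example, not taken from any run on this page.
t = Timing(request_start=0.0, token_times=[0.25, 0.5, 0.75, 1.0], prompt_tokens=1000)
print(ttft_ms(t), prefill_tok_s(t), decode_tok_s(t))  # 250.0 4000.0 4.0
```

Excluding the first token from the decode rate keeps prompt processing out of the generation figure. A workload such as concurrent-decode is more likely reported as aggregate throughput across parallel requests, which this single-request sketch does not cover.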

Coder-V2-Lite-Instruct on hardware

RTX 5090 (32GB) LLM benchmarks

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB