Coder-V2-Lite-Instruct-4bit (mlx-community-deepseek-coder-v2-lite-instruct-4bit)

Name: Coder-V2-Lite-Instruct-4bit: community LLM benchmarks
Creator: llm-speed
License: https://creativecommons.org/licenses/by/4.0/

1 workload result across 1 hardware configuration.

Fastest local config

168.3 decode tok/s

on M3 Ultra (60-core GPU) + 96GB unified memory via MLX (mlx@0.31.3, run r_l_v1-zq_qaz).

Local runs (1 run)

Runs submitted from contributors' own machines using MLX, llama.cpp, vLLM, exllamav2, or ollama. Each result is signed on the submitter's hardware.

M3 Ultra (60-core GPU) + 96GB unified

Workload	Backend	Quant	Decode (tok/s)	Prefill (tok/s)	TTFT (ms)	Run
chat-short	mlx@0.31.3	-	168.3	291.5	449	r_l_v1-zq_qaz
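As a rough illustration of how the three metrics combine, the sketch below estimates end-to-end response time as TTFT plus the remaining output tokens at the decode rate, using the chat-short numbers from the table. The `RunMetrics` class and its methods are hypothetical helpers, not part of any benchmark tool, and the model assumes a constant decode rate with no sampling or detokenization overhead.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical container for one benchmark row (not a real benchmark API)."""
    decode_tok_s: float   # steady-state generation rate, tokens per second
    prefill_tok_s: float  # prompt-processing rate, tokens per second
    ttft_ms: float        # time to first token, milliseconds

    def estimated_latency_s(self, output_tokens: int) -> float:
        """Rough end-to-end time: TTFT, then the remaining tokens at decode speed.

        Assumes a constant decode rate and ignores any other per-token overhead.
        """
        if output_tokens < 1:
            raise ValueError("output_tokens must be >= 1")
        # The first token arrives at TTFT; the other (n - 1) stream at the decode rate.
        return self.ttft_ms / 1000.0 + (output_tokens - 1) / self.decode_tok_s

    def implied_prompt_tokens(self) -> float:
        """Prompt length implied if all of TTFT were spent on prefill.

        Any non-prefill overhead inside TTFT makes the true value smaller,
        so under this assumption it is an upper bound.
        """
        return self.prefill_tok_s * self.ttft_ms / 1000.0


# Values copied from the chat-short row above.
chat_short = RunMetrics(decode_tok_s=168.3, prefill_tok_s=291.5, ttft_ms=449)
print(f"{chat_short.estimated_latency_s(256):.2f} s for 256 output tokens")
print(f"~{chat_short.implied_prompt_tokens():.0f} prompt tokens (upper bound)")
```

For short chat prompts like this workload, the decode rate dominates total time once the response grows past a few dozen tokens, which is why decode tok/s is the headline figure.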
