llm-speed

Best hardware for a local coding agent

Pick a rig that runs Qwen3-Coder-Next, Qwen2.5-Coder-32B, gpt-oss, and DeepSeek as a daily-driver coding agent without leaving you waiting on it.

No data submitted for this task yet.

Run the suite to be the first benchmark for this guide:

$ pipx install llm-speed && llm-speed bench

A local coding agent stresses three things at once: enough VRAM (or unified memory) to hold a 14B-32B coder model at a reasonable quantization, prefill speed for long tool-use prompts, and decode speed for the model's replies. Below are the configurations we have benchmark data for, ranked by decode tok/s on the workloads tagged 'coder'. We list every result, not just the headline number, so you can make the call yourself.
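To get a feel for the VRAM requirement, the weight footprint of a quantized model is straightforward arithmetic. This is a rough sketch, not llm-speed output; the bits-per-weight figure is an assumption (roughly what a Q4_K_M-style quant averages), and KV cache plus runtime overhead come on top of it.

```python
def weights_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# A 32B coder model at a ~4.5 bits/weight quantization (assumed value):
print(f"{weights_gib(32, 4.5):.1f} GiB")   # weights only, before KV cache
```

By this estimate a 32B model at 4-bit-class quantization wants roughly 17 GiB for weights alone, which is why 24 GiB cards are the usual floor for that size class.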

Side-by-side comparisons

See also: All hardware · All models · Methodology