The best GPU for a local coding agent in 2026
Published 2026-07-02
Running a coding agent (Cursor, Continue, Aider) against a local model comes down to one thing: decode speed, the rate it streams code back. We measured the popular local coders across every major GPU. Here is what to buy, grounded in signed runs.
By goal
Best value: RTX 3090 (24 GB, ~$700 used). Qwen2.5-Coder-7B decodes 139 tok/s, and a small-MoE coder like DeepSeek-Coder-V2-16B flies at 189 tok/s. It runs 7 to 14B coders comfortably and beats most hosted APIs. See the 3090 deep-dive.
Faster: RTX 4090. The same 7B coder hits 161 tok/s, about 15 to 30% quicker than the 3090 for roughly twice the price (full comparison).
Fastest: RTX 5090 (32 GB). Tops our charts and adds VRAM for larger coders and longer context; see /hw/rtx-5090.
Portable: Apple Silicon. M-series Macs run coders too, and the Max/Ultra tiers are quick. Compare any Mac against a GPU on the head-to-head pages.
The model matters as much as the GPU
A small mixture-of-experts coder gives big-model quality at small-model speed: DeepSeek-Coder-V2-16B decodes 189 tok/s on a 3090, nearly 3x the dense Qwen2.5-Coder-14B at 69 tok/s. Picking the right model is often a bigger win than a pricier card.
How much speed do you need?
Around 30 tok/s and up feels responsive for interactive coding, and all of these clear that easily for 7 to 14B models. Check your exact model + GPU with the VRAM-fit checker and the speed predictor, or browse the full cheatsheet.
Reproduce it
Every number is one command from reproduction:
$ pipx install llm-speed && llm-speed benchNumbers as of July 2026; the linked runs and the cheatsheet always reflect current data.