LLM hardware buying guide
Pick a guide that matches what you actually want to do. Every recommendation is anchored to a real submitted run on the llm-speed leaderboard — no affiliate fluff, just numbers you can click through.
Best hardware for a local coding agent
Pick a rig that runs Qwen3-Coder-Next, Qwen2.5-Coder-32B, gpt-oss, and DeepSeek as a daily-driver coding agent without leaving you waiting on every completion.
Best rig for Qwen3-Coder-Next
Qwen3-Coder-Next is an 80B-parameter MoE with only ~3B parameters active per token. Because decode speed is governed by the active set rather than the full weights, it punches above its weight on Apple Silicon and mid-range consumer GPUs. Here's the data we have.
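Why a small active set matters: decode is roughly memory-bandwidth-bound, so a back-of-envelope ceiling is bandwidth divided by the bytes of active weights streamed per token. The numbers below are illustrative assumptions (not leaderboard data); real throughput lands well below the ceiling once kernel overhead, KV-cache reads, and expert routing are counted.

```python
# Back-of-envelope decode ceiling for a sparse MoE: each generated token
# must stream roughly the *active* parameters from memory, so the ceiling
# is memory bandwidth divided by active-weight bytes.

def decode_ceiling_toks(bandwidth_gb_s: float, active_params_b: float,
                        bytes_per_param: float) -> float:
    """Upper bound on decode tok/s for a bandwidth-bound decoder."""
    active_gb = active_params_b * bytes_per_param  # GB streamed per token
    return bandwidth_gb_s / active_gb

# Illustrative: ~3B active params at 4-bit (0.5 bytes/param)
# on 819 GB/s of unified-memory bandwidth.
ceiling = decode_ceiling_toks(819, 3, 0.5)
print(f"theoretical ceiling: {ceiling:.0f} tok/s")  # 819 / 1.5 = 546
```

The same formula run with all 80B parameters instead of the 3B active set gives a ceiling ~27x lower, which is the whole argument for MoE on memory-constrained hardware.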
Local vs hosted: when does buying a GPU pay off?
At low usage, hosted APIs win on $/Mtok. At high sustained usage, a 4090 or M3 Ultra wins. Here's the break-even math, run against live numbers.
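The break-even logic can be sketched in a few lines: amortize the hardware over its useful life, subtract the local electricity cost per Mtok from the hosted rate, and divide. Every figure below (hardware price, hosted rate, power draw, tok/s) is a placeholder assumption; plug in your own prices and measured throughput.

```python
# Break-even sketch: hosted API at a flat $/Mtok vs a local GPU whose
# purchase price is amortized monthly, plus electricity.

def break_even_mtok_per_month(hw_cost: float, amortize_months: float,
                              hosted_per_mtok: float, power_kw: float,
                              kwh_price: float, toks_per_s: float) -> float:
    hw_monthly = hw_cost / amortize_months
    # Electricity cost to generate one million tokens locally.
    mtok_per_hour = toks_per_s * 3600 / 1e6
    energy_per_mtok = power_kw * kwh_price / mtok_per_hour
    return hw_monthly / (hosted_per_mtok - energy_per_mtok)

# Hypothetical: $1,600 GPU amortized over 24 months, hosted at $0.60/Mtok,
# 350 W draw, $0.15/kWh, 60 tok/s sustained decode.
be = break_even_mtok_per_month(1600, 24, 0.60, 0.35, 0.15, 60)
print(f"break-even: {be:.0f} Mtok/month")  # ~187 Mtok/month
```

If the local energy cost per Mtok ever exceeds the hosted rate, the denominator goes non-positive and local never breaks even on cost alone, which is exactly the low-usage regime where hosted wins.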
Best Mac for running local LLMs
M-series chips trade off memory bandwidth, GPU core count, and unified-memory ceiling. Here's the data, ranked by decode tok/s on Qwen-class models.
Best GPU for local LLMs under $2,000
RTX 4090, RTX 5080, used 3090, RX 7900 XTX, Arc B580. Here's where each lands on real workloads.
Cheapest rig that runs a 70B model comfortably
Llama-3.3-70B and Qwen2.5-72B at 4-bit need ~40 GB of memory. Here's the minimum-spec hardware that holds the model and still serves usable decode tok/s.
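Where the ~40 GB figure comes from: 4-bit weights plus a KV cache plus runtime slack. The layer and head counts below are Llama-3.3-70B-style assumptions (80 layers, 8 KV heads under GQA, head dim 128); check your model's actual config before buying hardware.

```python
# Rough memory budget for a 70B-class model at 4-bit quantization.

def model_memory_gb(params_b: float, bits: int, n_layers: int,
                    n_kv_heads: int, head_dim: int, ctx_len: int,
                    kv_bytes: int = 2, overhead: float = 1.10) -> float:
    weights_gb = params_b * bits / 8  # billions of params -> GB of weights
    # K and V caches: 2 tensors per layer, fp16 (2 bytes) by default.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * ctx_len / 1e9
    return (weights_gb + kv_gb) * overhead  # +10% runtime slack

# 70B at 4-bit, 80 layers, 8 KV heads (GQA), head_dim 128, 8k context:
# 35 GB of weights + ~2.7 GB of KV cache, ~41 GB with overhead.
print(f"{model_memory_gb(70, 4, 80, 8, 128, 8192):.1f} GB")
```

Longer contexts grow only the KV-cache term, which is why GQA's small KV-head count matters so much at 32k+ context: with 64 full attention heads instead of 8 KV heads, the same 8k cache would cost ~21 GB instead of ~2.7 GB.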