
Local vs hosted: when does buying a GPU pay off?

At low usage, hosted APIs win on $/Mtok. At high sustained usage, a 4090 or M3 Ultra wins. Here's the break-even math, run against live numbers.

Recommendation

Reference local rig on the leaderboard: M3 Pro (18-core GPU) at 286.5 tok/s.

We don't sell hardware and we don't take affiliate commissions on hosted APIs, so the framing is just arithmetic. A consumer GPU's break-even point against a hosted endpoint depends on three things: your sustained decode speed (tok/s), the hosted price per million output tokens, and your duty cycle. Below is a comparison table with each row anchored to a real submitted benchmark.
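
To make the arithmetic concrete, here is a minimal sketch of the break-even calculation in Python. The hardware price, power draw, electricity rate, and hosted $/Mtok used in the example are illustrative placeholders, not leaderboard numbers; only the 286.5 tok/s figure comes from the reference rig above.

```python
# Break-even sketch: local GPU vs hosted API.
# All dollar figures below are illustrative assumptions.

SECONDS_PER_DAY = 86_400

def breakeven_days(
    decode_tok_s: float,         # sustained local decode speed (tok/s)
    hosted_usd_per_mtok: float,  # hosted price per million output tokens
    duty_cycle: float,           # fraction of the day the rig is decoding (0..1)
    hardware_usd: float,         # upfront hardware cost
    watts: float = 0.0,          # average power draw while decoding
    usd_per_kwh: float = 0.0,    # electricity price
) -> float:
    """Days until the hardware purchase pays for itself vs the hosted API."""
    tokens_per_day = decode_tok_s * SECONDS_PER_DAY * duty_cycle
    hosted_cost_per_day = tokens_per_day * hosted_usd_per_mtok / 1e6
    power_cost_per_day = watts / 1000 * 24 * duty_cycle * usd_per_kwh
    daily_savings = hosted_cost_per_day - power_cost_per_day
    if daily_savings <= 0:
        return float("inf")  # hosted stays cheaper at this duty cycle
    return hardware_usd / daily_savings

# Reference rig at 286.5 tok/s, 10% duty cycle, against a hypothetical
# $0.60/Mtok hosted endpoint, $2,000 hardware, 40 W draw, $0.15/kWh.
print(f"{breakeven_days(286.5, 0.60, 0.10, 2000, watts=40, usd_per_kwh=0.15):.0f} days")
```

At those assumed numbers the rig decodes about 2.48 Mtok/day, saving roughly $1.47/day over the hosted endpoint, so the purchase pays off in about 3.7 years; raising the duty cycle or the hosted price shortens that roughly linearly.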

Submitted benchmarks

| Hardware | Model | Decode tok/s | Run |
| --- | --- | --- | --- |
| M3 Pro (18-core GPU) | mlx-community/Qwen2.5-0.5B-Instruct-4bit | 286.5 | r_akcbpx5vcqa |

Side-by-side comparisons

See also: All hardware · All models · Methodology