Best Mac for running local LLMs
M-series chips trade off memory bandwidth, GPU core count, and the unified-memory ceiling. Here's the data, ranked by decode tok/s on Qwen-class models.
Recommendation
Fastest M-series result on the leaderboard: M3 Pro (18-core GPU) at 286.5 tok/s running mlx-community-Qwen2.5-0.5B-Instruct-4bit.
Apple Silicon is unusually good at LLM inference because the unified memory architecture sidesteps the VRAM ceiling that limits NVIDIA consumer GPUs. The trade-off is bandwidth: an M3 Pro tops out around 150 GB/s while an M3 Ultra does ~800 GB/s, and decode speed scales roughly with bandwidth. Below is every M-series result we have data for, ranked by best decode tok/s.
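A back-of-envelope sketch of that scaling: decode at batch size 1 is memory-bound, so every generated token streams the full weight set from unified memory, and tok/s is capped at roughly bandwidth divided by model size in bytes. The figures below use the 150 GB/s number above and the 0.5B 4-bit model from the table; the gap between the ceiling and the measured 286.5 tok/s is expected, since tiny models spend proportionally more time on attention and kernel-launch overhead.

```python
# Roofline estimate for memory-bound decode:
# each token requires reading all model weights once, so
# tok/s ceiling ~= memory bandwidth / weight bytes.

def decode_ceiling(bandwidth_gb_s: float, params_billions: float, bits_per_weight: float) -> float:
    """Upper bound on decode tok/s for a dense model at batch size 1."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes

# M3 Pro (~150 GB/s) with a 0.5B-parameter model quantized to 4 bits:
ceiling = decode_ceiling(150, 0.5, 4)
print(f"{ceiling:.0f} tok/s ceiling")   # ~600 tok/s
print(f"{286.5 / ceiling:.0%} of ceiling achieved in the top run")  # ~48%
```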
Submitted benchmarks
| Hardware | Best model | Decode tok/s | Run |
|---|---|---|---|
| M3 Pro (18-core GPU) | mlx-community-Qwen2.5-0.5B-Instruct-4bit | 286.5 | r_akcbpx5vcqa |
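Runs like this can be tried locally with the mlx-lm package (`pip install mlx-lm`). A minimal sketch, assuming the leaderboard's hyphenated slug corresponds to the Hugging Face repo `mlx-community/Qwen2.5-0.5B-Instruct-4bit`; the prompt and token budget here are placeholders, not the benchmark's actual settings, so expect the reported speed to vary with prompt length and machine load.

```python
# Minimal generation with mlx-lm on Apple Silicon.
from mlx_lm import load, generate

# Downloads the 4-bit MLX weights from Hugging Face on first run.
model, tokenizer = load("mlx-community/Qwen2.5-0.5B-Instruct-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Explain unified memory in one paragraph.",  # placeholder prompt
    max_tokens=256,
    verbose=True,  # prints prompt and generation tok/s after the run
)
```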
Side-by-side comparisons
See also: All hardware · All models · Methodology