Best Mac for running local LLMs
M-series chips trade off memory bandwidth, GPU core count, and the unified-memory ceiling. Here's the data, ranked by decode tok/s on Qwen-class models.
Recommendation
Fastest M-series result on the leaderboard: M3 Pro (18-core GPU) at 286.5 tok/s running mlx-community-Qwen2.5-0.5B-Instruct-4bit.
Apple Silicon is unusually good at LLM inference because the unified memory architecture sidesteps the VRAM ceiling that limits NVIDIA consumer GPUs. The trade-off is bandwidth: an M3 Pro tops out around 150 GB/s while an M3 Ultra does ~800 GB/s, and decode speed scales roughly with bandwidth. Below is every M-series result we have data for, ranked by best decode tok/s.
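A back-of-envelope sketch of that scaling: decode at batch size 1 is memory-bound, so every generated token streams the full weight set from unified memory, and tok/s is capped at roughly bandwidth divided by model size in bytes. The figures below use the 150 GB/s number above and the 0.5B 4-bit model from the table; the gap between the ceiling and the measured 286.5 tok/s is expected, since tiny models spend proportionally more time on attention and kernel-launch overhead.

```python
# Roofline estimate for memory-bound decode:
# each token requires reading all model weights once, so
# tok/s ceiling ~= memory bandwidth / weight bytes.

def decode_ceiling(bandwidth_gb_s: float, params_billions: float, bits_per_weight: float) -> float:
    """Upper bound on decode tok/s for a dense model at batch size 1."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes

# M3 Pro (~150 GB/s) with a 0.5B-parameter model quantized to 4 bits:
ceiling = decode_ceiling(150, 0.5, 4)
print(f"{ceiling:.0f} tok/s ceiling")   # ~600 tok/s
print(f"{286.5 / ceiling:.0%} of ceiling achieved in the top run")  # ~48%
```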
Submitted benchmarks
| Hardware | Best model | Decode tok/s | Run |
|---|---|---|---|
| M3 Pro (18-core GPU) | mlx-community-Qwen2.5-0.5B-Instruct-4bit | 286.5 | r_akcbpx5vcqa |
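Runs like this can be tried locally with the mlx-lm package (`pip install mlx-lm`). A minimal sketch, assuming the leaderboard's hyphenated slug corresponds to the Hugging Face repo `mlx-community/Qwen2.5-0.5B-Instruct-4bit`; the prompt and token budget here are placeholders, not the benchmark's actual settings, so expect the reported speed to vary with prompt length and machine load.

```python
# Minimal generation with mlx-lm on Apple Silicon.
from mlx_lm import load, generate

# Downloads the 4-bit MLX weights from Hugging Face on first run.
model, tokenizer = load("mlx-community/Qwen2.5-0.5B-Instruct-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Explain unified memory in one paragraph.",  # placeholder prompt
    max_tokens=256,
    verbose=True,  # prints prompt and generation tok/s after the run
)
```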
Side-by-side comparisons
See also: All hardware · All models · Methodology