
Best Mac for running local LLMs

M-series chips trade off memory bandwidth, GPU core count, and unified-memory capacity. Here's the data, ranked by decode tok/s on Qwen-class models.

Recommendation

Fastest M-series result on the leaderboard: M3 Pro (18-core GPU) at 286.5 tok/s running mlx-community/Qwen2.5-0.5B-Instruct-4bit.

Apple Silicon is unusually good at LLM inference because the unified memory architecture sidesteps the VRAM ceiling that limits NVIDIA consumer GPUs. The trade-off is bandwidth: an M3 Pro tops out around 150 GB/s while an M3 Ultra does ~800 GB/s, and decode speed scales roughly with bandwidth. Below is every M-series result we have data for, ranked by best decode tok/s.
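To see why decode scales with bandwidth: generating one token requires streaming roughly the full weight set through the GPU, so bandwidth divided by model size gives a back-of-the-envelope ceiling on tok/s. A minimal sketch (the helper name and the 0.3 GB weight estimate for a 4-bit 0.5B model are illustrative assumptions, not leaderboard data):

```python
def max_decode_tok_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on decode tokens/sec for a dense model:
    every weight is read once per generated token."""
    return bandwidth_gb_s / model_size_gb

# A 4-bit 0.5B-parameter model is roughly 0.3 GB of weights (estimate).
print(round(max_decode_tok_s(150, 0.3)))  # M3 Pro-class bandwidth  → 500
print(round(max_decode_tok_s(800, 0.3)))  # M3 Ultra-class bandwidth → 2667
```

Real results fall below this ceiling (kernel overhead, KV-cache reads, sampling), but the ratio between chips tracks the bandwidth ratio reasonably well.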

Submitted benchmarks

Side-by-side comparisons

See also: All hardware · All models · Methodology