Fastest small-MoE coder on a Mac (Qwen3-Coder-30B, DeepSeek-Coder-V2-Lite)
Two MoE coders, Qwen3-Coder-30B-A3B and DeepSeek-Coder-V2-Lite, are expected to clear roughly 80–120 tok/s decode on an Ultra-class Mac at 4-bit, though no submitted run confirms that yet. This guide ranks them across every Apple Silicon tier we have data for.
No M-series submissions yet for Qwen3-Coder-30B-A3B or DeepSeek-Coder-V2-Lite. Both fit at 4-bit MLX on an M-series Mac with 64 GB or more of unified memory and should clear 80–120 tok/s on an Ultra. Submit a run and this guide will rank the two models against measured numbers instead of estimates.
No data submitted for this task yet.
Run the suite to be the first benchmark for this guide:
$ pipx install llm-speed && llm-speed bench
MoE coders with ~3 B active parameters are the sweet spot for Apple Silicon. Decode is limited by memory bandwidth, and an MoE model only reads the weights of the experts it activates for each token, not the full model, so an M-series Ultra decodes far faster than its raw memory-bandwidth number would allow for a dense model of the same total size. Qwen3-Coder-30B-A3B and DeepSeek-Coder-V2-Lite (16B-A2.4B) both fit comfortably at 4-bit MLX on a 64 GB+ unified-memory Mac and decode fast enough for a daily-driver coding agent. Submitted runs for either model are listed below, ranked by decode tok/s, with the Mac tier (Pro / Max / Ultra) called out so the price-to-tok/s line is legible. If your config isn't there yet, run the suite and submit; that row will appear at the next refresh.
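The bandwidth argument above can be made concrete with a back-of-envelope calculation. The sketch below is not part of the benchmark suite. It assumes decode is purely bandwidth-bound, that only the active-parameter weights are read per token, and that KV-cache traffic, quantization scales, and runtime overhead are zero. The 800 GB/s bandwidth figure is an illustrative Ultra-class placeholder, not a measurement; substitute your own chip's published number.

```python
# Back-of-envelope sizing for a 4-bit MoE model on unified memory.
# Assumptions (not measurements): decode is purely bandwidth-bound, only the
# active-parameter weights are read per token, and KV-cache traffic,
# quantization scales, and runtime overhead are ignored.


def weight_footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate resident weight size in GB (params given in billions)."""
    return total_params_b * bits_per_weight / 8


def decode_ceiling_tok_s(
    active_params_b: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Upper bound on decode tok/s: bandwidth divided by GB read per token."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


# Illustrative Ultra-class bandwidth; replace with your chip's published figure.
ASSUMED_BANDWIDTH_GB_S = 800.0

# (total params, active params) in billions, taken from the model names above.
MODELS = {
    "Qwen3-Coder-30B-A3B": (30.0, 3.0),
    "DeepSeek-Coder-V2-Lite": (16.0, 2.4),
}

for name, (total_b, active_b) in MODELS.items():
    size = weight_footprint_gb(total_b, 4)
    ceiling = decode_ceiling_tok_s(active_b, 4, ASSUMED_BANDWIDTH_GB_S)
    print(f"{name}: ~{size:.1f} GB weights, ceiling ~{ceiling:.0f} tok/s")
```

Treat the ceiling as an upper bound only. Measured decode lands well below it once attention, KV-cache reads, and framework overhead are included, which is why submitted runs matter more than this estimate.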
See also: All hardware · All models · Methodology