Fastest rig for Qwen3-Coder-30B-A3B (local)
Qwen3-Coder-30B-A3B is a 30B-parameter mixture-of-experts model with only ~3B parameters active per token, so it decodes like a small model but codes like a big one. Here's the fastest decode tok/s submitted for each GPU and Apple Silicon rig we have a run for.
As of today, the fastest submitted Qwen3-Coder-30B-A3B decode is on the RTX 5090 (32GB) at 259.9 tok/s. The runner-up is the M3 Ultra (60-core GPU) at 112.2 tok/s, so the leader is about 2.3x faster (roughly 132% higher). The 30B-A3B mixture-of-experts activates only ~3B parameters per token, so decode speed tracks memory bandwidth more than total parameter count, which is why a consumer GPU and an Apple Ultra both reach triple-digit decode speeds. The table ranks every rig we have a real run for.
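The gap between the two submitted runs can be read more than one way, so here is a minimal sketch of the arithmetic. The two speeds are copied from the table on this page; the helper names (`speedup`, `percent_faster`, `percent_slower`) are made up for illustration and are not part of any benchmark suite.

```python
# Decode speeds copied from the two submitted runs on this page.
leader_tok_s = 259.9     # RTX 5090 (32GB)
runner_up_tok_s = 112.2  # M3 Ultra (60-core GPU)


def speedup(fast: float, slow: float) -> float:
    """How many times faster `fast` is than `slow`."""
    return fast / slow


def percent_faster(fast: float, slow: float) -> float:
    """Gap expressed relative to the slower rig."""
    return (fast - slow) / slow * 100.0


def percent_slower(fast: float, slow: float) -> float:
    """Gap expressed relative to the faster rig."""
    return (fast - slow) / fast * 100.0


ratio = speedup(leader_tok_s, runner_up_tok_s)
faster = percent_faster(leader_tok_s, runner_up_tok_s)
slower = percent_slower(leader_tok_s, runner_up_tok_s)

print(f"leader is {ratio:.2f}x the runner-up")     # 2.32x
print(f"leader is {faster:.0f}% faster")           # 132%
print(f"runner-up is {slower:.0f}% slower")        # 57%
```

The 132% figure is measured against the runner-up; measured against the leader, the same gap reads as about 57%.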
At 4-bit, all 30B of Qwen3-Coder-30B-A3B's parameters fit on a single 24 GB GPU or any Apple Silicon Max/Ultra. Because only ~3B of them are active per token, it decodes far faster than a dense 30B: memory bandwidth, not total parameter count, sets the speed. Below is every rig we have a real submitted run for, ranked by decode tok/s, with each row linking the signed run. If your config isn't here yet, run the suite and submit; the row appears at the next refresh.
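To make the memory reasoning concrete, here is a rough back-of-envelope sketch. It assumes exactly 30B total and 3B active parameters at a flat 4 bits per weight (real quantized files add scales and keep some tensors at higher precision, so they come out larger), and it assumes spec-sheet memory bandwidths of 1792 GB/s for the RTX 5090 and 819 GB/s for the M3 Ultra. None of these figures come from the submitted runs, whose quantization is not stated here, and the function names are hypothetical.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB, ignoring quantization overhead."""
    return params * bits_per_weight / 8 / 1e9


def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed if each token only had to stream its weights once."""
    return bandwidth_gb_s / gb_read_per_token


TOTAL_PARAMS = 30e9   # assumption: round figure from the model name
ACTIVE_PARAMS = 3e9   # assumption: "~3B active" taken as exactly 3B
BITS = 4              # assumption: flat 4-bit weights

total_gb = weight_gb(TOTAL_PARAMS, BITS)    # what has to fit in memory
active_gb = weight_gb(ACTIVE_PARAMS, BITS)  # what is read per decoded token

# Assumed spec-sheet bandwidths in GB/s (not measured, not from the runs).
ASSUMED_BANDWIDTH_GB_S = {
    "RTX 5090 (32GB)": 1792.0,
    "M3 Ultra (60-core GPU)": 819.0,
}

print(f"weights to hold: ~{total_gb:.1f} GB, read per token: ~{active_gb:.1f} GB")
for rig, bw in ASSUMED_BANDWIDTH_GB_S.items():
    moe = decode_ceiling_tok_s(bw, active_gb)
    dense = decode_ceiling_tok_s(bw, total_gb)
    print(f"{rig}: MoE ceiling ~{moe:.0f} tok/s, dense-30B ceiling ~{dense:.0f} tok/s")
```

These are ceilings, not predictions: KV-cache reads, attention, expert routing, and kernel overhead all keep real decode speeds well below them. The useful part is the ratio, which shows why an MoE with ~3B active parameters can decode roughly ten times faster than a dense model of the same total size on the same memory system.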
Submitted benchmarks
| Hardware | Decode (tok/s) | Workload | Run |
|---|---|---|---|
| RTX 5090 (32GB) | 259.9 | chat-short | r_c7qyvvmmsv1 |
| M3 Ultra (60-core GPU) | 112.2 | chat-short | r_fpsca03u2o_ |