llm-speed

Fastest rig for Qwen3-Coder-30B-A3B (local)

Qwen3-Coder-30B-A3B is a 30B-parameter mixture-of-experts model with only ~3B parameters active per token, so it decodes like a small model but codes like a big one. This page lists the fastest decode tok/s submitted for it across every GPU and Apple Silicon tier.

Verdict

As of today, the fastest submitted Qwen3-Coder-30B-A3B decode is on the RTX 5090 (32GB) at 259.9 tok/s. The runner-up is the M3 Ultra (60-core GPU) at 112.2 tok/s, which puts the leader about 132% ahead (roughly 2.3x). The 30B-A3B mixture-of-experts activates only ~3B parameters per token, so decode speed tracks memory bandwidth more than total parameter count. That's why a consumer GPU and an Apple Ultra can both decode a 30B model at over 100 tok/s. The table ranks every rig we have a real run for.
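The gap figure can be checked directly from the two submitted numbers. This is a minimal sketch that assumes the gap is measured relative to the runner-up; the function name `percent_gap` is ours, not part of the llm-speed suite.

```python
def percent_gap(leader_tok_s: float, runner_up_tok_s: float) -> float:
    """Percent by which the leader's decode rate exceeds the runner-up's."""
    return (leader_tok_s - runner_up_tok_s) / runner_up_tok_s * 100.0


# Decode rates taken from the submitted benchmarks table.
leader = 259.9     # RTX 5090 (32GB), tok/s
runner_up = 112.2  # M3 Ultra (60-core GPU), tok/s

gap = percent_gap(leader, runner_up)
speedup = leader / runner_up

print(f"gap: {gap:.0f}%  speedup: {speedup:.2f}x")
# prints "gap: 132%  speedup: 2.32x"
```

Measured relative to the leader instead, the same two numbers give a deficit of about 57%, so it matters which rig is used as the baseline.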

Recommendation

Fastest submitted run for Qwen3-Coder-30B-A3B: RTX 5090 (32GB) at 259.9 tok/s.

Qwen3-Coder-30B-A3B keeps only ~3B of its 30B parameters active per token. At 4-bit it fits on a single 24 GB GPU or any Apple Silicon Max/Ultra, and it decodes far faster than a dense 30B model because memory bandwidth, not total parameter count, sets the speed. That's why both a consumer GPU and an Apple Ultra decode it at over 100 tok/s. Below is every rig we have a real submitted run for, ranked by decode tok/s, with each row linking to its signed run. If your config isn't here yet, run the suite and submit; the row appears at the next refresh.
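The sizing and bandwidth argument can be made concrete with back-of-envelope arithmetic. The sketch below rests on stated assumptions: exactly 30B total and 3B active parameters, 4 bits per weight with no quantization overhead, no KV cache or activations, and a hypothetical 1000 GB/s of memory bandwidth used purely as a placeholder rather than the spec of any rig in the table. The ceilings it prints are theoretical upper bounds, not predictions of measured tok/s.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights in GB (1 GB = 1e9 bytes), ignoring quantization overhead."""
    return params * bits_per_weight / 8 / 1e9


def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode rate if every token must stream its weights once."""
    return bandwidth_gb_s / gb_read_per_token


TOTAL_PARAMS = 30e9   # assumed: all experts, must be resident in memory
ACTIVE_PARAMS = 3e9   # assumed: parameters actually read per token
BITS = 4              # assumed: 4-bit quantization

footprint_gb = weight_gb(TOTAL_PARAMS, BITS)       # memory needed to hold the model
moe_read_gb = weight_gb(ACTIVE_PARAMS, BITS)       # bytes streamed per token (MoE)
dense_read_gb = weight_gb(TOTAL_PARAMS, BITS)      # bytes streamed per token (dense 30B)

BANDWIDTH_GB_S = 1000.0  # hypothetical placeholder, not a real device spec

moe_ceiling = decode_ceiling_tok_s(BANDWIDTH_GB_S, moe_read_gb)
dense_ceiling = decode_ceiling_tok_s(BANDWIDTH_GB_S, dense_read_gb)

print(f"weights at 4-bit: {footprint_gb:.1f} GB (fits in 24 GB: {footprint_gb < 24})")
print(f"MoE ceiling:   {moe_ceiling:.0f} tok/s")
print(f"dense ceiling: {dense_ceiling:.0f} tok/s")
```

Under these assumptions the weights take about 15 GB, and the mixture-of-experts ceiling is ten times the dense ceiling at the same bandwidth, because only a tenth of the weights are read per token. Real runs land well below the ceiling once KV-cache reads, expert routing, and kernel overhead are counted.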

Submitted benchmarks

| Hardware | Decode (tok/s) | Workload | Run |
| --- | --- | --- | --- |
| RTX 5090 (32GB) | 259.9 | chat-short | r_c7qyvvmmsv1 |
| M3 Ultra (60-core GPU) | 112.2 | chat-short | r_fpsca03u2o_ |

See also: All hardware · All models · Methodology