llm-speed

Qwen3-Next-80B-A3B-Instruct-MLX-4bit on M3 Ultra (60-core GPU) + 96GB unified

Suite: suite-v1
CLI: 0.0.1-dev
Signed: G8xb3zMu3+…
Submitted: May 12, 2026

Workload results

| Workload | Backend | Model | Decode tok/s | Prefill tok/s | TTFT | p50 | p95 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | lmstudio-community-Qwen3-Next-80B-A3B-Instruct-MLX-4bit (4bit) | 80.34 | 24.49 | 4,493 ms | 12.4 ms | 12.6 ms |
| chat-long | mlx@0.31.3 | lmstudio-community-Qwen3-Next-80B-A3B-Instruct-MLX-4bit (4bit) | 77.63 | 1,608.6 | 1,956 ms | 12.9 ms | 13.1 ms |
| concurrent-decode | mlx@0.31.3 | lmstudio-community-Qwen3-Next-80B-A3B-Instruct-MLX-4bit (4bit) | 78.60 | — | — | 12.7 ms | 12.9 ms |
| agent-trace | mlx@0.31.3 | lmstudio-community-Qwen3-Next-80B-A3B-Instruct-MLX-4bit (4bit) | 78.41 | 1,586.3 | 1,318 ms | 12.7 ms | 12.9 ms |
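As a quick sanity check on the table, a steady-state decode rate in tok/s is roughly the reciprocal of the p50 inter-token latency. A minimal sketch (values taken from the chat-short row above):

```shell
# Cross-check: decode tok/s ≈ 1000 / p50 inter-token latency (ms).
# chat-short row: p50 = 12.4 ms, reported decode rate = 80.34 tok/s.
p50_ms=12.4
implied_rate=$(awk -v p="$p50_ms" 'BEGIN { printf "%.1f", 1000 / p }')
echo "implied decode rate: ${implied_rate} tok/s"
```

This prints an implied rate of about 80.6 tok/s, within half a token per second of the reported 80.34, which suggests the decode rate and latency percentiles were measured consistently.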

Reproduce on your machine

Same workload, same model, signed at your rig. The exact command that produced this run:

$ pipx install llm-speed && llm-speed bench --model qwen3-next-80b-a3b-instruct --workload 'chat-short'

Runs in about a minute, and your number lands on the leaderboard, signed and linkable.
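To reproduce all four rows rather than just chat-short, the same command can be looped over the workload names from the table. This is a hypothetical sweep: it assumes `--workload` accepts each name individually (as shown above for chat-short), and it echoes the commands rather than running them, since `llm-speed` may not be installed yet:

```shell
# Print the bench command for each workload reported in this run.
for w in chat-short chat-long concurrent-decode agent-trace; do
  echo llm-speed bench --model qwen3-next-80b-a3b-instruct --workload "$w"
done
```

Drop the `echo` to execute the sweep for real once `llm-speed` is on your PATH.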

Embed this run

Drop the badge into a README, blog post, or signature. Each render is a backlink to the signed result.

[![llm-speed: 80.3 tok/s on M3 Ultra (60-core GPU) (Qwen3-Next-80B-A3B-Instruct-MLX…)](https://llm-speed.com/badge/r_1pl79r50ofy.svg)](https://llm-speed.com/r/r_1pl79r50ofy)


Provenance

Run ID: r_1pl79r50ofy
Fingerprint hash: bbf15132ccbbe7d7
Public key: G8xb3zMu3+pznEici/TiW0gPk5qSNIYIikGCwm1rMdQ=
Received: 2026-05-12 21:30:26