llm-speed

Qwen3-Next-80B-A3B-Instruct-MLX-4bit

4 workload results across 1 hardware configuration.

Fastest local config

80.3 decode tok/s

on M3 Ultra (60-core GPU) + 96GB unified memory via mlx (4bit)

Local runs (4 runs)

Runs submitted from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or Ollama. Each result is signed on the submitter's hardware.

M3 Ultra (60-core GPU) + 96GB unified memory

| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | mlx@0.31.3 | 4bit | 80.34 | 24.49 | 4,493 | r_1pl79r50ofy |
| chat-long | mlx@0.31.3 | 4bit | 77.63 | 1,608.6 | 1,956 | r_1pl79r50ofy |
| concurrent-decode | mlx@0.31.3 | 4bit | 78.60 | no data | no data | r_1pl79r50ofy |
| agent-trace | mlx@0.31.3 | 4bit | 78.41 | 1,586.3 | 1,318 | r_1pl79r50ofy |
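This page does not state exactly how the three metrics are measured. As a rough guide, here is a minimal sketch of one common way to derive TTFT, prefill tok/s, and decode tok/s from per-request timestamps. The `GenerationTiming` record and the helper functions are hypothetical, and approximating prefill time by TTFT (and excluding the first token from the decode count) are assumptions, not necessarily llm-speed's method.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Hypothetical timing record for one request (times in seconds)."""
    request_start: float  # when the prompt was submitted
    first_token: float    # when the first output token arrived
    end: float            # when the last output token arrived
    prompt_tokens: int
    output_tokens: int


def ttft_ms(t: GenerationTiming) -> float:
    """Time to first token, in milliseconds."""
    return (t.first_token - t.request_start) * 1000.0


def prefill_tok_s(t: GenerationTiming) -> float:
    """Prompt tokens per second, assuming prefill time is roughly TTFT."""
    prefill_time = t.first_token - t.request_start
    if prefill_time <= 0:
        raise ValueError("first token must arrive after the request starts")
    return t.prompt_tokens / prefill_time


def decode_tok_s(t: GenerationTiming) -> float:
    """Output tokens per second after the first token.

    The first token comes out of the prefill pass, so this sketch
    excludes it from the decode count (an assumption).
    """
    decode_time = t.end - t.first_token
    if t.output_tokens < 2 or decode_time <= 0:
        raise ValueError("need at least two output tokens and positive decode time")
    return (t.output_tokens - 1) / decode_time


if __name__ == "__main__":
    # Made-up numbers for illustration only, not taken from the runs above.
    t = GenerationTiming(request_start=0.0, first_token=2.0, end=7.0,
                         prompt_tokens=3200, output_tokens=401)
    print(ttft_ms(t), prefill_tok_s(t), decode_tok_s(t))  # 2000.0 1600.0 80.0
```

Under these definitions, a workload with a short prompt can report a low prefill rate even on fast hardware, because fixed per-request overhead inside TTFT is divided over only a few prompt tokens.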

Qwen3-Next-80B-A3B-Instruct-MLX-4bit on hardware