llm-speed

gpt-oss-20b-MXFP4-Q4

3 workload results across 2 hardware configurations.

Fastest known config

152.7 decode tok/s

on M3 Ultra (60-core GPU) + 96GB unified, via mlx

M3 Pro (18-core GPU) + 36GB unified

Workload   | Backend    | Quant | decode tok/s | prefill tok/s | TTFT    | Run
chat-short | mlx@0.31.3 |       | no data      | no data       | no data | r_b-wpf9nxe2k
chat-short | mlx@0.31.3 |       | no data      | no data       | no data | r_hio10b_02gx

M3 Ultra (60-core GPU) + 96GB unified

Workload   | Backend    | Quant | decode tok/s | prefill tok/s | TTFT   | Run
chat-short | mlx@0.31.3 |       | 152.7        | 239.9         | 692 ms | r_3ijun8ltjnb
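
To get a feel for what the M3 Ultra figures mean in practice, here is a rough sketch that turns the reported TTFT and decode speed into an estimated response time. It is not part of the leaderboard. The function name `estimated_latency_s` is hypothetical, and the model behind it (constant decode speed, TTFT already covering the first token) is an assumption, not something the benchmark states.

```python
# Figures copied from the M3 Ultra chat-short row.
DECODE_TOK_PER_S = 152.7
TTFT_S = 0.692  # 692 ms


def estimated_latency_s(output_tokens: int) -> float:
    """Rough wall-clock estimate: TTFT plus steady-state decode time.

    Assumption (not from the leaderboard): decode speed is constant and
    TTFT already includes the first generated token.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return TTFT_S + (output_tokens - 1) / DECODE_TOK_PER_S


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(f"{n:>5} tokens: ~{estimated_latency_s(n):.2f} s")
```

Real runs will deviate from this estimate, since TTFT depends on prompt length and decode speed can drop as the context grows.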
