
gpt-4o-mini

1 workload result across 1 hardware configuration.

Fastest known config: 158.4 decode tok/s on M3 Pro (18-core GPU) + 36GB unified via hosted-api.

M3 Pro (18-core GPU) + 36GB unified

Workload     Backend     Quant   decode tok/s   prefill tok/s   TTFT     Run
chat-short   hosted-api  —       158.4          106.2           998 ms   r_mya-83a8n3j
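As a rough illustration of how these metrics combine (this is not part of the leaderboard's methodology), the wall-clock time for a full response can be estimated from the reported TTFT and decode throughput: TTFT covers prefill plus the first token, and the remaining tokens stream at the decode rate. A minimal sketch in Python, using the numbers from the run above:

```python
# Illustrative estimate only; the leaderboard does not define this formula.
ttft_s = 0.998            # time to first token from the run above (998 ms)
decode_tok_per_s = 158.4  # decode throughput from the run above

def estimated_latency_s(output_tokens: int) -> float:
    """Approximate wall-clock time to generate `output_tokens` tokens:
    TTFT accounts for prefill and the first token; the rest arrive
    at the steady decode rate."""
    return ttft_s + max(output_tokens - 1, 0) / decode_tok_per_s

# e.g. a 256-token reply would take roughly 2.6 s under these assumptions
print(f"{estimated_latency_s(256):.2f} s for a 256-token reply")
```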
