llm-speed

Llama-3.1-8B-Instruct-4bit

1 workload result across 1 hardware configuration.

Fastest known config

29.2 decode tok/s on M3 Pro (18-core GPU) + 36GB unified memory, via mlx

M3 Pro (18-core GPU) + 36GB unified

| Workload   | Backend    | Quant | Decode tok/s | Prefill tok/s | TTFT   | Run           |
| ---------- | ---------- | ----- | ------------ | ------------- | ------ | ------------- |
| chat-short | mlx@0.31.3 |       | 29.20        | 203.3         | 669 ms | r_h0-use1ypnb |
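The three metrics in the table combine into an end-to-end latency estimate: time-to-first-token (TTFT) covers the prompt prefill, and every subsequent token arrives at the steady decode rate. A minimal sketch, assuming TTFT already includes the first generated token and that decode throughput stays constant (the `estimate_latency_s` helper and the 256-token reply length are illustrative, not part of the leaderboard):

```python
def estimate_latency_s(ttft_ms: float, decode_tps: float, output_tokens: int) -> float:
    """Rough wall-clock time for a reply: TTFT plus steady-state decoding
    of the remaining output_tokens - 1 tokens."""
    return ttft_ms / 1000.0 + (output_tokens - 1) / decode_tps

# Measured values from the chat-short run above
# (Llama-3.1-8B-Instruct-4bit, mlx@0.31.3, M3 Pro 18-core GPU):
ttft_ms = 669.0     # time to first token
decode_tps = 29.20  # decode tokens/second

# Estimated time for a hypothetical 256-token reply:
print(round(estimate_latency_s(ttft_ms, decode_tps, 256), 1))  # → 9.4 (seconds)
```

Under the same assumption, the prefill rate (203.3 tok/s) times the TTFT implies a prompt of roughly 135 tokens for this workload, though TTFT typically also includes some fixed overhead.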

Llama-3.1-8B-Instruct-4bit on hardware