Skip to content
llm-speed
Leaderboard/model/anthropic-claude-sonnet-4-5

claude-sonnet-4.5

3 workload results across 3 hardware configurations.

Fastest known config

21.6 decode tok/s

on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via hosted-api see full run

M3 Pro (18-core GPU) + 36GB unifiedM3 Pro (18-core GPU) + 36GB unified

WorkloadBackendQuantdecode tok/sprefill tok/sTTFTRun
chat-shorthosted-api19.97tok/s70.48tok/s1,674msr__y3_fli1-p9

RTX 5090 (32GB) + AMD64 Family 26 Model 68 Stepping 0, AuthenticAMD (8c) + 62GBRTX 5090 (32GB) + AMD64 Family 26 Model 68 Stepping 0, AuthenticAMD (8c) + 62GB

WorkloadBackendQuantdecode tok/sprefill tok/sTTFTRun
chat-shorthosted-api19.37tok/s38.74tok/s3,046msr_1eh7fflb9gn

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GBRTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB

WorkloadBackendQuantdecode tok/sprefill tok/sTTFTRun
chat-shorthosted-api21.55tok/s63.52tok/s1,858msr_6lqzt2qidt-

claude-sonnet-4.5 on hardware