claude-sonnet-4.5
3 workload results across 3 hardware configurations.
Fastest known config
21.6 decode tok/s
on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via hosted-api — see full run
M3 Pro (18-core GPU) + 36GB unified
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | hosted-api | — | 19.97tok/s | 70.48tok/s | 1,674ms | r__y3_fli1-p9 |
RTX 5090 (32GB) + AMD64 Family 26 Model 68 Stepping 0, AuthenticAMD (8c) + 62GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | hosted-api | — | 19.37tok/s | 38.74tok/s | 3,046ms | r_1eh7fflb9gn |
RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | hosted-api | — | 21.55tok/s | 63.52tok/s | 1,858ms | r_6lqzt2qidt- |