RTX 5090 (32GB) — LLM benchmarks
2 workload results across 1 model.
Fastest known config on RTX 5090 (32GB)
21.6 decode tok/s
claude-sonnet-4.5 via hosted-api — see full run
claude-sonnet-4.5
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | hosted-api | — | 19.37tok/s | 38.74tok/s | 3,046ms | r_1eh7fflb9gn |
| chat-short | hosted-api | — | 21.55tok/s | 63.52tok/s | 1,858ms | r_6lqzt2qidt- |