gemma-4-31B-it-Q4_K_M.gguf
33 workload results across 1 hardware configuration.
Fastest local config: 69.7 decode tok/s on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp (run r_pa7nuerd9tl).
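To translate a decode rate into what a user waits for, tok/s converts to per-token latency as 1000 / rate milliseconds. A reply of N tokens then takes roughly TTFT plus (N - 1) tokens at the decode rate. This is a simplified model assumed here, not something the benchmark reports: it ignores sampling overhead and any slowdown as the context grows. The 256-token reply length and the helper names below are hypothetical, chosen for illustration.

```python
def ms_per_token(decode_tok_s: float) -> float:
    """Average time to generate one token during decode, in milliseconds."""
    return 1000.0 / decode_tok_s


def estimate_response_ms(ttft_ms: float, decode_tok_s: float, n_tokens: int) -> float:
    """Rough wall-clock time for an n-token reply (assumed model).

    The first token is assumed to arrive after TTFT; the remaining
    n_tokens - 1 tokens are assumed to stream at the measured decode rate.
    """
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    return ttft_ms + (n_tokens - 1) * ms_per_token(decode_tok_s)


if __name__ == "__main__":
    # Fastest run in the table (r_pa7nuerd9tl): 69.67 decode tok/s, 175 ms TTFT.
    print(f"{ms_per_token(69.67):.2f} ms/token")
    # 256 tokens is a hypothetical reply length, not a chat-short parameter.
    print(f"{estimate_response_ms(175, 69.67, 256) / 1000:.2f} s for 256 tokens")
```

On the fastest run this works out to about 14.35 ms per token, or roughly 3.8 s for a 256-token reply under the stated assumptions.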
Local runs (33 runs)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or Ollama. Each result is signed on the submitter's hardware.
RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | — | 67.37 | no data | 180 | r_uut6m_v6ui9 |
| chat-short | llama.cpp | — | 67.27 | no data | 344 | r_23fireoga9y |
| chat-short | llama.cpp | — | 67.33 | no data | 156 | r_1ofpvf4m7p5 |
| chat-short | llama.cpp | — | 67.31 | no data | 160 | r_wwq2l7lmy6d |
| chat-short | llama.cpp | — | 66.39 | no data | 210 | r_wppiqxyk9iw |
| chat-short | llama.cpp | — | 67.05 | no data | 207 | r_onvh-dpmck- |
| chat-short | llama.cpp | — | 67.37 | no data | 176 | r_plujbqnef08 |
| chat-short | llama.cpp | — | 69.36 | no data | 146 | r_n65ex7zl6ts |
| chat-short | llama.cpp | — | 67.30 | no data | 167 | r_jkn4ltg393p |
| chat-short | llama.cpp | — | 69.67 | no data | 175 | r_pa7nuerd9tl |
| chat-short | llama.cpp | — | 67.37 | no data | 157 | r_moawmhigq5d |
| chat-short | llama.cpp | — | 67.24 | no data | 223 | r_4fi6u0-u2ih |
| chat-short | llama.cpp | — | 66.19 | no data | 200 | r_s0-yb8j14ys |
| chat-short | llama.cpp | — | 67.39 | no data | 172 | r_n6su7ptrulh |
| chat-short | llama.cpp | — | 67.17 | no data | 202 | r_f1wlemwip05 |
| chat-short | llama.cpp | — | 67.67 | no data | 209 | r_lrxodemb-w0 |
| chat-short | llama.cpp | — | 67.49 | no data | 221 | r_lzv-y6_-n0b |
| chat-short | llama.cpp | — | 67.97 | no data | 151 | r_0ztzrxt5qw2 |
| chat-short | llama.cpp | — | 66.00 | no data | 226 | r_f-uojs7vckb |
| chat-short | llama.cpp | — | 67.05 | no data | 322 | r_0dfevy9f4gh |
| chat-short | llama.cpp | — | 67.44 | no data | 161 | r_0igp5bsgx_7 |
| chat-short | llama.cpp | — | 64.71 | no data | 197 | r_608esz63ib8 |
| chat-short | llama.cpp | — | 66.74 | no data | 200 | r_616apthaana |
| chat-short | llama.cpp | — | 68.77 | no data | 200 | r_epzfs8k8ohh |
| chat-short | llama.cpp | — | 67.80 | no data | 189 | r_dee_mcm1ga0 |
| chat-short | llama.cpp | — | 67.73 | no data | 228 | r_ea3vckhh-cn |
| chat-short | llama.cpp | — | 67.70 | no data | 156 | r_yz3q11pa049 |
| chat-short | llama.cpp | — | 67.54 | no data | 183 | r_-_ag61ig4f_ |
| chat-short | llama.cpp | — | 67.16 | no data | 164 | r_pnk9rze10h7 |
| chat-short | llama.cpp | — | 66.70 | no data | 179 | r__679a-4h44q |
| chat-short | llama.cpp | — | 67.17 | no data | 202 | r_-dxzsj2j-y_ |
| chat-short | llama.cpp | — | 66.75 | no data | 195 | r_wtd90zbbusq |
| chat-short | llama.cpp | — | 67.19 | no data | 230 | r_q-it_4s9z4p |
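The headline figure is the single best run. Summary statistics show how representative it is across all 33 runs. Below is a minimal sketch using Python's standard `statistics` module. The run IDs, decode rates, and TTFTs are transcribed by hand from the table above, and prefill is left out because no run reports it. The `summarize` helper and its field names are illustrative only and are not part of any benchmark tooling.

```python
import statistics

# (run ID, decode tok/s, TTFT in ms) for each chat-short run, copied from the
# table above in row order. Prefill is omitted because every row says "no data".
RUNS = [
    ("r_uut6m_v6ui9", 67.37, 180), ("r_23fireoga9y", 67.27, 344), ("r_1ofpvf4m7p5", 67.33, 156),
    ("r_wwq2l7lmy6d", 67.31, 160), ("r_wppiqxyk9iw", 66.39, 210), ("r_onvh-dpmck-", 67.05, 207),
    ("r_plujbqnef08", 67.37, 176), ("r_n65ex7zl6ts", 69.36, 146), ("r_jkn4ltg393p", 67.30, 167),
    ("r_pa7nuerd9tl", 69.67, 175), ("r_moawmhigq5d", 67.37, 157), ("r_4fi6u0-u2ih", 67.24, 223),
    ("r_s0-yb8j14ys", 66.19, 200), ("r_n6su7ptrulh", 67.39, 172), ("r_f1wlemwip05", 67.17, 202),
    ("r_lrxodemb-w0", 67.67, 209), ("r_lzv-y6_-n0b", 67.49, 221), ("r_0ztzrxt5qw2", 67.97, 151),
    ("r_f-uojs7vckb", 66.00, 226), ("r_0dfevy9f4gh", 67.05, 322), ("r_0igp5bsgx_7", 67.44, 161),
    ("r_608esz63ib8", 64.71, 197), ("r_616apthaana", 66.74, 200), ("r_epzfs8k8ohh", 68.77, 200),
    ("r_dee_mcm1ga0", 67.80, 189), ("r_ea3vckhh-cn", 67.73, 228), ("r_yz3q11pa049", 67.70, 156),
    ("r_-_ag61ig4f_", 67.54, 183), ("r_pnk9rze10h7", 67.16, 164), ("r__679a-4h44q", 66.70, 179),
    ("r_-dxzsj2j-y_", 67.17, 202), ("r_wtd90zbbusq", 66.75, 195), ("r_q-it_4s9z4p", 67.19, 230),
]


def summarize(runs):
    """Return summary statistics for a list of (run_id, decode_tok_s, ttft_ms)."""
    decode = [d for _, d, _ in runs]
    ttft = [t for _, _, t in runs]
    return {
        "n": len(runs),
        "decode_median": statistics.median(decode),
        "decode_mean": statistics.fmean(decode),
        "decode_stdev": statistics.stdev(decode),  # sample standard deviation
        "fastest": max(runs, key=lambda r: r[1]),  # highest decode tok/s
        "slowest": min(runs, key=lambda r: r[1]),  # lowest decode tok/s
        "ttft_median_ms": statistics.median(ttft),
    }


if __name__ == "__main__":
    s = summarize(RUNS)
    print(f"runs: {s['n']}")
    print(f"decode tok/s: median {s['decode_median']:.2f}, "
          f"mean {s['decode_mean']:.2f}, stdev {s['decode_stdev']:.2f}")
    print(f"fastest: {s['fastest'][0]} at {s['fastest'][1]:.2f} tok/s")
    print(f"slowest: {s['slowest'][0]} at {s['slowest'][1]:.2f} tok/s")
    print(f"median TTFT: {s['ttft_median_ms']} ms")
```

On these numbers the median decode rate is 67.31 tok/s and the median TTFT is 195 ms. The 69.67 tok/s headline run (r_pa7nuerd9tl) sits about 3.5% above the median, and the slowest run (r_608esz63ib8, 64.71 tok/s) sits about 3.9% below it. The median is reported alongside the mean because the TTFT column has two outliers (322 ms and 344 ms) that would pull an average upward.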