gemma3
12 workload results across 1 hardware configuration.
Fastest local config
195.0 decode tok/s
on RTX 4090 (24GB) + AMD EPYC 7443 24-Core Processor (24c) + 252GB via ollama (Q4_K_M), chat-short workload, run r_dlanfbgym0h
Local runs (12 workload results from 3 runs)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.
RTX 4090 (24GB) + AMD EPYC 7443 24-Core Processor (24c) + 252GB
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | Q4_K_M | 46.95 | 144.5 | 817 | r_x23y_sg24pm |
| chat-long | ollama@0.31.1 | Q4_K_M | 45.25 | 1,594.1 | 1,998 | r_x23y_sg24pm |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 46.17 | no data | no data | r_x23y_sg24pm |
| agent-trace | ollama@0.31.1 | Q4_K_M | 45.33 | 2,556.8 | 956 | r_x23y_sg24pm |
| chat-short | ollama@0.31.1 | Q4_K_M | 92.65 | 171.4 | 689 | r_3kmrc135e0e |
| chat-long | ollama@0.31.1 | Q4_K_M | 87.60 | 2,206.4 | 1,444 | r_3kmrc135e0e |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 90.43 | no data | no data | r_3kmrc135e0e |
| agent-trace | ollama@0.31.1 | Q4_K_M | 88.53 | 3,123.9 | 815 | r_3kmrc135e0e |
| chat-short | ollama@0.31.1 | Q4_K_M | 195.0 | 167.2 | 706 | r_dlanfbgym0h |
| chat-long | ollama@0.31.1 | Q4_K_M | 187.6 | 3,254.1 | 979 | r_dlanfbgym0h |
| concurrent-decode | ollama@0.31.1 | Q4_K_M | 193.1 | no data | no data | r_dlanfbgym0h |
| agent-trace | ollama@0.31.1 | Q4_K_M | 190.0 | 3,543.4 | 715 | r_dlanfbgym0h |
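The "fastest local config" figure can be checked against the table above. The following is a minimal sketch, not part of the site's tooling: the decode values and run IDs are copied from the table, while the names `ROWS`, `fastest`, and `mean_decode_by_run` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

# (workload, decode tok/s, run id), copied by hand from the table above.
ROWS = [
    ("chat-short", 46.95, "r_x23y_sg24pm"),
    ("chat-long", 45.25, "r_x23y_sg24pm"),
    ("concurrent-decode", 46.17, "r_x23y_sg24pm"),
    ("agent-trace", 45.33, "r_x23y_sg24pm"),
    ("chat-short", 92.65, "r_3kmrc135e0e"),
    ("chat-long", 87.60, "r_3kmrc135e0e"),
    ("concurrent-decode", 90.43, "r_3kmrc135e0e"),
    ("agent-trace", 88.53, "r_3kmrc135e0e"),
    ("chat-short", 195.0, "r_dlanfbgym0h"),
    ("chat-long", 187.6, "r_dlanfbgym0h"),
    ("concurrent-decode", 193.1, "r_dlanfbgym0h"),
    ("agent-trace", 190.0, "r_dlanfbgym0h"),
]


def fastest(rows):
    """Return the row with the highest decode throughput."""
    return max(rows, key=lambda row: row[1])


def mean_decode_by_run(rows):
    """Average decode tok/s over all workloads of each run id."""
    by_run = defaultdict(list)
    for _workload, decode, run_id in rows:
        by_run[run_id].append(decode)
    return {run_id: sum(vals) / len(vals) for run_id, vals in by_run.items()}


if __name__ == "__main__":
    workload, decode, run_id = fastest(ROWS)
    print(f"fastest: {decode} tok/s ({workload}, {run_id})")
    for run_id, mean in mean_decode_by_run(ROWS).items():
        print(f"{run_id}: mean decode {mean:.2f} tok/s")
```

Grouping by run ID makes it easier to see that the 12 rows come from three distinct runs, with decode throughput varying little across workloads within any one run.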