gpt-oss
4 workload results across 1 hardware configuration.
Fastest local config
141.7 decode tok/s
on RTX 4090 (24GB) + AMD EPYC 7352 24-Core Processor (24c) + 252GB via ollama (MXFP4)
Local runs (4 runs)
Runs submitted from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.
RTX 4090 (24GB) + AMD EPYC 7352 24-Core Processor (24c) + 252GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | ollama@0.31.1 | MXFP4 | no data | no data | no data | r_iu2sfa9ykvw |
| chat-long | ollama@0.31.1 | MXFP4 | 141.7 | 631.5 | 5,048 ms | r_iu2sfa9ykvw |
| concurrent-decode | ollama@0.31.1 | MXFP4 | no data | no data | no data | r_iu2sfa9ykvw |
| agent-trace | ollama@0.31.1 | MXFP4 | no data | no data | no data | r_iu2sfa9ykvw |
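
To help read the chat-long row, here is a rough sketch of how the three reported metrics relate to each other. It assumes TTFT is spent entirely on prefill and that decode speed stays constant over the whole response. The page states neither, so treat the results as estimates only. The function names and the 500-token output length are hypothetical and chosen for illustration.

```python
def estimated_prompt_tokens(ttft_ms: float, prefill_tok_s: float) -> float:
    """Prompt length implied by TTFT, ASSUMING TTFT is entirely prefill time."""
    return (ttft_ms / 1000.0) * prefill_tok_s


def estimated_latency_s(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """End-to-end time: TTFT plus decode time, ASSUMING a constant decode rate."""
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s


# Values taken from the chat-long row of the table above.
TTFT_MS = 5048
PREFILL_TOK_S = 631.5
DECODE_TOK_S = 141.7

# Hypothetical response length, not part of the benchmark data.
OUTPUT_TOKENS = 500

prompt = estimated_prompt_tokens(TTFT_MS, PREFILL_TOK_S)
latency = estimated_latency_s(TTFT_MS, DECODE_TOK_S, OUTPUT_TOKENS)

print(f"implied prompt length: ~{prompt:.0f} tokens")
print(f"estimated time for {OUTPUT_TOKENS} output tokens: ~{latency:.1f} s")
```

Under these assumptions, the 5,048 ms TTFT is consistent with a prompt of a few thousand tokens, so it mostly reflects the length of the chat-long prompt, not slow startup.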