llm-speed

gpt-oss

4 workload results across 1 hardware configuration.

Fastest local config

141.7 decode tok/s

on RTX 4090 (24GB) + AMD EPYC 7352 24-Core Processor (24c) + 252GB via ollama (MXFP4), run r_iu2sfa9ykvw

Local runs (4 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each result is signed on the submitter's hardware.

RTX 4090 (24GB) + AMD EPYC 7352 24-Core Processor (24c) + 252GB

Workload           Backend        Quant  Decode (tok/s)  Prefill (tok/s)  TTFT (ms)  Run
chat-short         ollama@0.31.1  MXFP4  no data         no data          no data    r_iu2sfa9ykvw
chat-long          ollama@0.31.1  MXFP4  141.7           631.5            5,048      r_iu2sfa9ykvw
concurrent-decode  ollama@0.31.1  MXFP4  no data         no data          no data    r_iu2sfa9ykvw
agent-trace        ollama@0.31.1  MXFP4  no data         no data          no data    r_iu2sfa9ykvw
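The three chat-long figures can be combined into rough, back-of-the-envelope estimates. The sketch below assumes that TTFT is spent entirely on prefill, which makes the implied prompt length an upper bound, since real TTFT also includes overheads such as request handling or model loading. It also assumes that decode runs at the steady reported rate after the first token. The helper names implied_prompt_tokens and estimated_latency_s are hypothetical and do not belong to llm-speed or ollama.

```python
# Figures copied from the chat-long row of the table above.
DECODE_TOK_S = 141.7   # decode throughput, tokens/s
PREFILL_TOK_S = 631.5  # prefill throughput, tokens/s
TTFT_MS = 5048         # time to first token, ms


def implied_prompt_tokens(ttft_ms: float, prefill_tok_s: float) -> float:
    """Prompt length implied if all of TTFT were prefill (assumption; an upper bound)."""
    return prefill_tok_s * ttft_ms / 1000.0


def estimated_latency_s(output_tokens: int, ttft_ms: float, decode_tok_s: float) -> float:
    """Rough end-to-end time: TTFT covers the first token, then the remaining
    output_tokens - 1 tokens are decoded at the steady decode rate (assumption)."""
    return ttft_ms / 1000.0 + (output_tokens - 1) / decode_tok_s


if __name__ == "__main__":
    print(f"implied prompt: ~{implied_prompt_tokens(TTFT_MS, PREFILL_TOK_S):.0f} tokens")
    print(f"500-token reply: ~{estimated_latency_s(500, TTFT_MS, DECODE_TOK_S):.1f} s")
```

With these numbers, the implied prompt comes to roughly 3,188 tokens, and a hypothetical 500-token reply would take about 8.6 s. TTFT accounts for more than half of that time, so for this configuration prompt processing, rather than decode speed, dominates latency on long chats.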

gpt-oss on hardware