llm-speed

gemma-4-31B-it-Q4_K_M.gguf

33 workload results across 1 hardware configuration.

Fastest local config

69.7 decode tok/s

on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp

Local runs (33 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | | 67.37 | no data | 180 | r_uut6m_v6ui9 |
| chat-short | llama.cpp | | 67.27 | no data | 344 | r_23fireoga9y |
| chat-short | llama.cpp | | 67.33 | no data | 156 | r_1ofpvf4m7p5 |
| chat-short | llama.cpp | | 67.31 | no data | 160 | r_wwq2l7lmy6d |
| chat-short | llama.cpp | | 66.39 | no data | 210 | r_wppiqxyk9iw |
| chat-short | llama.cpp | | 67.05 | no data | 207 | r_onvh-dpmck- |
| chat-short | llama.cpp | | 67.37 | no data | 176 | r_plujbqnef08 |
| chat-short | llama.cpp | | 69.36 | no data | 146 | r_n65ex7zl6ts |
| chat-short | llama.cpp | | 67.30 | no data | 167 | r_jkn4ltg393p |
| chat-short | llama.cpp | | 69.67 | no data | 175 | r_pa7nuerd9tl |
| chat-short | llama.cpp | | 67.37 | no data | 157 | r_moawmhigq5d |
| chat-short | llama.cpp | | 67.24 | no data | 223 | r_4fi6u0-u2ih |
| chat-short | llama.cpp | | 66.19 | no data | 200 | r_s0-yb8j14ys |
| chat-short | llama.cpp | | 67.39 | no data | 172 | r_n6su7ptrulh |
| chat-short | llama.cpp | | 67.17 | no data | 202 | r_f1wlemwip05 |
| chat-short | llama.cpp | | 67.67 | no data | 209 | r_lrxodemb-w0 |
| chat-short | llama.cpp | | 67.49 | no data | 221 | r_lzv-y6_-n0b |
| chat-short | llama.cpp | | 67.97 | no data | 151 | r_0ztzrxt5qw2 |
| chat-short | llama.cpp | | 66.00 | no data | 226 | r_f-uojs7vckb |
| chat-short | llama.cpp | | 67.05 | no data | 322 | r_0dfevy9f4gh |
| chat-short | llama.cpp | | 67.44 | no data | 161 | r_0igp5bsgx_7 |
| chat-short | llama.cpp | | 64.71 | no data | 197 | r_608esz63ib8 |
| chat-short | llama.cpp | | 66.74 | no data | 200 | r_616apthaana |
| chat-short | llama.cpp | | 68.77 | no data | 200 | r_epzfs8k8ohh |
| chat-short | llama.cpp | | 67.80 | no data | 189 | r_dee_mcm1ga0 |
| chat-short | llama.cpp | | 67.73 | no data | 228 | r_ea3vckhh-cn |
| chat-short | llama.cpp | | 67.70 | no data | 156 | r_yz3q11pa049 |
| chat-short | llama.cpp | | 67.54 | no data | 183 | r_-_ag61ig4f_ |
| chat-short | llama.cpp | | 67.16 | no data | 164 | r_pnk9rze10h7 |
| chat-short | llama.cpp | | 66.70 | no data | 179 | r__679a-4h44q |
| chat-short | llama.cpp | | 67.17 | no data | 202 | r_-dxzsj2j-y_ |
| chat-short | llama.cpp | | 66.75 | no data | 195 | r_wtd90zbbusq |
| chat-short | llama.cpp | | 67.19 | no data | 230 | r_q-it_4s9z4p |
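
The page does not say how the headline figure is derived from the individual runs. As a rough sketch, assuming it is simply the highest decode rate among the runs rounded to one decimal place, the rows above can be summarized as follows. The `RUNS` list and the `summarize` helper are illustrative names, not part of the site; the numbers are copied from the table.

```python
import statistics

# (decode tok/s, TTFT ms) pairs copied from the 33 chat-short rows above.
RUNS = [
    (67.37, 180), (67.27, 344), (67.33, 156), (67.31, 160), (66.39, 210),
    (67.05, 207), (67.37, 176), (69.36, 146), (67.30, 167), (69.67, 175),
    (67.37, 157), (67.24, 223), (66.19, 200), (67.39, 172), (67.17, 202),
    (67.67, 209), (67.49, 221), (67.97, 151), (66.00, 226), (67.05, 322),
    (67.44, 161), (64.71, 197), (66.74, 200), (68.77, 200), (67.80, 189),
    (67.73, 228), (67.70, 156), (67.54, 183), (67.16, 164), (66.70, 179),
    (67.17, 202), (66.75, 195), (67.19, 230),
]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


decode = summarize([d for d, _ in RUNS])
ttft = summarize([t for _, t in RUNS])

# Assumption: the headline is the best single run, rounded to one decimal.
headline = round(decode["max"], 1)

print(f"runs: {decode['n']}")
print(f"decode tok/s: min {decode['min']}, median {decode['median']}, max {decode['max']}")
print(f"TTFT ms: min {ttft['min']}, median {ttft['median']}, max {ttft['max']}")
print(f"headline decode tok/s: {headline}")
```

Under that assumption the best run (69.67 tok/s) rounds to the 69.7 tok/s shown at the top of the page. The median is likely a better guide to typical throughput than the single best run, since it is less sensitive to outliers such as the 64.71 tok/s run.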
