
test-model

4 run results (1 workload) across 1 hardware configuration.

Fastest local config

42.0 decode tok/s on Pentest-Bench via llama.cpp (Q4_K_M)

Local runs (4 runs)

Runs submitted from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama, and signed on the submitter's hardware.

Pentest-Bench

Workload     Backend           Quant    Decode        Prefill       TTFT     Run
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_1jskg9qv_8b
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_rl-kbwr9chb
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_7z262rwoo08
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_wd6-1z548j_
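As a rough way to read these metrics together, here is a minimal sketch (an assumption for illustration, not part of this leaderboard's methodology): end-to-end generation time is approximately TTFT plus output tokens divided by the decode rate.

```python
# Rough latency model, assuming: total time ≈ TTFT + tokens / decode rate.
# Real runs vary with prompt length, batching, and sampler settings.

def estimated_latency_s(ttft_ms: float, decode_tok_s: float, n_tokens: int) -> float:
    """Estimate wall-clock seconds to produce n_tokens output tokens."""
    return ttft_ms / 1000.0 + n_tokens / decode_tok_s

# Using the chat-short numbers above: 50.0 ms TTFT, 42.00 decode tok/s.
print(round(estimated_latency_s(50.0, 42.0, 256), 2))  # ≈ 6.15 s for 256 tokens
```

Under this model, TTFT dominates only for very short replies; for longer outputs the decode rate is what matters, which is why it is the headline number above.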

test-model on hardware