llm-speed

victim

1 workload result across 1 hardware configuration.

Fastest local config

42.0 decode tok/s

on Pentest-Bench via llama.cpp (Q4_K_M)

Local runs (1 run)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each result is signed on the submitter's hardware.

Pentest-Bench

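| Workload   | Backend         | Quant  | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run           |
|------------|-----------------|--------|----------------|-----------------|-----------|---------------|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00          | 100.0           | 50.0      | r_b-ndu-9uswz |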
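The page does not say how these metrics combine into the latency a user would actually see. Here is a rough sketch under a simple assumed model: the first token arrives after TTFT, and every later token is produced at the constant decode rate. The helper `estimate_latency_s` is hypothetical and is not part of llm-speed or any backend. The 256-token output length is an arbitrary example.

```python
def estimate_latency_s(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Rough end-to-end generation time in seconds (hypothetical helper).

    Assumption: the first token arrives after ttft_ms, and each remaining
    token is produced at a constant decode_tok_s rate.
    """
    if decode_tok_s <= 0:
        raise ValueError("decode_tok_s must be positive")
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    remaining = output_tokens - 1  # the first token's cost is already in TTFT
    return ttft_ms / 1000.0 + remaining / decode_tok_s


# Figures from the chat-short row above; 256 output tokens is an assumed example.
print(round(estimate_latency_s(50.0, 42.0, 256), 2))  # prints 6.12
```

At 42 tok/s, decode time dominates for any non-trivial output length. In this model, TTFT mainly matters for very short replies.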
