
test-model

4 run results (1 workload) across 1 hardware configuration.

Fastest local config

42.0 decode tok/s on Pentest-Bench via llama.cpp (Q4_K_M)

Local runs (4 runs)

Runs submitted from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama, and signed on the submitter's hardware.

Pentest-Bench

Workload     Backend           Quant    Decode        Prefill       TTFT     Run
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_1jskg9qv_8b
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_rl-kbwr9chb
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_7z262rwoo08
chat-short   llama.cpp@b9999   Q4_K_M   42.00 tok/s   100.0 tok/s   50.0 ms  r_wd6-1z548j_
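As a rough way to read these metrics together, here is a minimal sketch (an assumption for illustration, not part of this leaderboard's methodology): end-to-end generation time is approximately TTFT plus output tokens divided by the decode rate.

```python
# Rough latency model, assuming: total time ≈ TTFT + tokens / decode rate.
# Real runs vary with prompt length, batching, and sampler settings.

def estimated_latency_s(ttft_ms: float, decode_tok_s: float, n_tokens: int) -> float:
    """Estimate wall-clock seconds to produce n_tokens output tokens."""
    return ttft_ms / 1000.0 + n_tokens / decode_tok_s

# Using the chat-short numbers above: 50.0 ms TTFT, 42.00 decode tok/s.
print(round(estimated_latency_s(50.0, 42.0, 256), 2))  # ≈ 6.15 s for 256 tokens
```

Under this model, TTFT dominates only for very short replies; for longer outputs the decode rate is what matters, which is why it is the headline number above.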

test-model on hardware