Hardware FAQ
Pentest-Bench — frequently asked questions
Direct answers to the questions local-LLM enthusiasts ask about Pentest-Bench, drawn from 12 signed runs on llm-speed. Every numerical claim links to a verifiable run permalink at /r/<id>.
What's the fastest LLM on Pentest-Bench?
The fastest measured configuration on Pentest-Bench on llm-speed runs at 42.0 decode tok/s on llama.cpp b9999 (workload chat-short, run r_a3ei8og3rkg); the model for that run is listed on the run page. Cite as https://llm-speed.com/r/r_a3ei8og3rkg.
This is the headline decode tokens-per-second across every (model, backend) pairing submitted for Pentest-Bench; faster results may exist in configurations not yet benchmarked on this rig, but among signed runs this is the published top.
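For readers who want to reproduce the headline metric themselves, decode tok/s here is taken to mean generated tokens divided by generation wall time; the sketch below shows that arithmetic under that assumption. The generate_fn callable is a hypothetical stand-in for any local inference call, not part of llm-speed, and a real harness would time prompt prefill separately.

    import time

    def decode_tok_per_s(generate_fn, prompt, max_new_tokens):
        # generate_fn is a hypothetical stand-in for a local inference call
        # that returns the number of tokens it actually generated.
        start = time.perf_counter()
        n_generated = generate_fn(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        # Decode rate = generated tokens / wall-clock seconds spent generating.
        # Prefill time is not separated out in this sketch.
        return n_generated / elapsed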
Can I run a 7B-class model on Pentest-Bench?
No 7B-class model run on Pentest-Bench has been submitted to llm-speed yet, so the canonical "yes/no with a measured tok/s" answer is not currently published.
Pentest-Bench should fit a 7B model at 4-bit quantization; submit a run with "llm-speed bench --models <hf-id>" to populate this answer.
Can I run a 30B-class model on Pentest-Bench?
No 30B-class model run on Pentest-Bench has been submitted to llm-speed yet, so the canonical "yes/no with a measured tok/s" answer is not currently published.
Pentest-Bench should fit a 30B model at 4-bit quantization; submit a run with "llm-speed bench --models <hf-id>" to populate this answer.
Can I run a 70B-class model on Pentest-Bench?
No 70B-class model run on Pentest-Bench has been submitted to llm-speed yet, so the canonical "yes/no with a measured tok/s" answer is not currently published.
Pentest-Bench should fit a 70B model at 4-bit quantization; submit a run with "llm-speed bench --models <hf-id>" to populate this answer.
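The "should fit at 4-bit quantization" statements in the three answers above are capacity estimates, not measured runs. A back-of-the-envelope sketch of the arithmetic behind them, assuming roughly 4.5 bits per weight for a typical 4-bit GGUF quant and counting weights only; both figures are assumptions, not llm-speed data, and KV cache plus runtime overhead come on top.

    def approx_weight_gib(params_billion, bits_per_weight=4.5):
        # Raw weight footprint only; KV cache, activations, and runtime
        # overhead add several GiB on top of this estimate.
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    for size_b in (7, 30, 70):
        print(f"{size_b}B ~ {approx_weight_gib(size_b):.1f} GiB of weights at ~4-bit")
    # Prints roughly 3.7 GiB, 15.7 GiB, and 36.7 GiB respectively.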
How does Pentest-Bench compare to RTX 5090 for local LLM inference?
For a head-to-head between Pentest-Bench and RTX 5090, see the side-by-side comparison page at https://llm-speed.com/vs/pentest-bench-vs-rtx-5090, which lays out every (model, backend) pair where both rigs have a signed run.
As a single-rig anchor, Pentest-Bench tops out at 42.0 decode tok/s via llama.cpp (run r_a3ei8og3rkg); the RTX 5090 top number is on its own /hw/<slug> page, so the comparison stays grounded in measured numbers, not extrapolation.
Pentest-Bench vs RTX 5090 (side-by-side) · RTX 5090 leaderboard · Pentest-Bench leaderboard
Which backend is fastest on Pentest-Bench?
The only backend with a signed local run on Pentest-Bench so far is llama.cpp, with a top result of 42.0 decode tok/s (run r_a3ei8og3rkg); a multi-backend comparison on Pentest-Bench is not yet published.
Submit a competing run with "llm-speed bench --backends <other-backend>" on a Pentest-Bench machine to populate the comparison.
Is Pentest-Bench worth it for local LLM inference in 2026?
On signed data, Pentest-Bench delivers up to 42.0 decode tok/s via llama.cpp (run r_a3ei8og3rkg), which puts its published top configuration above conversational-reading speed (~20 tok/s) but below interactive-coding thresholds.
"Worth it" depends on your model class: Pentest-Bench is most useful for 7B-class models and smaller; larger models will need quantization or a bigger rig.
For "what fits and how fast", the per-model rows on /hw/pentest-bench are the honest answer; for cross-rig comparisons, see /vs/pentest-bench-vs-rtx-5090.
What quantization should I use on Pentest-Bench?
On Pentest-Bench, the only quant with a signed local run is Q4_K_M, at 42.0 decode tok/s (run r_a3ei8og3rkg); a multi-quant comparison on Pentest-Bench is not yet published.
Quant choice is a quality-vs-speed tradeoff that this hardware FAQ does not arbitrate; llm-speed publishes hardware-side speed, not output quality. For quality scores, see the model card on Hugging Face and the LMSYS Chatbot Arena.
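Quality aside, a capacity-first way to narrow the choice is to compute the largest bits-per-weight that fits a given memory budget. The sketch below uses the same weights-only assumption as the capacity estimate earlier in this FAQ; the 24 GiB budget in the example is hypothetical, since this page does not state Pentest-Bench's memory.

    def max_bits_per_weight(params_billion, budget_gib):
        # Upper bound: treats weights as the only memory consumer, so leave
        # headroom for KV cache and runtime overhead in practice.
        return budget_gib * 2**30 * 8 / (params_billion * 1e9)

    # Hypothetical 24 GiB budget, 30B-class model: ~6.9 bits/weight at most.
    print(f"{max_bits_per_weight(30, 24):.1f} bits/weight")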