test-model
4 workload results across 1 hardware configuration.
Local runs (4 runs)
Runs submitted from contributors' own machines using MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.
Pentest-Bench
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_1jskg9qv_8b |
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_rl-kbwr9chb |
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_7z262rwoo08 |
| chat-short | llama.cpp@b9999 | Q4_K_M | 42.00 | 100.0 | 50.0 | r_wd6-1z548j_ |
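
Since all four runs share the same workload, backend, and quantization, they can be read as repeated measurements of one configuration. The following is a minimal sketch of summarizing them, assuming the values are copied by hand from the table; the `RUNS` list and the `summarize` helper are hypothetical names introduced here for illustration, not part of any published tooling.

```python
from statistics import mean, pstdev

# Rows copied from the table above:
# (run id, decode tok/s, prefill tok/s, TTFT in ms).
RUNS = [
    ("r_1jskg9qv_8b", 42.0, 100.0, 50.0),
    ("r_rl-kbwr9chb", 42.0, 100.0, 50.0),
    ("r_7z262rwoo08", 42.0, 100.0, 50.0),
    ("r_wd6-1z548j_", 42.0, 100.0, 50.0),
]


def summarize(runs):
    """Return {metric: (mean, population std dev)} across the given runs."""
    names = ("decode_tok_s", "prefill_tok_s", "ttft_ms")
    summary = {}
    for i, name in enumerate(names, start=1):
        values = [row[i] for row in runs]
        summary[name] = (mean(values), pstdev(values))
    return summary


if __name__ == "__main__":
    for name, (avg, spread) in summarize(RUNS).items():
        print(f"{name}: mean={avg:.2f} stdev={spread:.2f}")
```

A standard deviation of zero across every metric, as these rows give, suggests the runs report identical values rather than independent measurements with normal run-to-run noise.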