x – LLM benchmarks
56 workload results across 1 model.
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | Q4 | 10.00 | n/a | n/a | r_0_i4fok_cfg |
| chat-short | llama.cpp | Q4 | 10.00 | n/a | 0.0 | r_abgapkfvfla |
| chat-short | llama.cpp | Q4 | 10.00 | n/a | n/a | r_g356kkzjf5c |
| chat-short | llama.cpp | Q4 | 10.00 | n/a | n/a | r_3r1vcq0s4vo |
| chat-short | llama.cpp | Q4 | 10.00 | n/a | n/a | r_dnvwv68uo3z |
| chat-short | llama.cpp | Q4 | 10.00 | n/a | n/a | r_59h1mxy0mzj |
| chat-short | llama.cpp | Q4 | 10.00 | n/a | n/a | r_w6ugvsylxe7 |
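For readers unfamiliar with the metrics in the table, here is a minimal sketch of how decode throughput and TTFT are commonly derived from per-token arrival timestamps. This is illustrative only, not the harness that produced the runs above; all names in it are hypothetical.

```python
def summarize(token_times, prompt_start):
    """Summarize one generation.

    token_times: monotonic timestamps (seconds) at which each output
    token arrived; prompt_start: timestamp when the request was sent.
    """
    # TTFT: delay from sending the prompt to receiving the first token.
    ttft_ms = (token_times[0] - prompt_start) * 1000.0
    # Decode rate excludes the first token, whose latency is dominated
    # by prefill rather than steady-state decoding.
    decode_s = token_times[-1] - token_times[0]
    decode_tok_s = (len(token_times) - 1) / decode_s if decode_s > 0 else float("nan")
    return {"ttft_ms": ttft_ms, "decode_tok_s": decode_tok_s}

# Example: 5 tokens, one every 0.1 s after a 0.3 s prefill,
# which corresponds to a 10 tok/s decode rate and a 300 ms TTFT.
stats = summarize([0.3, 0.4, 0.5, 0.6, 0.7], prompt_start=0.0)
```

Note the convention choice: counting the first token in the decode rate would mix prefill and decode time, which is why TTFT and decode tok/s are reported as separate columns.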
Models measured on x
Common questions about x
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, and quantization guidance.