Hardware FAQ
x — frequently asked questions
Direct answers to the questions local-LLM enthusiasts ask about x, drawn from 7 signed runs on llm-speed. Every numerical claim links to a verifiable run permalink at /r/<id>.
What's the fastest LLM on x?
The fastest LLM measured on x in llm-speed's signed runs is m, at 10.0 decode tok/s on llama.cpp (workload chat-short, run r_0_i4fok_cfg). Cite it as https://llm-speed.com/r/r_0_i4fok_cfg.
This is the headline decode tokens-per-second across every (model, backend) pairing submitted for x; faster configurations may exist that have not yet been benchmarked, but among signed runs this is the published top.
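If you want to confirm that a citation still resolves before quoting it, a minimal sketch (curl is a generic HTTP client, not part of llm-speed; the exact status the site returns is an assumption):

    curl -I https://llm-speed.com/r/r_0_i4fok_cfg    # a 2xx or 3xx status means the signed-run permalink resolves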
Can I run a 7B-class model on x?
No 7B-class model run on x has been submitted to llm-speed yet, so the canonical "yes/no with a measured tok/s" answer is not currently published.
x should fit a 7B model at 4-bit quantization; submit a run with "llm-speed bench --models <hf-id>" to populate this answer.
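A minimal invocation sketch, assuming the llm-speed CLI is installed on the x machine; the model id below is purely illustrative and stands in for <hf-id>:

    # Illustrative 7B-class model id; substitute the model you actually want to benchmark.
    llm-speed bench --models meta-llama/Llama-3.1-8B-Instruct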
Can I run a 30B-class model on x?
No 30B-class model run on x has been submitted to llm-speed yet, so the canonical "yes/no with a measured tok/s" answer is not currently published.
A 30B-class model at 4-bit quantization may fit on x, depending on available memory; submit a run with "llm-speed bench --models <hf-id>" to populate this answer.
Can I run a 70B-class model on x?
No 70B-class model run on x has been submitted to llm-speed yet, so the canonical "yes/no with a measured tok/s" answer is not currently published.
A 70B-class model at 4-bit quantization will only fit if x has enough memory for it (see the sizing sketch below); submit a run with "llm-speed bench --models <hf-id>" to populate this answer.
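As a rough sizing guide for the three classes above, here is a back-of-the-envelope sketch; the ~0.5 bytes-per-parameter figure is a generic 4-bit rule of thumb, not an llm-speed measurement, and real runtimes add KV-cache and buffer overhead on top:

    # Approximate 4-bit (Q4) weight footprints, weights only:
    #   7B-class  ≈ 3.5 GB
    #   30B-class ≈ 15 GB
    #   70B-class ≈ 35 GB
    # Compare against x's usable memory before assuming a class fits, then submit a signed run:
    llm-speed bench --models <hf-id>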
How does x compare to RTX 5090 for local LLM inference?
For a head-to-head between x and RTX 5090, see the side-by-side comparison page at https://llm-speed.com/vs/x-vs-rtx-5090, which lays out every (model, backend) pair where both rigs have a signed run.
As a single-rig anchor, x tops out at 10.0 decode tok/s on m via llama.cpp (run r_0_i4fok_cfg); the RTX 5090 top number lives on its own /hw/<slug> page, so the comparison stays grounded in measured numbers rather than extrapolation.
x vs RTX 5090 (side-by-side) · RTX 5090 leaderboard · x leaderboard
Which backend is fastest on x?
The only backend with a signed local run on x so far is llama.cpp, with a top result of 10.0 decode tok/s on m (run r_0_i4fok_cfg); a multi-backend comparison on x is not yet published.
Submit a competing run with "llm-speed bench --backends <other-backend>" on x hardware to populate the comparison.
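A sketch of that submission, run on the same x machine so the rows stay directly comparable; "vllm" is only an illustrative stand-in for <other-backend>, and whether any given backend is supported on x is not something this page asserts:

    # "vllm" is a hypothetical example value; use whatever backend identifier llm-speed accepts.
    llm-speed bench --backends vllm --models <hf-id>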
Is x worth it for local LLM inference in 2026?
On signed data, x delivers up to 10.0 decode tok/s on m via llama.cpp (run r_0_i4fok_cfg), which is below conversational-reading speed (~20 tok/s) even for the published top configuration.
"Worth it" depends on your model class: x is most useful for 7B-class models and smaller; larger models will need quantization or a bigger rig.
For "what fits and how fast", the per-model rows on /hw/x are the honest answer; for cross-rig comparisons, see /vs/x-vs-rtx-5090.
What quantization should I use on x?
On x, the only quant with a signed local run is Q4, at 10.0 decode tok/s on m (run r_0_i4fok_cfg); a multi-quant comparison on x is not yet published.
Quant choice is a quality-vs-speed tradeoff that this hardware FAQ does not arbitrate; llm-speed publishes hardware-side speed, not output quality. For quality scores, see the model card on Hugging Face and the LMSYS Chatbot Arena.