llm-speed

Free tool · suite-v1

What tok/s should I expect?

Pick a model and a hardware rig. We look up the best decode tokens-per-second from real llm-speed submissions for that cell and show you the median, the range, and the runs that back the number. No signup and no opaque prediction model: every prediction links to the runs that produced it.


Pick a model and a hardware rig to see the prediction, or browse the full cheatsheet of every (model × hardware) cell we have data for.

How the prediction works

For a (model, hardware) pair we collect every submitted suite-v1 workload result whose canonical model slug and primary hardware label match. The median is the headline number; the minimum and maximum define the confidence band. Each number is the wall-clock decode tokens-per-second from the workload that produced that run's top decode rate, at batch size 1 unless the workload says otherwise. Read the full methodology.
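The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual code: the `WorkloadResult` and `Prediction` types, their field names, the `predict` function, and the sample slugs and labels are all hypothetical. It assumes each submitted run contributes one rate, its best decode rate across workloads.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class WorkloadResult:
    """One workload result from a submitted run (hypothetical schema)."""
    run_id: str
    suite: str
    model_slug: str       # canonical model slug
    hardware_label: str   # primary hardware label
    decode_tok_s: float   # wall-clock decode tokens per second


@dataclass
class Prediction:
    median_tok_s: float   # headline number
    min_tok_s: float      # lower edge of the band
    max_tok_s: float      # upper edge of the band
    backing_runs: list    # run ids behind the numbers


def predict(results, model_slug, hardware_label, suite="suite-v1") -> Optional[Prediction]:
    """Aggregate matching results into median / min / max for one cell."""
    best_per_run = {}
    for r in results:
        if (r.suite, r.model_slug, r.hardware_label) != (suite, model_slug, hardware_label):
            continue
        # Keep only each run's top decode rate across its workloads.
        best_per_run[r.run_id] = max(best_per_run.get(r.run_id, 0.0), r.decode_tok_s)
    if not best_per_run:
        return None  # no data for this (model, hardware) cell
    rates = list(best_per_run.values())
    return Prediction(median(rates), min(rates), max(rates), sorted(best_per_run))


# Made-up sample data: three runs on the target cell, one on other hardware.
sample = [
    WorkloadResult("run-a", "suite-v1", "example-model-7b", "example-gpu", 40.0),
    WorkloadResult("run-a", "suite-v1", "example-model-7b", "example-gpu", 50.0),
    WorkloadResult("run-b", "suite-v1", "example-model-7b", "example-gpu", 30.0),
    WorkloadResult("run-c", "suite-v1", "example-model-7b", "example-gpu", 70.0),
    WorkloadResult("run-d", "suite-v1", "example-model-7b", "other-gpu", 99.0),
]
prediction = predict(sample, "example-model-7b", "example-gpu")
missing = predict(sample, "example-model-7b", "no-such-rig")
```

Returning `None` for an empty cell, rather than a zero, keeps "no data" distinct from "measured and slow". With only the minimum and maximum as the band, a single outlier run widens it, which is why listing the backing runs alongside the number is useful.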

Related