Free tool · suite-v1
What tok/s should I expect?
Pick a model and a hardware rig. We look up the best decode tokens-per-second from real llm-speed submissions for that cell and show you the median, the range, and the runs that back the number. No signup, no opaque model: every prediction links to the runs that produced it.
Pick a model and a hardware rig to see the prediction, or browse the full cheatsheet of every (model × hardware) cell we have data for.
How the prediction works
For a (model, hardware) pair, we collect every submitted suite-v1 workload result whose canonical model slug and primary hardware label both match. From each run we take the workload that produced its top decode rate. The median of those rates is the headline number; the min and max define the confidence band. All numbers are wall-clock decode tokens-per-second, at batch size 1 unless the workload says otherwise. Read the full methodology.
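The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the `WorkloadResult` record, its field names, the `predict` function, and the example slugs and labels are all hypothetical, chosen only to mirror the terms used in the text (canonical model slug, primary hardware label, per-run top decode rate).

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class WorkloadResult:
    """One submitted workload result (hypothetical schema)."""
    run_id: str
    model_slug: str        # canonical model slug
    hardware_label: str    # primary hardware label
    decode_tok_s: float    # wall-clock decode tokens-per-second


def predict(results, model_slug, hardware_label):
    """Return median/min/max of each matching run's top decode rate."""
    best_per_run = {}
    for r in results:
        # Keep only results for the requested (model, hardware) cell.
        if r.model_slug != model_slug or r.hardware_label != hardware_label:
            continue
        # Each run contributes the workload with its highest decode rate.
        current = best_per_run.get(r.run_id)
        if current is None or r.decode_tok_s > current:
            best_per_run[r.run_id] = r.decode_tok_s

    if not best_per_run:
        return None  # no data for this cell

    rates = sorted(best_per_run.values())
    return {
        "median": median(rates),       # headline number
        "min": rates[0],               # lower edge of the band
        "max": rates[-1],              # upper edge of the band
        "runs": sorted(best_per_run),  # run ids backing the number
    }


# Made-up sample data for illustration only.
sample = [
    WorkloadResult("run-a", "example-model-7b", "example-gpu", 40.0),
    WorkloadResult("run-a", "example-model-7b", "example-gpu", 50.0),
    WorkloadResult("run-b", "example-model-7b", "example-gpu", 30.0),
    WorkloadResult("run-c", "example-model-7b", "example-gpu", 70.0),
    WorkloadResult("run-d", "example-model-7b", "other-gpu", 100.0),
]
prediction = predict(sample, "example-model-7b", "example-gpu")
```

Returning the run ids alongside the statistics keeps the prediction traceable to the submissions behind it, and returning `None` for an empty cell avoids presenting a number with no data behind it.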