RTX 5070 Ti — LLM benchmarks
No RTX 5070 Ti benchmarks yet.
Run llm-speed on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on RTX 5070 Ti
60 unverified claims extracted from Reddit/HN comments. These carry lower trust than the signed runs above; every row links to its source.
- Community confidence: 75%
42.00 tok/s — Qwen3-30B-A3B on RTX 5070 Ti via Ollama (IQ4_XS)
“(hardware is a RTX 5070Ti 16GB GPU dedicated to this). I was going to test Qwen3-30B-A3B-Instruct-2507-GGUF:IQ4_XS which I can get about 42 tok/s using a RTX 5070Ti.”
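For reproduction, a minimal sketch of pulling that quant through Ollama's Hugging Face GGUF integration. The comment only names the file, so the repo owner (unsloth here) is an assumption, not from the source:
$ ollama run hf.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:IQ4_XS   # repo owner assumed; tag selects the IQ4_XS quant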
- Community confidence: 55%
154.55 tok/s prompt processing, 43.88 tok/s generation — Qwen3-Coder-Next on RTX 5070 Ti (IQ4_XS)
“- RTX 5070 Ti 16GB I'm running `Qwen3-Coder-Next-IQ4_XS` For scaffolding a small Rust Project this gives me: - Prompt Processing: `154.55 tokens/s` - Token Generation: `43.88 tokens/s` I have been previously using llama with -ngl 12-14 (depending on how much free VRAM I have), which gave me only around 14 t…”
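The -ngl figure in that comment sets how many layers llama.cpp offloads to the GPU. A minimal llama-server invocation under those assumptions; the GGUF path and context size are placeholders, not from the comment:
$ llama-server -m Qwen3-Coder-Next-IQ4_XS.gguf -ngl 14 -c 32768   # -ngl: GPU layer count; raise it until VRAM is full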
- Community confidence: 50%
79.00 tok/s — Qwen3.6-35B-A3B on RTX 5070 Ti via llama.cpp
“RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part. Spent an evening dialing in Qwen3.6-35B-A3B on consumer hardware. Fun”
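A hedged reconstruction of that setup using llama.cpp's MoE CPU-offload flag. The comment names only --n-cpu-moe and the 128K context; the file name and the expert-layer count are guesses to be tuned per VRAM:
$ llama-server -m Qwen3.6-35B-A3B.gguf -ngl 99 --n-cpu-moe 20 -c 131072   # --n-cpu-moe keeps the first N layers' expert weights in CPU RAM; count assumed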