RTX 5070 Ti — LLM benchmarks
No RTX 5070 Ti benchmarks yet.
Run llm-speed on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on RTX 5070 Ti
60 unverified claims extracted from Reddit/HN comments. These carry lower trust than the signed runs above; every row links to its source.
- Community confidence: 75%
42.00 tok/s — Qwen3-30B-A3B on RTX 5070 Ti via Ollama (IQ4_XS)
“(hardware is a RTX 5070Ti 16GB GPU dedicated to this). I was going to test Qwen3-30B-A3B-Instruct-2507-GGUF:IQ4_XS which I can get about 42 tok/s using a RTX 5070Ti.”
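For reproduction, a minimal sketch of pulling that quant through Ollama's Hugging Face GGUF integration. The comment only names the file, so the repo owner (unsloth here) is an assumption, not from the source:
$ ollama run hf.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF:IQ4_XS   # repo owner assumed; tag selects the IQ4_XS quant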
- Community confidence: 55%
154.55 tok/s prompt processing, 43.88 tok/s generation — Qwen3-Coder-Next on RTX 5070 Ti (IQ4_XS)
“- RTX 5070 Ti 16GB I'm running `Qwen3-Coder-Next-IQ4_XS` For scaffolding a small Rust Project this gives me: - Prompt Processing: `154.55 tokens/s` - Token Generation: `43.88 tokens/s` I have been previously using llama with -ngl 12-14 (depending on how much free VRAM I have), which gave me only around 14 t…”
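The -ngl figure in that comment sets how many layers llama.cpp offloads to the GPU. A minimal llama-server invocation under those assumptions; the GGUF path and context size are placeholders, not from the comment:
$ llama-server -m Qwen3-Coder-Next-IQ4_XS.gguf -ngl 14 -c 32768   # -ngl: GPU layer count; raise it until VRAM is full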
- Community confidence: 50%
79.00 tok/s — Qwen3.6-35B-A3B on RTX 5070 Ti via llama.cpp
“RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part. Spent an evening dialing in Qwen3.6-35B-A3B on consumer hardware. Fun”
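A hedged reconstruction of that setup using llama.cpp's MoE CPU-offload flag. The comment names only --n-cpu-moe and the 128K context; the file name and the expert-layer count are guesses to be tuned per VRAM:
$ llama-server -m Qwen3.6-35B-A3B.gguf -ngl 99 --n-cpu-moe 20 -c 131072   # --n-cpu-moe keeps the first N layers' expert weights in CPU RAM; count assumed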