llm-speed

RTX 5070 Ti — LLM benchmarks

No benchmarks on RTX 5070 Ti yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on RTX 5070 Ti

60 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence 75%

    42.00 tok/s Qwen3-30B-A3B on RTX 5070 Ti via ollama (IQ4_XS)

    (hardware is a RTX 5070Ti 16GB GPU dedicated to this). I was going to test Qwen3-30B-A3B-Instruct-2507-GGUF:IQ4_XS which I can get about 42 tok/s using a RTX 5070Ti.

    source: Reddit · u/StartupTim · 2025-08-03

  • community confidence 55%

    43.88 tok/s Qwen3-Coder-Next on RTX 5070 Ti (IQ4_XS)

    n3-Coder-Next-IQ4_XS` For scaffolding a small Rust Project this gives me: - Prompt Processing: `154.55 tokens/s` - Token Generation: `43.88 tokens/s` I have been previously using llama with -ngl 12-14 (depending on how much free VRAM I have), which gave me only around 14 t…

    source: Reddit · u/SalariedSlave · 2026-02-05

  • community confidence 55%

    154.6 tok/s (prompt processing) Qwen3-Coder-Next on RTX 5070 Ti (IQ4_XS)

    - RTX 5070 Ti 16GB I'm running `Qwen3-Coder-Next-IQ4_XS` For scaffolding a small Rust Project this gives me: - Prompt Processing: `154.55 tokens/s` - Token Generation: `43.88 tokens/s` I have been previously using llama with -ngl 12-14 (depending on how much free VRAM I h…

    source: Reddit · u/SalariedSlave · 2026-02-05

  • community confidence 50%

    79.00 tok/s Qwen3.6-35B-A3B on RTX 5070 Ti via llama.cpp

    RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part. Spent an evening dialing in Qwen3.6-35B-A3B on consumer hardware. Fun

    source: Reddit · u/marlang · 2026-04-18
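
The `--n-cpu-moe` tip in the claim above refers to a llama.cpp flag that keeps the MoE expert weights of the first N layers on the CPU, freeing VRAM for the rest of the model. A hedged sketch of such an invocation follows; the model filename, quant, and the value 20 are assumptions, not details from the claim:

```shell
# Sketch only: verify flags against your llama.cpp build's --help.
# -m              : GGUF model file (filename here is an assumption)
# -c 131072       : 128K context window, as in the claim
# -ngl 99         : offload all layers to the GPU by default
# --n-cpu-moe 20  : keep MoE expert weights of the first 20 layers on the
#                   CPU, trading some speed for VRAM headroom on 16 GB
llama-server \
  -m Qwen3.6-35B-A3B-Q4_K_M.gguf \
  -c 131072 \
  -ngl 99 \
  --n-cpu-moe 20
```

Tuning the `--n-cpu-moe` value up until the model fits in VRAM, then back down for speed, is the usual approach on 16 GB cards.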

See all 60 claims for RTX 5070 Ti