RTX 3090 vs RTX 4090 for local LLMs: is 2x the price worth it?

Published 2026-07-02

The RTX 3090 and RTX 4090 are both popular picks for running LLMs at home, but the 4090 costs roughly twice as much. So how much faster is it, really? We ran both through the same suite (Ollama, identical models), so the comparison is apples-to-apples. Every number links to the signed run. (The 4090 we rented in the cloud is a modded variant that reports 48 GB of VRAM instead of the stock 24 GB; its decode speed is representative of the 4090 chip.)

Head to head (decode tok/s, same models)

Qwen2.5-Coder-7B: RTX 3090 139 vs RTX 4090 161 tok/s (+16%).
Llama-3.1-8B: 3090 136 vs 4090 154 tok/s (+13%).
Qwen2.5-Coder-14B: 3090 69 vs 4090 90 tok/s (+29%).

The 4090 is roughly 13 to 29% faster in these runs, and the gap is widest on the larger 14B model, where its extra memory bandwidth and compute show. Full standings: /hw/rtx-3090, /hw/rtx-4090, and the cheatsheet.
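The percentages can be rechecked from the listed tok/s figures. The snippet below is a minimal sketch, not part of the benchmark suite: the name speedup_pct is made up for illustration, and the inputs are the rounded numbers printed above, so a result may differ by about a point from a percentage computed on the unrounded runs.

```python
# Decode speeds in tok/s, copied from the head-to-head list:
# model -> (RTX 3090, RTX 4090)
RESULTS = {
    "Qwen2.5-Coder-7B": (139, 161),
    "Llama-3.1-8B": (136, 154),
    "Qwen2.5-Coder-14B": (69, 90),
}


def speedup_pct(baseline: float, faster: float) -> float:
    """Return how much faster `faster` is than `baseline`, in percent."""
    return (faster / baseline - 1.0) * 100.0


for model, (rtx3090, rtx4090) in RESULTS.items():
    print(f"{model}: +{speedup_pct(rtx3090, rtx4090):.1f}%")
```

On the rounded inputs the 14B model comes out nearer +30% than the listed +29%; we assume the listed figure was computed from unrounded speeds.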

Is it worth ~2x the price?

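A used 3090 runs about $700; a 4090 about $1,600, roughly 2.3x as much. In the cloud the gap is narrower: $0.22 vs $0.34/hr, about 1.5x. Either way the 4090's price premium is larger than its speed advantage, so the 3090 gives clearly more tok/s per dollar, while the 4090 gives more tok/s outright. For most people the 3090 is the value pick; choose the 4090 if you want the fastest single 24 GB card. A stock 4090 has the same 24 GB as the 3090, so any headroom for larger models and longer context comes from extra speed, not extra memory.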
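To put numbers on "tok/s per dollar", here is a rough sketch using the prices above and the Qwen2.5-Coder-14B decode speeds. The Card class and its method names are hypothetical, and the cloud figure assumes one stream decoding non-stop for the full billed hour, which real workloads rarely do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    price_usd: float     # approximate purchase price quoted in the article
    cloud_usd_hr: float  # cloud rental rate quoted in the article
    tok_s: float         # decode speed on Qwen2.5-Coder-14B

    def tok_s_per_100_usd(self) -> float:
        """Decode speed bought per $100 of purchase price."""
        return self.tok_s / self.price_usd * 100.0

    def tokens_per_cloud_usd(self) -> float:
        """Tokens generated per rented dollar, assuming continuous decode."""
        return self.tok_s * 3600.0 / self.cloud_usd_hr


CARDS = [
    Card("RTX 3090", price_usd=700, cloud_usd_hr=0.22, tok_s=69),
    Card("RTX 4090", price_usd=1600, cloud_usd_hr=0.34, tok_s=90),
]

for card in CARDS:
    print(
        f"{card.name}: {card.tok_s_per_100_usd():.2f} tok/s per $100, "
        f"{card.tokens_per_cloud_usd():,.0f} tokens per cloud dollar"
    )
```

With these inputs the 3090 leads by about 1.75x on purchase price but only about 1.2x on cloud rental, so the value gap is much smaller if you rent than if you buy.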

Both handle coders well

Either card runs 7B to 14B coding models comfortably, and a small mixture-of-experts model is faster still on both, since only a fraction of its parameters are active per token: the 3090 alone hits 189 tok/s on DeepSeek-Coder-V2-16B. More on that in the 3090 deep-dive.

Reproduce it

Every number can be reproduced with one command:

$ pipx install llm-speed && llm-speed bench

Numbers as of July 2026; the linked runs and the cheatsheet always reflect current data.