A used RTX 4090 is arguably the smartest buy for local LLMs right now. You get 24 GB of GDDR6X at 1,008 GB/s bandwidth, full CUDA support, and Ada Lovelace features like FP8 and Flash Attention 2 - all at a significant discount from the new price. The 4090 was the top-tier GPU just one generation ago, and for inference workloads it is still exceptionally capable.
The 24 GB of VRAM is the key advantage over a new RTX 5080 (16 GB). You can run Llama 3.1 70B at 4-bit quantization (roughly 38 GB) with partial CPU offloading, or squeeze it entirely onto the GPU at aggressive ~2.5-bit quantization. Models like Command R (35B) and Qwen 2.5 32B fit entirely in VRAM at 4-bit, and Mixtral 8x7B fits at a slightly lower quant. That flexibility is worth the used-market risk for many builders.
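A quick rule of thumb for whether a quantized model fits: weight memory is roughly parameter count times bits per weight divided by eight, plus an overhead margin for the KV cache and runtime buffers. A minimal sketch (the 10% overhead factor is an assumption; real usage varies by runtime and context length):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 0.10) -> float:
    """Rough VRAM estimate for a quantized model.

    `overhead` (assumed 10% here) stands in for KV cache, activations,
    and runtime buffers; actual usage depends on context length.
    """
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb * (1 + overhead)

# Llama 3.1 70B at 4-bit: ~38.5 GB -> needs CPU offload on a 24 GB card
print(round(estimate_vram_gb(70, 4), 1))
# Qwen 2.5 32B at 4-bit: ~17.6 GB -> fits in 24 GB with headroom
print(round(estimate_vram_gb(32, 4), 1))
```

Running the numbers this way makes the offloading cutoff obvious: anything whose estimate lands under ~22 GB leaves room for context on a 24 GB card.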
Bandwidth at 1,008 GB/s is actually higher than the RTX 5080's 960 GB/s, which means the 4090 generates tokens faster for models that fit in 24 GB. The extra bandwidth matters because inference on large models is memory-bound - the GPU spends most of its time moving weights from VRAM to the compute units.
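Because each generated token requires reading essentially all of the model weights from VRAM, bandwidth sets a hard ceiling on decode speed: roughly bandwidth divided by model size in bytes. A back-of-the-envelope sketch (this ignores compute time, KV-cache reads, and kernel overhead, so real throughput lands below these ceilings; the 20 GB example model is illustrative):

```python
def max_tokens_per_sec(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Theoretical upper bound on decode speed for a memory-bound model:
    every generated token streams the full weight set from VRAM once."""
    return bandwidth_gbs / model_size_gb

# A hypothetical 20 GB quantized model on each card:
print(round(max_tokens_per_sec(1008, 20), 1))  # RTX 4090 ceiling: 50.4 tok/s
print(round(max_tokens_per_sec(960, 20), 1))   # RTX 5080 ceiling: 48.0 tok/s
```

The same arithmetic explains why quantizing harder speeds up generation even when compute is unchanged: a smaller weight footprint means fewer bytes moved per token.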
The risks of buying used are real: no warranty, potential thermal paste degradation, and the small chance of a card that was run hard for crypto mining. Buy from sellers with good reputations, test the card under sustained load before committing, and verify all VRAM is error-free using GPU stress tests. At the right price, a used 4090 is the best value in local LLM hardware.
Why It Wins
- 1,008 GB/s bandwidth - faster than the new RTX 5080
- 24 GB VRAM opens up 70B-class models
- Full CUDA + FP8 + Flash Attention support
- Significant discount over buying new
Skip If
- No warranty on used cards
- 450 W TDP needs a strong PSU and good cooling
- Risk of degraded hardware from mining or heavy use