Best GPU for Local LLMs Under $1,500: The Decision That Determines Your Model Limits
At $1,500, you choose between a used RTX 4090 (24 GB, 1,008 GB/s) and a new RTX 5080 (16 GB, 960 GB/s). One gives you more VRAM. The other gives you a warranty. The choice determines which models you can run.

The core trade-off
For models under 16 GB (7B-13B at Q4), both cards perform similarly. The 5080 has the newer Blackwell architecture and more efficient GDDR7, but bandwidth is comparable. The decision only matters once you hit the 16 GB ceiling: the 4090 keeps going to 24 GB; the 5080 stops. On Ollama and llama.cpp, both cards use the same CUDA backend, so the question is purely VRAM capacity versus budget.
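A quick way to sanity-check that ceiling is the standard back-of-envelope VRAM estimate: weights take roughly (billions of parameters) × (bits per weight) / 8 gigabytes, plus headroom for the KV cache and runtime. The sketch below is illustrative, not a calculator; the ~4.5 bits/weight for a Q4_K_M-style quant, the flat 2 GB headroom, and the 13B example model are assumptions.

```python
# Back-of-envelope check: does a quantized model fit in a card's VRAM?
# The 4.5 bits/weight (Q4_K_M-style) and flat 2 GB headroom for KV cache
# and runtime overhead are rough assumptions, not measured values.

def fits(params_b: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    weights_gb = params_b * bits_per_weight / 8  # 1e9 params at N bits -> GB
    return weights_gb + overhead_gb <= vram_gb

for name, params_b in [("Llama 13B Q4", 13), ("Qwen 32B Q4", 32)]:
    for gpu, vram in [("RTX 4090", 24), ("RTX 5080", 16)]:
        verdict = "fits" if fits(params_b, 4.5, vram) else "does not fit"
        print(f"{name} on {gpu} ({vram} GB): {verdict}")
```

Both cards pass the 13B check; only the 4090 passes at 32B, which is the whole decision in four lines.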
Under $1,500 comparison
| GPU | VRAM | Bandwidth | Price | Ecosystem | Warranty | Best For |
|---|---|---|---|---|---|---|
| Used RTX 4090 | 24 GB | 1,008 GB/s | ~$1,200 | CUDA | None | Best all-around LLM GPU |
| RTX 5080 | 16 GB | 960 GB/s | ~$999 | CUDA | Full | Best new card for 7B-13B |
| RX 7900 XTX | 24 GB | 960 GB/s | ~$750 | ROCm | Full | Cheapest new 24 GB |
Token speed comparison
For models that fit on both cards, speeds are within about 10% of each other; the 4090 edges ahead on its slightly higher bandwidth. The real difference is which models fit at all.
| Model | RTX 4090 (24 GB) | RTX 5080 (16 GB) | RX 7900 XTX (24 GB) |
|---|---|---|---|
| Mixtral 8x7B Q4 | ~50 t/s | Does not fit | ~45 t/s |
| Qwen 32B Q4 (~18 GB) | ~35 t/s | Does not fit | ~33 t/s |
| Llama 8B Q8 | ~100 t/s | ~95 t/s | ~90 t/s |
Speeds estimated at ~70% of peak memory bandwidth; actual throughput varies by framework, batch size, and context length. Mixtral 8x7B at Q4 is roughly 26 GB of weights, so the 24 GB figures assume a tight quant or a couple of layers offloaded to system RAM.
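The footnote's method fits in a few lines. Decoding is memory-bandwidth bound: each generated token streams the full weights once, so throughput tops out near bandwidth × efficiency ÷ model size. The ~18 GB figure for Qwen 32B Q4 is the table's assumption; the printed ceilings land a few tokens above the table because real runs add framework overhead.

```python
def tokens_per_sec(bw_gbps: float, model_gb: float, eff: float = 0.70) -> float:
    """Bandwidth-bound decode ceiling: each token streams the weights once."""
    return bw_gbps * eff / model_gb

# (bandwidth GB/s, VRAM GB) per card; ~18 GB for Qwen 32B Q4 is an assumed size.
for gpu, (bw, vram) in {"RTX 4090": (1008, 24),
                        "RTX 5080": (960, 16),
                        "RX 7900 XTX": (960, 24)}.items():
    if 18 > vram:
        print(f"{gpu}: Qwen 32B Q4 does not fit")
    else:
        print(f"{gpu}: ~{tokens_per_sec(bw, 18):.0f} t/s ceiling")
```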
Which should you buy?
- Used RTX 4090: 24 GB runs everything up to 35B at Q4, with the highest bandwidth at this price (1,008 GB/s). You accept used-market risk and no warranty.
- RTX 5080: 16 GB caps you at models under 16 GB. Newest architecture, GDDR7 efficiency, full warranty, and lower power (360 W).
- RX 7900 XTX: 24 GB, new, with a warranty, at ~$750. ROCm works for llama.cpp and Ollama (the sketch below runs unchanged on either backend), and it leaves ~$750 of the budget unspent.
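That backend-agnostic claim is easy to see in practice. Below is a minimal sketch using the `ollama` Python client; the model tag is an assumption, and any model you have pulled works. Ollama selects the CUDA or ROCm runtime at install time, so the same script runs on all three cards.

```python
# Minimal Ollama client sketch: nothing here is GPU-specific, because
# Ollama picks the CUDA (NVIDIA) or ROCm (AMD) backend when installed.
# Assumes `pip install ollama` and a pulled model, e.g. `ollama pull llama3`.
import ollama

response = ollama.chat(
    model="llama3",  # illustrative tag; any pulled model works
    messages=[{"role": "user", "content": "Explain the VRAM vs. warranty trade-off."}],
)
print(response["message"]["content"])
```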
See RTX 5080 vs Used RTX 4090 for the deep-dive comparison, or Best GPU for Local LLMs for the full lineup. Use our VRAM Calculator to check exact memory requirements for your model and quantization.
Frequently Asked Questions
Can I get an RTX 5090 under $1,500?
No. The RTX 5090 launched at a $1,999 MSRP and street prices have run higher, so it is out of reach at this budget. The used 4090 is the closest you get for the money.

Should I wait for prices to drop?
Prices are hard to predict. The recommendations here reflect current street prices (~$1,200 used 4090, ~$999 5080, ~$750 7900 XTX); if those have shifted by the time you read this, re-run the comparison before buying.

Is the RX 7900 XTX worth considering at a $1,500 budget?
Yes, if you want 24 GB new with a warranty. It is the cheapest new 24 GB card, ROCm covers llama.cpp and Ollama, and it leaves roughly half the budget unspent; the trade-off is a smaller software ecosystem than CUDA.