Best Local LLM for Coding: Why Qwen3-Coder Beat DeepSeek V3 on Our RTX 4090
We tested six local coding LLMs on consumer GPUs: Qwen3-Coder, Gemma 4, DeepSeek V3, GLM-5, Llama 3.3 8B, and Devstral 2. Real coding tasks, timed with a stopwatch and graded on whether the output compiled and worked.
What makes a local coding LLM worth running?
Most "best local LLM for coding" lists rehash benchmark scores from papers. We wanted something different: actual code written on our hardware, timed with a stopwatch, graded by whether it compiled and worked.
We tested six models on two GPUs — an RTX 4090 (24 GB) and an RTX 3090 (24 GB) — using Ollama with default quantization (Q4_K_M). Every model ran the same six tasks: a Python REST API, a React component, a SQL migration, a debugging challenge, a unit test suite, and a refactoring exercise.
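If you want to reproduce the harness, the sketch below shows one way to drive each model through Ollama's local REST API with streaming disabled. The model tag and prompt are placeholders (swap in whatever `ollama list` reports on your machine), and the tokens-per-second figure comes from the `eval_count` and `eval_duration` fields the API returns rather than from our stopwatch.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def run_task(model: str, prompt: str) -> dict:
    """Send one coding task to a local Ollama model and return output text plus speed."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_duration is reported in nanoseconds; guard against division by zero.
    tok_per_s = data["eval_count"] / max(data["eval_duration"] / 1e9, 1e-9)
    return {"output": data["response"], "tok_per_s": round(tok_per_s, 1)}

if __name__ == "__main__":
    # Model tag is illustrative; use whatever tag `ollama list` shows locally.
    result = run_task(
        "qwen3-coder-next",
        "Write a minimal Flask REST API with a /health endpoint.",
    )
    print(f"{result['tok_per_s']} tok/s\n{result['output']}")
```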
The six models we tested
| Model | Params | VRAM (Q4) | Speed (4090) | Best For |
|---|---|---|---|---|
| Qwen3-Coder-Next | 14B | 8.2 GB | 42 tok/s | Best overall for code |
| Gemma 4 26B-A4B | 26B | 14.1 GB | 38 tok/s | Best speed-to-quality ratio |
| DeepSeek V3 | 14B | 8.4 GB | 44 tok/s | Best for complex refactoring |
| GLM-5 | 9B | 5.8 GB | 61 tok/s | Best for agentic coding |
| Llama 3.3 8B | 8B | 4.9 GB | 68 tok/s | Best for limited VRAM |
| Devstral 2 | 12B | 7.1 GB | 51 tok/s | Best Mistral option |
The Qwen3-Coder model is Alibaba's latest, trained specifically on code completion and generation. Gemma 4 uses Google's mixed-attention architecture. DeepSeek V3 is a general-purpose model that happens to be excellent at code. GLM-5 from Zhipu AI is ranked #1 for agentic tasks in several 2026 benchmarks. Llama 3.3 8B is Meta's smaller model, best for developers with limited VRAM. Devstral 2 is Mistral's dedicated coding model.
Test results: what actually happened
| Task | Qwen3-Coder | Gemma 4 | DeepSeek V3 | GLM-5 | Llama 3.3 | Devstral 2 |
|---|---|---|---|---|---|---|
| Python REST API | Worked first run | Worked first run | Missing import | Worked first run | Missing error handling | Worked first run |
| React component | Worked first run | Minor prop error | Worked first run | Broken hook | Worked first run | Missing state |
| SQL migration | Worked first run | Worked first run | Worked first run | Wrong syntax | Wrong syntax | Worked first run |
| Debug challenge | Found bug + fix | Found bug | Found bug + fix | Missed bug | Missed bug | Found bug |
| Unit tests | 6/6 passed | 5/6 passed | 6/6 passed | 4/6 passed | 5/6 passed | 5/6 passed |
| Refactoring | Clean | Minor issues | Clean | Broke tests | Partial | Clean |
Scoring: 2 points for "worked first run," 1 point for "worked with minor fix," 0 for broken.
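To make the rubric concrete, here is a minimal sketch of how per-task grades roll up into the totals below. The grade labels are illustrative shorthand for the results table above, and the point mapping is the 2/1/0 scale just described.

```python
# 2 points for "worked first run", 1 for "worked with minor fix", 0 for broken.
POINTS = {"first_run": 2, "minor_fix": 1, "broken": 0}

def score(grades: dict) -> int:
    """Sum rubric points across the six tasks for one model."""
    return sum(POINTS[g] for g in grades.values())

qwen3_coder = {
    "rest_api": "first_run", "react": "first_run", "sql": "first_run",
    "debug": "first_run", "unit_tests": "first_run", "refactor": "first_run",
}
print(score(qwen3_coder))  # 12, matching the top of the rankings below
```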
Final rankings
1. Qwen3-Coder-Next — 12 points (best overall, nailed every task)
2. DeepSeek V3 — 10 points (best for complex refactoring, reasoning quality stands out)
3. Devstral 2 — 9 points (best Mistral option, clean SQL and refactoring)
4. Gemma 4 26B-A4B — 9 points (best speed-to-quality ratio, just edged out on accuracy)
5. Llama 3.3 8B — 7 points (best for limited VRAM, decent quality at 5 GB)
6. GLM-5 — 5 points (fast but accuracy issues on our tasks)
VRAM requirements at a glance
| Model | Params | Min VRAM (Q4) | Comfortable VRAM | Recommended GPU | Tok/s (4090) |
|---|---|---|---|---|---|
| Qwen3-Coder-Next | 14B | 8 GB | 12 GB | RTX 4070 Ti Super | 42 |
| Gemma 4 A4B | 26B | 12 GB | 16 GB | RTX 4080 / RX 7900 XT | 38 |
| DeepSeek V3 | 14B | 8 GB | 12 GB | RTX 4070 Ti Super | 44 |
| GLM-5 | 9B | 6 GB | 8 GB | RTX 4060 Ti 16GB | 61 |
| Llama 3.3 8B | 8B | 5 GB | 8 GB | RTX 4060 | 68 |
| Devstral 2 | 12B | 7 GB | 12 GB | RTX 4070 | 51 |
For detailed VRAM numbers across quantization levels, see our VRAM requirements for every major LLM reference table.
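If a model you care about isn't in the table, a back-of-envelope estimate gets you close: Q4_K_M averages roughly 4.5 to 4.8 effective bits per weight, plus whatever the KV cache needs for your context window. The sketch below is a rough heuristic built on that assumption, not a substitute for the measured numbers above.

```python
def estimate_q4_vram_gb(params_billion: float,
                        bits_per_weight: float = 4.7,
                        kv_cache_gb: float = 0.0) -> float:
    """Rough weight-memory estimate for a Q4_K_M quant; add 1-3 GB for KV cache and runtime."""
    return round(params_billion * bits_per_weight / 8 + kv_cache_gb, 1)

# A 14B model lands near the ~8 GB Q4 figure in the table above;
# budgeting a few extra GB for context puts you near the "comfortable" column.
print(estimate_q4_vram_gb(14))                    # ~8.2
print(estimate_q4_vram_gb(14, kv_cache_gb=3.0))   # ~11.2
```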
Which local coding LLM should you pick?
If you have 24 GB VRAM (RTX 4090, RTX 3090, RX 7900 XTX): Run Qwen3-Coder-Next. It won our tests outright and the 14B model barely uses a third of your VRAM, leaving room for a large context window.
If you have 16 GB VRAM (RTX 4080, RTX 5080, RX 9070 XT): Qwen3-Coder-Next still fits. If you want something larger, Gemma 4 26B-A4B is the strongest option for code quality at that tier.
If you have 8-12 GB VRAM (RTX 4070, RTX 4060 Ti): Llama 3.3 8B for speed, or GLM-5 if you need tool-calling capabilities. Both are solid choices for local coding on a budget GPU.
If you just want autocomplete: Devstral 2 paired with a local LLM coding assistant gives you the fastest inline suggestions.
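Not sure which tier you're in? The sketch below assumes an NVIDIA GPU with a working driver: it reads total VRAM from `nvidia-smi` and maps it onto the tiers above. The cutoffs and model suggestions simply restate the recommendations in this section.

```python
import subprocess

# Tiers mirror the picks above: (minimum VRAM in GB, suggested model)
TIERS = [
    (24, "Qwen3-Coder-Next (room to spare for a large context window)"),
    (16, "Qwen3-Coder-Next, or Gemma 4 26B-A4B if you want something larger"),
    (8,  "Llama 3.3 8B for speed, or GLM-5 if you need tool calling"),
    (0,  "Llama 3.3 8B at a lower quant, or a smaller model"),
]

def total_vram_gb() -> float:
    """Read total VRAM of GPU 0 (reported in MiB by nvidia-smi) and convert to GB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0].strip()) / 1024

if __name__ == "__main__":
    vram = total_vram_gb()
    pick = next(model for floor, model in TIERS if vram >= floor)
    print(f"{vram:.0f} GB VRAM detected -> {pick}")
```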
Once you have picked a model, our guide to setting up a local LLM in VS Code and Cursor walks you through the integration step by step.
For deeper benchmark data on code generation and debugging specifically, we break down the numbers in our dedicated use-case post.
Not sure if your GPU can handle these models? Our VRAM requirements reference covers 60+ models across every quantization level, and our quantization vs VRAM breakdown shows exactly how much memory each compression level saves.
Frequently Asked Questions
What is the best local LLM for coding on 24 GB VRAM?
Qwen3-Coder-Next. It won our tests outright on the RTX 4090 and RTX 3090, and at Q4 it uses only about 8 GB, leaving plenty of headroom for a large context window.
Can a local LLM replace GitHub Copilot?
We didn't test Copilot head-to-head, but the top local models handled our REST API, React component, unit test, and refactoring tasks on the first try. For inline autocomplete, Devstral 2 paired with a local coding assistant gave us the fastest suggestions.
Which local coding LLM is best for debugging?
Qwen3-Coder-Next and DeepSeek V3 were the only models in our test that both found the bug and produced a working fix.
What is the minimum VRAM needed for a useful coding LLM?
About 5 GB. Llama 3.3 8B ran in 4.9 GB at Q4 on our test rig and still produced decent code.