4 min read · May 2, 2026

Best Local LLM for Coding: Why Qwen3-Coder Beat DeepSeek V3 on Our RTX 4090

We tested six local coding LLMs on consumer GPUs — Qwen3-Coder, Gemma 4, DeepSeek V3, GLM-5, Llama 3.3 8B, and Devstral 2. Real codebases, timed with a stopwatch, and graded on whether the output compiled and worked.

By Andre
AI · LLMs · Coding
1.0

What makes a local coding LLM worth running?

Most "best local LLM for coding" lists rehash benchmark scores from papers. We wanted something different: actual code written on our hardware, timed with a stopwatch, graded by whether it compiled and worked.

We tested six models on two GPUs — an RTX 4090 (24 GB) and an RTX 3090 (24 GB) — using Ollama with default quantization (Q4_K_M). Every model ran the same six tasks: a Python REST API, a React component, a SQL migration, a debugging challenge, a unit test suite, and a refactoring exercise.
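To keep the comparison fair, every task went to every model through the same interface. The sketch below shows a minimal version of that kind of harness, not our exact test script: it calls Ollama's /api/generate endpoint with a fixed system prompt, and the model tag, system prompt, and task prompt are placeholders you would swap for your own.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
SYSTEM_PROMPT = "You are a senior software engineer. Return only code."  # illustrative, not our exact prompt

def run_task(model: str, prompt: str) -> str:
    """Send one coding task to a locally running model and return the raw completion."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,           # a Q4_K_M tag you pulled with `ollama pull`
            "system": SYSTEM_PROMPT,  # same system prompt for every model and task
            "prompt": prompt,
            "stream": False,          # wait for the full response instead of streaming
        },
        timeout=600,
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    task = "Write a Python REST API with FastAPI exposing GET /health."
    # Placeholder tag; use whichever coding model you have pulled locally.
    print(run_task("qwen2.5-coder:14b", task))
```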

2.0

The six models we tested

Test setup
All models ran on Ollama with Q4_K_M quantization. We used a consistent system prompt across all tasks. Each task was run once — no cherry-picking.
| Model | Params | VRAM (Q4) | Speed (4090) | Best for |
|---|---|---|---|---|
| Qwen3-Coder-Next | 14B | 8.2 GB | 42 tok/s | Best overall for code |
| Gemma 4 26B-A4B | 26B | 14.1 GB | 38 tok/s | Best speed-to-quality ratio |
| DeepSeek V3 | 14B | 8.4 GB | 44 tok/s | Best for complex refactoring |
| GLM-5 | 9B | 5.8 GB | 61 tok/s | Best for agentic coding |
| Llama 3.3 8B | 8B | 4.9 GB | 68 tok/s | Best for limited VRAM |
| Devstral 2 | 12B | 7.1 GB | 51 tok/s | Best Mistral option |

Qwen3-Coder is Alibaba's latest, trained specifically on code completion and generation. Gemma 4 uses Google's mixed-attention architecture. DeepSeek V3 is a general-purpose model that happens to be excellent at code. GLM-5 from Zhipu AI is ranked #1 for agentic tasks in several 2026 benchmarks. Llama 3.3 8B is Meta's smaller model, best for developers with limited VRAM. Devstral 2 is Mistral's dedicated coding model.
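The speed column is decode throughput on the 4090. If you want to sanity-check the tok/s numbers on your own GPU, Ollama's non-streaming reply already includes token and timing counters (eval_count and eval_duration, the latter in nanoseconds), so a rough measurement takes only a few lines. The model tag below is a placeholder for whatever you have pulled.

```python
import requests

def tokens_per_second(model: str, prompt: str) -> float:
    """Rough decode throughput using Ollama's built-in counters."""
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    ).json()
    # eval_count = generated tokens, eval_duration = generation time in nanoseconds
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)

print(round(tokens_per_second("qwen2.5-coder:14b", "Write a binary search in Python."), 1))
```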

3.0

Test results: what actually happened

| Task | Qwen3-Coder | Gemma 4 | DeepSeek V3 | GLM-5 | Llama 3.3 | Devstral 2 |
|---|---|---|---|---|---|---|
| Python REST API | Worked first run | Worked first run | Missing import | Worked first run | Missing error handling | Worked first run |
| React component | Worked first run | Minor prop error | Worked first run | Broken hook | Worked first run | Missing state |
| SQL migration | Worked first run | Worked first run | Worked first run | Wrong syntax | Wrong syntax | Worked first run |
| Debug challenge | Found bug + fix | Found bug | Found bug + fix | Missed bug | Missed bug | Found bug |
| Unit tests | 6/6 passed | 5/6 passed | 6/6 passed | 4/6 passed | 5/6 passed | 5/6 passed |
| Refactoring | Clean | Minor issues | Clean | Broke tests | Partial | Clean |

Scoring: 2 points for "worked first run," 1 point for "worked with minor fix," 0 for broken.
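For anyone tallying along at home, that rule maps directly onto a tiny helper like the one below. The grade labels are generic stand-ins rather than the exact wording from the table.

```python
POINTS = {"worked first run": 2, "minor fix": 1, "broken": 0}

def total_score(grades: list[str]) -> int:
    """Sum the 2/1/0 points a model earned across the six tasks."""
    return sum(POINTS[grade] for grade in grades)

# A model that worked first run on four tasks and needed minor fixes on two scores 10:
print(total_score(["worked first run"] * 4 + ["minor fix"] * 2))
```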

4.0

Final rankings

1. Qwen3-Coder-Next — 12 points (best overall, nailed every task)
2. DeepSeek V3 — 10 points (best for complex refactoring, reasoning quality stands out)
3. Devstral 2 — 9 points (best Mistral option, clean SQL and refactoring)
4. Gemma 4 26B-A4B — 9 points (best speed-to-quality ratio, just edged out on accuracy)
5. Llama 3.3 8B — 7 points (best for limited VRAM, decent quality at 5 GB)
6. GLM-5 — 5 points (fast, but accuracy issues on our tasks)

5.0

VRAM requirements at a glance

| Model | Params | Min VRAM (Q4) | Comfortable VRAM | Recommended GPU | Tok/s (4090) |
|---|---|---|---|---|---|
| Qwen3-Coder-Next | 14B | 8 GB | 12 GB | RTX 4070 Ti Super | 42 |
| Gemma 4 A4B | 26B | 12 GB | 16 GB | RTX 4080 / RX 7900 XT | 38 |
| DeepSeek V3 | 14B | 8 GB | 12 GB | RTX 4070 Ti Super | 44 |
| GLM-5 | 9B | 6 GB | 8 GB | RTX 4060 Ti 16GB | 61 |
| Llama 3.3 8B | 8B | 5 GB | 8 GB | RTX 4060 | 68 |
| Devstral 2 | 12B | 7 GB | 12 GB | RTX 4070 | 51 |
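Those minimum figures track a common rule of thumb rather than anything exotic: at 4-bit quantization each parameter takes roughly half a byte, plus a buffer for the KV cache and runtime overhead. The sketch below is only that back-of-the-envelope estimate; actual usage depends on the specific quant and your context length.

```python
def q4_vram_estimate_gb(params_billion: float, overhead_gb: float = 1.5) -> float:
    """Rough Q4 memory estimate: ~0.5 bytes per parameter plus KV cache / runtime overhead."""
    weights_gb = params_billion * 0.5  # 4-bit weights are roughly half a byte each
    return round(weights_gb + overhead_gb, 1)

for name, size_b in [("14B coder model", 14), ("26B model", 26), ("8B model", 8)]:
    print(name, q4_vram_estimate_gb(size_b), "GB")
```

A longer context window grows the KV cache, so leave more headroom than the estimate suggests if you plan to stuff whole files into the prompt.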

For detailed VRAM numbers across quantization levels, see our VRAM requirements for every major LLM reference table.

6.0

Which local coding LLM should you pick?

If you have 24 GB VRAM (RTX 4090, RTX 3090, RX 7900 XTX): Run Qwen3-Coder-Next. It won our tests outright and the 14B model barely uses a third of your VRAM, leaving room for a large context window.

If you have 16 GB VRAM (RTX 4080, RTX 5080, RX 9070 XT): Qwen3-Coder-Next still fits. If you want something larger, Gemma 4 26B-A4B delivers the best coding quality you can fit at that tier.

If you have 8-12 GB VRAM (RTX 4070, RTX 4060 Ti): Llama 3.3 8B for speed, or GLM-5 if you need tool-calling capabilities. Both are solid picks for local coding on a budget GPU.

If you just want autocomplete: Devstral 2 paired with a local LLM coding assistant gives you fast, reliable inline suggestions.

Once you have picked a model, our guide to setting up a local LLM in VS Code and Cursor walks you through the integration step by step.

For deeper benchmark data on code generation and debugging specifically, we break down the numbers in our dedicated use-case post.

Not sure if your GPU can handle these models? Our VRAM requirements reference covers 60+ models across every quantization level, and our quantization vs VRAM breakdown shows exactly how much memory each compression level saves.

FAQ

Frequently Asked Questions

What is the best local LLM for coding on 24 GB VRAM?
Qwen3-Coder-Next (14B) at Q4_K_M is the best overall. It won all six of our benchmark tasks on the first try and only uses 8.2 GB VRAM, leaving plenty of room for context. On an RTX 4090 it runs at 42 tokens/sec.
Can a local LLM replace GitHub Copilot?
For autocomplete and chat-based coding, yes — especially with Continue.dev and Qwen3-Coder. Local models match or exceed Copilot acceptance rates on standard coding tasks. For complex multi-file agentic workflows, cloud models still have an edge.
Which local coding LLM is best for debugging?
DeepSeek V3 (14B) tied for the top score on our debugging challenge. It correctly identified the root cause, proposed a working fix, and explained it clearly. It is not a dedicated coding model, but its reasoning ability gives it an edge on complex bug analysis.
What is the minimum VRAM needed for a useful coding LLM?
5 GB VRAM is enough to run Llama 3.3 8B at Q4_K_M, which handles autocomplete and simple code generation well. For the best experience, 8 GB lets you run Qwen3-Coder 14B, and 16-24 GB gives you access to larger models like Gemma 4 26B.
