Local LLM for VS Code: The Setup That Finally Made Me Drop Copilot
Step-by-step instructions for running a local LLM inside VS Code with Continue, connecting Cursor to a local model via BYOK, and using Claude Code with a local API proxy. Tested on Windows, Mac, and Linux.
Prerequisites — pick your model runner
Before connecting a local LLM for VS Code, you need a model server running on your machine. Three options:
- Ollama — The fastest path. One command to install, one command to pull a model. Works on Mac, Linux, and Windows (native). This is what we recommend for most developers.
- LM Studio — Best GUI experience. Download models through a searchable interface, tweak parameters with sliders, see real-time performance metrics.
- llama.cpp server — Maximum control and performance. Build from source, tune every parameter. Overkill for most people.
For this guide, we use Ollama. Install it from ollama.com, then pull a coding model:
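For example (this is the same tag we run against in Step 1 below):

```bash
# Download the coding model used throughout this guide (~8 GB)
ollama pull qwen3-coder
```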
This downloads the Qwen3-Coder model (about 8 GB). For the best local LLM for VS Code experience on a 24 GB GPU, you could also try a larger coding model.
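A couple of illustrative alternatives, with the caveat that these tags are my assumptions rather than a vetted list; confirm the exact names and quantizations in the Ollama library before pulling:

```bash
# Larger coders that typically fit on a 24 GB GPU at Q4 quantization
ollama pull qwen2.5-coder:32b
ollama pull deepseek-coder-v2:16b
```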
See our best local LLM for coding guide for model recommendations by GPU.
Local LLM in VS Code (with Continue extension)
Continue is the most popular open-source VS Code local LLM extension, with over 500K installs. It supports autocomplete, chat, and agentic mode — all pointing to your local model.
Step 1: Verify Ollama is working:

ollama run qwen3-coder "Write a Python hello world"

You should see a response in your terminal.
Step 2: Open VS Code, go to Extensions (Ctrl+Shift+X), search for "Continue", and install it.
Step 3: When you first open Continue, select Ollama as your provider and choose your model. For manual configuration, the config file lives at ~/.continue/config.json:
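A minimal sketch of that file, assuming the JSON schema used by recent Continue releases (newer builds may prefer config.yaml, but the fields map one-to-one):

```json
{
  "models": [
    {
      "title": "Qwen3 Coder (local)",
      "provider": "ollama",
      "model": "qwen3-coder"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen3 Coder (local)",
    "provider": "ollama",
    "model": "qwen3-coder"
  }
}
```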
This gives you both chat (Cmd+L on Mac, Ctrl+L on Windows/Linux) and tab autocomplete from the same model.
Step 4: If autocomplete feels slow, try a smaller model for tab completion and keep the larger model for chat:
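One way to wire that up. I am assuming qwen2.5-coder:1.5b as the small completion model here; any ~1.5B coder from the Ollama library slots in the same way:

```bash
# Pull a small model dedicated to tab completion (assumed tag; verify in the Ollama library)
ollama pull qwen2.5-coder:1.5b
```

Then split the roles in ~/.continue/config.json:

```json
{
  "models": [
    { "title": "Qwen3 Coder (chat)", "provider": "ollama", "model": "qwen3-coder" }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5 Coder 1.5B (autocomplete)",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b"
  }
}
```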
The 1.5B model responds in under 50ms — almost indistinguishable from Copilot. You can also check our local LLM coding assistant comparison for tool-specific tuning tips.
Local LLM in Cursor (workarounds)
As of May 2026, Cursor does not natively support direct localhost connections to local models. All BYOK (Bring Your Own Key) requests route through Cursor servers. However, there are two workarounds.
Method 1: Override the Base URL. In Cursor's model settings, add a custom OpenAI API key and set the Base URL override to your local endpoint:

http://localhost:11434/v1

This works with Ollama's default port. Select your model in the model dropdown. Some Cursor features (agent mode, apply) may not work perfectly with local models.
Method 2: ngrok tunnel. If Cursor rejects localhost URLs, create a tunnel:
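With Ollama on its default port, the tunnel looks like this (ngrok must be installed and authenticated first):

```bash
# Expose the local Ollama API through a public HTTPS tunnel
ngrok http 11434
```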
Copy the ngrok URL (e.g. https://abc123.ngrok.io) and use that as your Base URL.
Limitations: Cursor agent mode sometimes struggles with local model responses. Inline edits work better than multi-file agent operations. The Cursor local LLM experience is improving but still behind VS Code + Continue.
Local LLM with Claude Code
Claude Code is Anthropic's CLI-based coding tool. It does not support local models natively, but you can use a local API proxy to intercept and redirect requests.
The most common approach: run an OpenAI-compatible proxy (like LiteLLM) that forwards to your local Ollama instance, then configure Claude Code to use that proxy endpoint.
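A rough sketch of that setup, with several assumptions: LiteLLM's proxy is installed (pip install 'litellm[proxy]'), the version is recent enough to expose an Anthropic-compatible /v1/messages endpoint, and your Claude Code build honors the ANTHROPIC_BASE_URL and ANTHROPIC_MODEL environment variables. The claude-local alias is purely illustrative.

```yaml
# litellm-config.yaml: map an alias onto the local Ollama model
model_list:
  - model_name: claude-local
    litellm_params:
      model: ollama/qwen3-coder
      api_base: http://localhost:11434
```

```bash
# Start the proxy on port 4000
litellm --config litellm-config.yaml --port 4000

# In another shell: point Claude Code at the proxy instead of api.anthropic.com
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_AUTH_TOKEN=local-proxy   # placeholder; the proxy ignores it unless a master key is set
export ANTHROPIC_MODEL=claude-local       # request the alias defined above
claude
```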
Performance tips
- GPU offloading — Make sure Ollama is using your GPU, not CPU. Run nvidia-smi while generating to confirm GPU utilization. On Mac, Ollama uses Metal automatically.
- Quantization — Q4_K_M is the sweet spot for coding tasks. Lower (Q2, Q3) saves VRAM but hurts code accuracy. Higher (Q6, Q8) improves quality marginally while doubling VRAM usage.
- Context window — Coding benefits from large context. Set num_ctx: 8192 or higher in your Ollama Modelfile (see the sketch after this list). This uses more VRAM.
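A minimal Modelfile sketch for a larger-context variant of the model from this guide (the derived name qwen3-coder-8k is just an example):

```
# Modelfile: derive a larger-context variant of qwen3-coder
FROM qwen3-coder
PARAMETER num_ctx 8192
```

```bash
# Build and use the new variant
ollama create qwen3-coder-8k -f Modelfile
ollama run qwen3-coder-8k
```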
For the exact memory cost of larger contexts, see our KV cache explained post. For VRAM savings at each quantization level, our quantization vs VRAM guide has the numbers.
For more on which model pairs best with your hardware, our best local LLM for coding rankings cover VRAM requirements and speed benchmarks.
Frequently Asked Questions
Can I use a local LLM in VS Code?
Yes. Run a model server such as Ollama, install the Continue extension, and point it at your local model. You get chat, tab autocomplete, and agentic mode without sending code to a cloud API.
Does Cursor support local LLMs?
Not natively as of May 2026: BYOK requests still route through Cursor's servers. The workarounds are overriding the OpenAI Base URL with your local endpoint or exposing it through an ngrok tunnel, though agent mode and apply can be unreliable with local models.
Which is better for local LLM coding: VS Code or Cursor?
VS Code with Continue is currently the smoother experience for local models: it talks to localhost directly and handles chat and autocomplete from the same model. Cursor depends on workarounds, and its agent features often struggle with local model responses.