How to Add OpenAI API Compatible Models to VS Code Copilot
Configure custom OpenAI-compatible models in VS Code Copilot using the OAI Compatible Provider extension.
VS Code’s built-in OpenAI-compatible provider is currently available only in VS Code Insiders. On stable VS Code, use the community extension “OAI Compatible Provider for Copilot” by johnny-zhao instead.
Install Extension
Search for “OAI Compatible Provider for Copilot” in the Extensions view (Ctrl+Shift+X), install the version published by johnny-zhao, and reload VS Code.
Configure Models
Add to settings.json (Ctrl+Shift+P > “Preferences: Open Settings (JSON)”):
```json
{
  "oaicopilot.baseUrl": "https://api.z.ai/api/paas/v4",
  "oaicopilot.models": [
    {
      "id": "glm-4.6",
      "configId": "thinking",
      "owned_by": "zai",
      "temperature": 0.7,
      "top_p": 1,
      "thinking": {
        "type": "enabled"
      }
    },
    {
      "id": "glm-4.6",
      "configId": "no-thinking",
      "owned_by": "zai",
      "temperature": 0,
      "top_p": 1,
      "thinking": {
        "type": "disabled"
      }
    }
  ]
}
```
This example uses Z.ai’s GLM-4.6 model. Replace with your provider’s endpoint and model ID.
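If you use a different provider, only the endpoint and the model entries change. For example, here is a minimal sketch for OpenRouter (the model ID below is illustrative, so pick one from OpenRouter's model list; owned_by is assumed to be a free-form label):

```json
{
  "oaicopilot.baseUrl": "https://openrouter.ai/api/v1",
  "oaicopilot.models": [
    {
      "id": "deepseek/deepseek-chat",
      "owned_by": "openrouter",
      "temperature": 0.7,
      "top_p": 1
    }
  ]
}
```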
Multi-Config Setup
The configId parameter creates multiple variants of the same model. Each variant appears separately in Copilot Chat:
- thinking: Enables extended reasoning for complex problems (slower, deeper analysis)
- no-thinking: Standard responses for quick answers (faster, direct)
Switch between variants based on task complexity without changing providers.
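Variants are not limited to toggling reasoning. As a sketch, assuming the extension accepts any distinct configId, you could append a third entry to the oaicopilot.models array for higher-temperature brainstorming (the “creative” name and the sampling values here are arbitrary):

```json
{
  "id": "glm-4.6",
  "configId": "creative",
  "owned_by": "zai",
  "temperature": 0.9,
  "top_p": 0.95,
  "thinking": {
    "type": "disabled"
  }
}
```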
Activate Models
- Open Command Palette (Ctrl+Shift+P)
- Run “Chat: Manage Language Models”
- Select “OAI Compatible”
- Enter your API key from your chosen provider
- Check the models you want to enable
- Click OK
Use in Copilot Chat
Open Copilot Chat (Ctrl+Alt+I) and select your model from the dropdown (top-right). Switch between thinking/no-thinking variants based on task complexity.
The thinking mode excels at:
- Code architecture decisions
- Complex debugging
The no-thinking mode handles:
- Quick syntax questions
- Refactoring tasks
Important Notes
- Works with any OpenAI-compatible API (Ollama, LM Studio, OpenRouter, DeepSeek, Kimi)
- context_length: Max input tokens (200K for GLM-4.6)
- max_tokens: Max output tokens (132K for GLM-4.6)
- temperature: 0 = deterministic, 1 = creative
- thinking.type: “enabled” for reasoning models, “disabled” for speed (see the combined example after this list)
- The extension updates regularly; check the GitHub repository for the latest features and additional parameters.
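Putting those parameters together, a single model entry might look like the sketch below. This assumes the extension accepts context_length and max_tokens as per-model fields, as the notes above suggest; settings.json tolerates // comments, and the exact limits should come from your model's documentation:

```json
{
  "id": "glm-4.6",
  "configId": "thinking",
  "owned_by": "zai",
  "context_length": 200000, // max input tokens (200K for GLM-4.6)
  "max_tokens": 131072,     // max output tokens (~132K for GLM-4.6)
  "temperature": 0.7,       // 0 = deterministic, 1 = creative
  "top_p": 1,
  "thinking": {
    "type": "enabled"       // "disabled" for faster, non-reasoning responses
  }
}
```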
Alternative Endpoints
For mainland China access, use Zhipu AI’s endpoint:
```json
{
  "oaicopilot.baseUrl": "https://open.bigmodel.cn/api/coding/paas/v4"
}
```
For other providers, replace baseUrl with their endpoint (e.g., http://localhost:11434/v1 for Ollama) and adjust model IDs accordingly.
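For Ollama specifically, a minimal sketch might look like this (llama3.1 is a placeholder, and the id must match a model you have already pulled with ollama pull; Ollama generally ignores the API key, so any placeholder value works):

```json
{
  "oaicopilot.baseUrl": "http://localhost:11434/v1",
  "oaicopilot.models": [
    {
      "id": "llama3.1",
      "owned_by": "ollama",
      "temperature": 0.2,
      "top_p": 1
    }
  ]
}
```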