How to Add OpenAI API Compatible Models to VS Code Copilot

Configure custom OpenAI-compatible models in VS Code Copilot using the OAI Compatible Provider extension.

VS Code’s built-in OpenAI-compatible provider is currently available only in Insiders builds. On stable VS Code, use the community extension “OAI Compatible Provider for Copilot” by johnny-zhao instead.

Install Extension

Search “OAI Compatible Provider for Copilot” in Extensions (Ctrl+Shift+X), install from johnny-zhao, and reload VS Code.

Configure Models

Add to settings.json (Ctrl+Shift+P > “Preferences: Open Settings (JSON)”):

{
  "oaicopilot.baseUrl": "https://api.z.ai/api/paas/v4",
  "oaicopilot.models": [
    {
      "id": "glm-4.6",
      "configId": "thinking",
      "owned_by": "zai",
      "temperature": 0.7,
      "top_p": 1,
      "thinking": {
        "type": "enabled"
      }
    },
    {
      "id": "glm-4.6",
      "configId": "no-thinking",
      "owned_by": "zai",
      "temperature": 0,
      "top_p": 1,
      "thinking": {
        "type": "disabled"
      }
    }
  ]
}

This example uses Z.ai’s GLM-4.6 model. Replace with your provider’s endpoint and model ID.

Multi-Config Setup

The configId parameter creates multiple variants of the same model. Each variant appears separately in Copilot Chat:

  • thinking: Enables extended reasoning for complex problems (slower, deeper analysis)
  • no-thinking: Standard responses for quick answers (faster, direct)

Switch between variants based on task complexity without changing providers.
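The variant mechanism above is easy to sketch programmatically. The helper below (its name and structure are illustrative, not part of the extension) expands one base model ID into per-configId entries matching the settings.json shape shown earlier:

```python
import json

def expand_variants(model_id, owned_by, variants):
    """Expand one model ID into multiple model entries,
    one per configId, each with its own sampling/thinking options."""
    return [
        {"id": model_id, "configId": config_id, "owned_by": owned_by, **options}
        for config_id, options in variants.items()
    ]

# Two variants of the same model: one with extended reasoning, one without.
models = expand_variants(
    "glm-4.6",
    "zai",
    {
        "thinking": {"temperature": 0.7, "top_p": 1, "thinking": {"type": "enabled"}},
        "no-thinking": {"temperature": 0, "top_p": 1, "thinking": {"type": "disabled"}},
    },
)

settings = {
    "oaicopilot.baseUrl": "https://api.z.ai/api/paas/v4",
    "oaicopilot.models": models,
}
print(json.dumps(settings, indent=2))
```

Both entries share the same `id`, so the provider sends requests to the same upstream model; only the `configId` and the attached options differ.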

Activate Models

  1. Open Command Palette (Ctrl+Shift+P)
  2. Run “Chat: Manage Language Models”
  3. Select “OAI Compatible”
  4. Enter your API key from your chosen provider
  5. Check models to enable
  6. Click OK

Use in Copilot Chat

Open Copilot Chat (Ctrl+Alt+I) and select your model from the dropdown (top-right). Switch between thinking/no-thinking variants based on task complexity.

The thinking mode excels at:

  • Code architecture decisions
  • Complex debugging

The no-thinking mode handles:

  • Quick syntax questions
  • Refactoring tasks

Important Notes

  • Works with any OpenAI-compatible API (Ollama, LM Studio, OpenRouter, DeepSeek, Kimi)
  • context_length: Max input tokens (200K for GLM-4.6)
  • max_tokens: Max output tokens (132K for GLM-4.6)
  • temperature: 0 yields near-deterministic output; values toward 1 produce more varied, creative responses
  • thinking.type: “enabled” for reasoning models, “disabled” for speed
  • Extension updates regularly—check the GitHub repository for latest features and additional parameters
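The token-limit parameters from the notes slot directly into a model entry. A sketch using the GLM-4.6 figures quoted above, written as round decimal numbers (check your provider’s documentation for exact limits):

```json
{
  "oaicopilot.models": [
    {
      "id": "glm-4.6",
      "owned_by": "zai",
      "context_length": 200000,
      "max_tokens": 132000
    }
  ]
}
```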

Alternative Endpoints

For Zhipu AI mainland China access:

{
  "oaicopilot.baseUrl": "https://open.bigmodel.cn/api/coding/paas/v4"
}

For other providers, replace baseUrl with their endpoint (e.g., http://localhost:11434/v1 for Ollama) and adjust model IDs accordingly.
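As a concrete sketch, a local Ollama setup might look like the following. The model ID `llama3.1` and the `owned_by` value are examples, not prescribed values — use whatever `ollama list` shows on your machine:

```json
{
  "oaicopilot.baseUrl": "http://localhost:11434/v1",
  "oaicopilot.models": [
    {
      "id": "llama3.1",
      "owned_by": "ollama"
    }
  ]
}
```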

☕ Support My Work

If you found this post helpful and want to support more content like this, you can buy me a coffee!

Your support helps me continue creating useful articles and tips for fellow developers. Thank you! 🙏

This post is licensed under CC BY 4.0 by the author.