How to Add OpenAI API Compatible Models to VS Code Copilot

Configure custom OpenAI-compatible models in VS Code Copilot using the OAI Compatible Provider extension.

VS Code’s built-in OpenAI-compatible provider is currently available only in Insiders builds. On stable VS Code, use the community extension “OAI Compatible Provider for Copilot” by johnny-zhao instead.

Install Extension

Search “OAI Compatible Provider for Copilot” in Extensions (Ctrl+Shift+X), install from johnny-zhao, and reload VS Code.

Configure Models

Add to settings.json (Ctrl+Shift+P > “Preferences: Open Settings (JSON)”):

{
  "oaicopilot.baseUrl": "https://api.z.ai/api/paas/v4",
  "oaicopilot.models": [
    {
      "id": "glm-4.6",
      "configId": "thinking",
      "owned_by": "zai",
      "temperature": 0.7,
      "top_p": 1,
      "thinking": {
        "type": "enabled"
      }
    },
    {
      "id": "glm-4.6",
      "configId": "no-thinking",
      "owned_by": "zai",
      "temperature": 0,
      "top_p": 1,
      "thinking": {
        "type": "disabled"
      }
    }
  ]
}

This example uses Z.ai’s GLM-4.6 model. Replace with your provider’s endpoint and model ID.

Multi-Config Setup

The configId parameter creates multiple variants of the same model. Each variant appears separately in Copilot Chat:

  • thinking: Enables extended reasoning for complex problems (slower, deeper analysis)
  • no-thinking: Standard responses for quick answers (faster, direct)

Switch between variants based on task complexity without changing providers.
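The variant mechanism above is easy to sketch programmatically. The helper below (its name and structure are illustrative, not part of the extension) expands one base model ID into per-configId entries matching the settings.json shape shown earlier:

```python
import json

def expand_variants(model_id, owned_by, variants):
    """Expand one model ID into multiple model entries,
    one per configId, each with its own sampling/thinking options."""
    return [
        {"id": model_id, "configId": config_id, "owned_by": owned_by, **options}
        for config_id, options in variants.items()
    ]

# Two variants of the same model: one with extended reasoning, one without.
models = expand_variants(
    "glm-4.6",
    "zai",
    {
        "thinking": {"temperature": 0.7, "top_p": 1, "thinking": {"type": "enabled"}},
        "no-thinking": {"temperature": 0, "top_p": 1, "thinking": {"type": "disabled"}},
    },
)

settings = {
    "oaicopilot.baseUrl": "https://api.z.ai/api/paas/v4",
    "oaicopilot.models": models,
}
print(json.dumps(settings, indent=2))
```

Both entries share the same `id`, so the provider sends requests to the same upstream model; only the `configId` and the attached options differ.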

Activate Models

  1. Open Command Palette (Ctrl+Shift+P)
  2. Run “Chat: Manage Language Models”
  3. Select “OAI Compatible”
  4. Enter your API key from your chosen provider
  5. Check models to enable
  6. Click OK

Use in Copilot Chat

Open Copilot Chat (Ctrl+Alt+I) and select your model from the dropdown (top-right). Switch between thinking/no-thinking variants based on task complexity.

The thinking mode excels at:

  • Code architecture decisions
  • Complex debugging

The no-thinking mode handles:

  • Quick syntax questions
  • Refactoring tasks

Important Notes

  • Works with any OpenAI-compatible API (Ollama, LM Studio, OpenRouter, DeepSeek, Kimi)
  • context_length: Max input tokens (200K for GLM-4.6)
  • max_tokens: Max output tokens (132K for GLM-4.6)
  • temperature: 0 yields near-deterministic output; values toward 1 produce more varied, creative responses
  • thinking.type: “enabled” for reasoning models, “disabled” for speed
  • Extension updates regularly—check the GitHub repository for latest features and additional parameters
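The token-limit parameters from the notes slot directly into a model entry. A sketch using the GLM-4.6 figures quoted above, written as round decimal numbers (check your provider’s documentation for exact limits):

```json
{
  "oaicopilot.models": [
    {
      "id": "glm-4.6",
      "owned_by": "zai",
      "context_length": 200000,
      "max_tokens": 132000
    }
  ]
}
```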

Alternative Endpoints

For Zhipu AI mainland China access:

{
  "oaicopilot.baseUrl": "https://open.bigmodel.cn/api/coding/paas/v4"
}

For other providers, replace baseUrl with their endpoint (e.g., http://localhost:11434/v1 for Ollama) and adjust model IDs accordingly.
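As a concrete sketch, a local Ollama setup might look like the following. The model ID `llama3.1` and the `owned_by` value are examples, not prescribed values — use whatever `ollama list` shows on your machine:

```json
{
  "oaicopilot.baseUrl": "http://localhost:11434/v1",
  "oaicopilot.models": [
    {
      "id": "llama3.1",
      "owned_by": "ollama"
    }
  ]
}
```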

☕ Support My Work

If you found this post helpful and want to support more content like this, you can buy me a coffee!

Your support helps me continue creating useful articles and tips for fellow developers. Thank you! 🙏

This post is licensed under CC BY 4.0 by the author.