AzulTech

BYOK in GitHub Copilot: Your Escape Hatch from Usage-Based Billing

On April 27, GitHub announced that all Copilot plans will transition to usage-based billing on June 1, 2026. Premium request units are being replaced by GitHub AI Credits, consumed based on token usage. For teams running heavy agentic workflows — multi-step coding sessions, plan agents, autonomous iterations across entire repos — this could mean unpredictable cost spikes.

But there’s a lever most teams are ignoring: Bring Your Own Key (BYOK).


What’s Changing with Copilot Billing

Here’s the short version: starting June 1, every Copilot interaction (except code completions and Next Edit suggestions) will consume AI Credits based on actual token usage — input, output, and cached tokens — at the published API rates for each model.

The plan prices themselves aren’t changing:

| Plan | Price | Included Credits |
| --- | --- | --- |
| Copilot Pro | $10/month | $10 in AI Credits |
| Copilot Pro+ | $39/month | $39 in AI Credits |
| Copilot Business | $19/user/month | $19 in AI Credits |
| Copilot Enterprise | $39/user/month | $39 in AI Credits |

Business and Enterprise customers get promotional credits through August ($30 and $70 respectively), but after that, you’re on the meter. When your pool is empty, the org admin decides: buy more credits or cap usage.

The problem isn’t the per-seat price. It’s that agentic usage is expensive — a multi-hour coding session with a frontier model burns through tokens at a completely different rate than a quick chat question. And under the new model, both hit the same credit pool.
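To make that contrast concrete, here is a back-of-envelope sketch. The per-million-token rates and the token counts are illustrative assumptions, not GitHub's published pricing; the point is the order-of-magnitude gap between a chat question and an agentic run.

```python
# Back-of-envelope comparison of credit burn for a quick chat question
# versus a long agentic session. All per-token rates and token counts
# here are ILLUSTRATIVE ASSUMPTIONS, not GitHub's published pricing.

def session_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
                 in_rate: float = 3.00, out_rate: float = 15.00,
                 cached_rate: float = 0.30) -> float:
    """Dollar cost of a session, with rates given per million tokens."""
    return (input_tokens * in_rate
            + output_tokens * out_rate
            + cached_tokens * cached_rate) / 1_000_000

# A quick chat question: a few thousand tokens in and out.
quick_chat = session_cost(input_tokens=2_000, output_tokens=1_000)

# A multi-hour agentic run: repeated tool calls, large re-read contexts.
agentic_run = session_cost(input_tokens=2_000_000, output_tokens=300_000,
                           cached_tokens=1_500_000)

print(f"quick chat:  ${quick_chat:.3f}")   # ≈ $0.021
print(f"agentic run: ${agentic_run:.2f}")  # ≈ $10.95, more than a Pro
                                           # plan's entire monthly allotment
```

Under these assumed numbers, a single agentic session costs roughly 500x the quick question, which is exactly why both drawing from one credit pool causes surprises.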

Enter BYOK: Bring Your Own Key

BYOK has been available for Copilot Business and Enterprise since January 2026, and became generally available in VS Code in April 2026.

The concept is straightforward: instead of using GitHub’s built-in model routing (which consumes your AI Credits), you plug in your own API key from a supported provider. Usage goes through that provider’s billing, not through GitHub’s credit system.

Once configured, BYOK models work everywhere in VS Code Chat — including the built-in plan agent and custom agents. The one exception: code completions still go through Copilot’s native pipeline.

Supported Providers

As of the January 2026 enhancements, BYOK supports a wide range of providers, including Azure OpenAI, Anthropic, OpenAI, Ollama, and Foundry Local.

The January update also brought support for the Responses API (enabling structured outputs), configurable maximum context windows (to control costs), and streaming responses for faster interaction.

Why BYOK Matters for TBB

Here’s where it gets interesting. Under token-based billing (TBB), your heaviest Copilot users — the ones running agentic workflows, iterating on complex problems, using frontier models — are the ones who’ll blow through AI Credits fastest.

BYOK lets you route exactly those workloads off GitHub’s billing:

  1. Predictable costs: You negotiate your own rates with providers. Many teams already have enterprise agreements with OpenAI, Azure, or Anthropic that include volume discounts.

  2. Model flexibility: Use the best model for the job. Run Claude for code review, GPT-4o for planning, and a local Ollama model for quick iterations — all within Copilot’s UI.

  3. No credit consumption: BYOK usage doesn’t count against your GitHub AI Credits. Your included credits stay available for code completions and lightweight chat.

  4. Budget control on your terms: Instead of GitHub’s credit pool mechanics, you manage spend through your cloud provider’s existing cost controls, budgets, and alerts.

  5. Local models for free: Connect Ollama or Foundry Local for development and experimentation at zero marginal cost. No tokens leave your machine.

The Practical Setup

The policy is enabled by default for Business and Enterprise orgs. Individual users can add models from built-in providers or install language model provider extensions from the VS Code marketplace.

For teams, the recommended approach:

  1. Keep Copilot Business/Enterprise for code completions and base functionality — those are included and don’t consume credits.
  2. Route chat and agentic workloads through BYOK using your existing API keys (Azure OpenAI, Anthropic, etc.).
  3. Set context window limits on BYOK models to control per-request costs.
  4. Use local models (Ollama/Foundry Local) for development iteration where latency matters more than capability.

An org admin can also disable BYOK if needed, through the “Bring Your Own Language Model Key in VS Code” policy in Copilot settings.
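As a concrete illustration of step 4, the local Ollama provider can typically be pointed at a custom endpoint through VS Code settings before adding the model via the chat model picker. The setting name below reflects recent VS Code builds and is an assumption to verify against your version's documentation.

```jsonc
// settings.json — illustrative; confirm the exact setting name
// against your VS Code version's Copilot documentation.
{
  // Point Copilot's Ollama provider at a local (or remote) Ollama server.
  "github.copilot.chat.byok.ollamaEndpoint": "http://localhost:11434"
}
```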

The Math

Let’s say you have a team of 10 on Copilot Business ($19/user/month = $190/month total, $190 in pooled AI Credits). Under heavy agentic usage, those credits might last 2–3 weeks.

With BYOK, you route the expensive chat and agent interactions through your Azure OpenAI enterprise agreement. Now your AI Credits last the full month (or longer), and your Azure costs are governed by existing budgets and commitments you’ve already negotiated.
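The runway comparison can be sketched numerically. The daily burn figures below are assumptions chosen to match the "2–3 weeks" scenario described above, not measured data.

```python
# Rough runway model for the team-of-10 scenario. The per-developer
# daily burn figures are ILLUSTRATIVE ASSUMPTIONS, not measured data.

SEATS = 10
PRICE_PER_SEAT = 19.00                   # Copilot Business, $/user/month
pooled_credits = SEATS * PRICE_PER_SEAT  # $190 in pooled AI Credits

# Hypothetical per-developer daily credit burn.
AGENTIC_BURN = 0.80   # $/dev/day from heavy chat + agent workloads
LIGHT_BURN = 0.15     # $/dev/day from completions + light chat

def runway_days(pool: float, team_daily_burn: float) -> float:
    """Days until the shared credit pool is exhausted."""
    return pool / team_daily_burn

without_byok = runway_days(pooled_credits, SEATS * (AGENTIC_BURN + LIGHT_BURN))
with_byok = runway_days(pooled_credits, SEATS * LIGHT_BURN)

print(f"runway without BYOK: {without_byok:.0f} days")  # 20 days (~3 weeks)
print(f"runway with BYOK:    {with_byok:.0f} days")     # over 4 months
```

Moving the agentic burn onto your own provider agreement stretches the same $190 pool from under three weeks to several months.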

The net effect: Copilot stays the UI and orchestration layer, but inference costs shift to where you have the most leverage.

What BYOK Doesn’t Cover

A few limitations to be aware of:

  1. Code completions and Next Edit suggestions still run through Copilot's native pipeline — BYOK doesn't apply to them.
  2. Availability depends on plan and policy: org admins can disable BYOK entirely via the Copilot policy settings.

Bottom Line

GitHub’s move to usage-based billing isn’t inherently bad — it aligns cost with value and removes the need to gate heavy users. But for teams that are already deep into agentic workflows, the credit allotments may not be enough.

BYOK gives you an escape valve: keep Copilot for what it does best (completions, UI, orchestration), and bring your own inference for the expensive stuff. If your organization already has API agreements with the major model providers, there’s almost no reason not to set this up before June 1.

The teams that figure out this split — GitHub for the platform, BYOK for the inference — will get the best of both worlds: Copilot’s developer experience without the billing surprises.

