Getting Started with the GitHub Copilot SDK: Build Your First AI Agent in Five Minutes

TL;DR

The core value of the GitHub Copilot SDK is not the convenience of "calling an LLM" (that's already been solved by the OpenAI SDK, LangChain, etc.), but rather providing a production-proven Agent runtime.

The problems it actually solves are:

  • Orchestration complexity: Planner, tool routing, and state management are built-in
  • Stability: a runtime battle-tested daily by millions of developers
  • Evolvability: New models and tool capabilities are automatically updated by the CLI

When you start building your next AI application, ask yourself two questions:

  1. Where is my core value? If it's in business logic and tool definitions, use the SDK; if it's in low-level orchestration innovation, build your own framework.
  2. How fast do you need to reach production? The SDK lets you skip 80% of the infrastructure work and focus on the last 20% of differentiated capability.

The barrier to Agent development has dropped, but the real challenge is: defining valuable tools, designing smooth interactions, and solving real problems. Technology is no longer the bottleneck — imagination is.

Introduction: Why Agent Development Is No Longer Just for Experts

In January 2026, GitHub released the Copilot SDK, marking a pivotal shift in AI Agent development from "expert territory" to "mainstream tooling."

Before this, building an AI Agent capable of autonomous planning, tool invocation, and file editing required you to:

  • Choose and integrate an LLM service (OpenAI, Anthropic, Azure...)
  • Build your own Agent orchestrator (planner, tool routing, state management)
  • Handle streaming output, error retries, and context management
  • Implement tool definition standards (function calling schema)

This process was complex and fragile. Open-source frameworks (LangChain, AutoGPT) lowered the barrier, but still required deep understanding of Agent runtime mechanics. The real turning point: GitHub opened up the production-grade Agent runtime from Copilot CLI as an SDK.

What does this mean? You can launch a complete Agent runtime in roughly a dozen lines of code:

import asyncio
from copilot import CopilotClient

async def main():
    client = CopilotClient()
    await client.start()
    session = await client.create_session({"model": "gpt-4.1"})
    response = await session.send_and_wait({"prompt": "Explain quantum entanglement"})
    print(response.data.content)

asyncio.run(main())

No need to worry about model integration, prompt engineering, or response parsing — all of this has been battle-tested by Copilot CLI across millions of developers. You only need to define business logic; the SDK handles everything else.

Goal of this article: Through a complete weather assistant example, help you understand:

  1. How the SDK communicates with the CLI (the architectural essence)
  2. How the tool invocation mechanism works (how the LLM "decides" to call your code)
  3. The key leap from toy to tool (streaming responses, event listening, state management)

Whether you want to quickly validate an AI application idea or build a customized Agent for your enterprise, this article is the starting point.

Prerequisites: Setting Up Your Environment

Before writing any code, make sure your development environment meets the following requirements.

Prerequisites Checklist

1. Install the GitHub Copilot CLI

The SDK itself does not contain AI inference capabilities — it communicates with the Copilot CLI via JSON-RPC. The CLI is the real "engine"; the SDK is the "steering wheel."

# macOS/Linux
brew install copilot-cli

# Verify installation
copilot --version

2. Authenticate Your GitHub Account

copilot login

You need a GitHub Copilot subscription (individual or enterprise). If using BYOK (Bring Your Own Key) mode, you can skip this step.

Verify the Environment

Run the following command to confirm the CLI is working:

copilot -p "Explain recursion in one sentence"

If you see an AI response, the environment is ready.

Step 1: Send Your First Message

Install the SDK

Create a project directory and install the Python SDK:

mkdir copilot-demo && cd copilot-demo
# create and activate a virtual environment (recommended)
python -m venv venv && source venv/bin/activate
pip install github-copilot-sdk

Minimal Code Example

Create main.py:

import asyncio
from copilot import CopilotClient

async def main():
    client = CopilotClient()
    await client.start()
    
    session = await client.create_session({"model": "gpt-4.1"})
    response = await session.send_and_wait({"prompt": "What is quantum entanglement?"})
    
    print(response.data.content)
    
    await client.stop()

asyncio.run(main())

Run it:

python main.py

You'll see the AI's complete response. In roughly a dozen lines of code, you have a complete AI conversation.

Execution Flow Breakdown

What happens behind this code?

1. client.start()     → SDK launches the Copilot CLI process (runs in the background)
2. create_session()   → Requests the CLI to create a session via JSON-RPC
3. send_and_wait()    → Sends the prompt; the CLI forwards it to the LLM
4. LLM inference      → Response is returned to the SDK through the CLI
5. response.data      → SDK parses the JSON response and extracts the content

The Architectural Essence: The SDK Is the CLI's "Remote Control"

GitHub's design philosophy is separation of concerns:

| Component | Responsibility |
| --- | --- |
| Copilot CLI | Agent runtime (planning, tool invocation, LLM communication) |
| SDK | Process management, JSON-RPC wrapper, event listening |
| Your code | Business logic and tool definitions |

Advantages of this architecture:

  • Independent CLI upgrades: New models and tool capabilities don't require SDK changes
  • Low multi-language support cost: Each language SDK only needs to implement a JSON-RPC client
  • Debug-friendly: The CLI can run independently, making it easy to observe logs and troubleshoot

JSON-RPC Communication Example

When you call send_and_wait(), the actual request the SDK sends:

{
  "jsonrpc": "2.0",
  "method": "session.send",
  "params": {
    "sessionId": "abc123",
    "prompt": "What is quantum entanglement?"
  },
  "id": 1
}

CLI response:

{
  "jsonrpc": "2.0",
  "result": {
    "data": {
      "content": "Quantum entanglement refers to a phenomenon where two or more quantum systems..."
    }
  },
  "id": 1
}

Understanding this is important: The SDK is not "calling an LLM" — it's "calling the CLI." The CLI has already encapsulated all the complexity.
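
For intuition, here is a hand-rolled sketch of the framing the SDK performs for you. The function names (`build_request`, `parse_response`) are invented for illustration and are not SDK APIs; the message shapes mirror the examples above:

```python
import json
import itertools

# Monotonic id generator, used to match responses back to requests
_ids = itertools.count(1)

def build_request(method: str, params: dict) -> str:
    """Frame a JSON-RPC 2.0 request like the one shown above."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_ids),
    })

def parse_response(raw: str) -> dict:
    """Extract the result payload from a JSON-RPC 2.0 response."""
    msg = json.loads(raw)
    if "error" in msg:
        raise RuntimeError(msg["error"])
    return msg["result"]

req = build_request("session.send",
                    {"sessionId": "abc123", "prompt": "What is quantum entanglement?"})
resp = parse_response('{"jsonrpc": "2.0", "result": {"data": {"content": "..."}}, "id": 1}')
```

The real SDK adds process lifecycle management and asynchronous event delivery on top, but the wire format is exactly this kind of id-correlated request/response pair.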

Step 2: Real-Time AI Responses — Streaming Output

Why Streaming Responses Are Needed

When using send_and_wait(), you must wait for the LLM to generate a complete response before seeing any output. For long-form generation (such as code explanations or documentation), users might stare at a blank screen for 10–30 seconds.

Streaming responses let the AI output text word by word, like a typewriter — improving user experience while also allowing you to catch early signs that the model is going off track.

Event Listening Mechanism

Modify main.py to enable streaming output:

import asyncio
import sys
from copilot import CopilotClient
from copilot.generated.session_events import SessionEventType

async def main():
    client = CopilotClient()
    await client.start()
    
    session = await client.create_session({
        "model": "gpt-4.1",
        "streaming": True,  # Enable streaming mode
    })
    
    # Listen for response deltas
    def handle_event(event):
        if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
            sys.stdout.write(event.data.delta_content)
            sys.stdout.flush()
        if event.type == SessionEventType.SESSION_IDLE:
            print()  # Newline when complete
    
    session.on(handle_event)
    
    await session.send_and_wait({"prompt": "Write a code example of quicksort"})

    await client.stop()

asyncio.run(main())

After running it, you'll see results gradually "stream in" rather than appearing all at once.

The Design Philosophy of the Event-Driven Model

The SDK uses the Observer pattern to handle the asynchronous event stream from the CLI:

CLI generates events → SDK parses → Dispatches to listeners → Your handle_event() executes
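
The dispatch step can be sketched in a few lines. This `Session` class is a toy stand-in for the SDK's session object, not its actual implementation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Session:
    """Toy stand-in for the SDK session: holds listeners, fans out events."""
    _listeners: list = field(default_factory=list)

    def on(self, listener: Callable) -> None:
        self._listeners.append(listener)

    def _dispatch(self, event) -> None:
        # Every registered listener sees every event, in registration order
        for listener in self._listeners:
            listener(event)

seen = []
s = Session()
s.on(lambda e: seen.append(e))
s._dispatch("ASSISTANT_MESSAGE_DELTA")
s._dispatch("SESSION_IDLE")
```

Because every listener receives every event, your handler is responsible for filtering by `event.type`, which is why the examples in this article always branch on it.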

Main event types:

| Event | Triggered When | Typical Use |
| --- | --- | --- |
| ASSISTANT_MESSAGE_DELTA | AI generates partial content | Real-time display |
| ASSISTANT_MESSAGE | AI completes a full message | Get final content |
| SESSION_IDLE | Session enters idle state | Mark task complete |
| TOOL_CALL | AI decides to invoke a tool | Logging, auth check |

Code Comparison: Synchronous vs. Streaming

Synchronous mode — suitable for short responses:

response = await session.send_and_wait({"prompt": "1+1=?"})
print(response.data.content)  # Wait and print all at once

Streaming mode — suitable for long-form content:

def on_delta(event):
    if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
        print(event.data.delta_content, end="", flush=True)

session.on(on_delta)
await session.send_and_wait({"prompt": "Write an article"})

Technical Details Under the Hood

Under the hood, the CLI consumes the LLM's token stream (typically delivered over Server-Sent Events) and relays it to the SDK as discrete events over their JSON-RPC channel:

  1. The CLI receives a token stream from the LLM
  2. For each token received, the CLI sends a message_delta event to the SDK
  3. The SDK triggers your event listener
  4. The user immediately sees new content

This design lets your application perceive the AI's "thinking process", not just the final result.
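
To see the effect in isolation, here is a self-contained simulation of the delta pipeline. The token stream is mocked; only the consumption pattern mirrors what the SDK does:

```python
import asyncio

async def fake_token_stream():
    """Stand-in for the CLI relaying LLM tokens one by one."""
    for token in ["Quantum ", "entanglement ", "links ", "particle ", "states."]:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield token

async def consume() -> str:
    chunks = []
    async for delta in fake_token_stream():
        chunks.append(delta)  # in a real app: write each delta to stdout immediately
    return "".join(chunks)

text = asyncio.run(consume())
```

Each delta is usable the moment it arrives; accumulating them reproduces the full message, which is exactly the relationship between `ASSISTANT_MESSAGE_DELTA` and `ASSISTANT_MESSAGE`.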

Step 3: Giving the AI Capabilities — Custom Tools

The Essence of Tools: Letting the LLM Call Your Code

Up to now, the AI can only "talk" — it cannot interact with the outside world. Tools are the core capability of an Agent: you define functions, and the AI decides when to call them.

For example:

  1. User: "What's the weather in Beijing today?"
  2. AI thinks: I need weather data → call get_weather("Beijing")
  3. Your code: returns {"temperature": "15°C", "condition": "sunny"}
  4. AI synthesizes: "Beijing is sunny today, 15°C."

Key point: The AI autonomously decides whether to call a tool and what parameters to pass.
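
The decide-call-synthesize cycle can be sketched without any SDK at all. Everything below (the `decide()` function, the message format) is invented for illustration; in the real system, the LLM plays the role of `decide()`:

```python
def get_weather(city: str) -> dict:
    # Mock tool: a real implementation would call a weather API
    return {"city": city, "temperature": "15°C", "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def decide(messages: list) -> dict:
    """Scripted stand-in for the LLM's decision step."""
    if not any(m["role"] == "tool" for m in messages):
        # No tool result yet: ask for one
        return {"tool": "get_weather", "args": {"city": "Beijing"}}
    result = next(m for m in messages if m["role"] == "tool")["content"]
    return {"answer": f"{result['city']} is {result['condition']} today, {result['temperature']}."}

def run_agent(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        step = decide(messages)
        if "answer" in step:                          # model is done
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])  # execute the chosen tool
        messages.append({"role": "tool", "content": result})

answer = run_agent("What's the weather in Beijing today?")
```

The loop in `run_agent()` is the part the Copilot CLI runs for you: you supply the entries in `TOOLS`, and the runtime handles deciding, executing, and feeding results back.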

Three Elements of a Tool Definition

A tool contains:

  1. Description: Tells the AI what this tool does
  2. Parameter schema: Defines the structure of input parameters (using Pydantic)
  3. Handler: The Python function that actually executes

Complete Weather Assistant Example

Create weather_assistant.py:

import asyncio
import random
import sys
from copilot import CopilotClient
from copilot.tools import define_tool
from copilot.generated.session_events import SessionEventType
from pydantic import BaseModel, Field

# 1. Define parameter schema
class GetWeatherParams(BaseModel):
    city: str = Field(description="City name, e.g., Beijing, Shanghai")

# 2. Define tool (description + handler)
@define_tool(description="Get current weather for a specified city")
async def get_weather(params: GetWeatherParams) -> dict:
    city = params.city
    
    # In production, call a real weather API here
    # Using mock data for demonstration
    conditions = ["sunny", "cloudy", "rainy", "overcast"]
    temp = random.randint(10, 30)
    condition = random.choice(conditions)
    
    return {
        "city": city,
        "temperature": f"{temp}°C",
        "condition": condition
    }

async def main():
    client = CopilotClient()
    await client.start()
    
    # 3. Pass tools to session
    session = await client.create_session({
        "model": "gpt-4.1",
        "streaming": True,
        "tools": [get_weather],  # Register tool
    })
    
    # Listen for streaming responses
    def handle_event(event):
        if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
            sys.stdout.write(event.data.delta_content)
            sys.stdout.flush()
        if event.type == SessionEventType.SESSION_IDLE:
            print("\n")
    
    session.on(handle_event)
    
    # Send a prompt that requires tool calls
    await session.send_and_wait({
        "prompt": "What's the weather like in Beijing and Shanghai? Compare them."
    })
    
    await client.stop()

asyncio.run(main())

Run:

python weather_assistant.py

Execution Flow Explained

When you ask "What's the weather in Beijing and Shanghai":

  1. AI analyzes the question → weather data needed
  2. AI checks available tools → finds the get_weather function
  3. AI decides to call → get_weather(city="Beijing")
  4. SDK triggers the handler → your function returns {"temperature": "22°C", ...}
  5. AI receives the result → calls get_weather(city="Shanghai") again
  6. AI synthesizes the answer → "Beijing is sunny at 22°C; Shanghai is overcast at 18°C..."

The AI will automatically call the tool multiple times (once for Beijing, once for Shanghai) — you don't need to write any loop logic.

Why the Parameter Schema Matters

Why define parameters with Pydantic?

class GetWeatherParams(BaseModel):
    city: str = Field(description="City name")
    unit: str = Field(default="celsius", description="Temperature unit: celsius or fahrenheit")

The SDK converts this schema to JSON Schema and passes it to the LLM:

{
  "type": "object",
  "properties": {
    "city": {"type": "string", "description": "City name"},
    "unit": {"type": "string", "description": "Temperature unit"}
  },
  "required": ["city"]
}

The LLM extracts parameters based on this schema. Therefore, the clearer the description, the more accurately the AI will invoke the tool.
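
You can inspect the generated schema yourself. Assuming the SDK relies on Pydantic v2's standard conversion, `model_json_schema()` shows roughly what the LLM will see:

```python
import json
from pydantic import BaseModel, Field

class GetWeatherParams(BaseModel):
    city: str = Field(description="City name")
    unit: str = Field(default="celsius",
                      description="Temperature unit: celsius or fahrenheit")

schema = GetWeatherParams.model_json_schema()
print(json.dumps(schema, indent=2))
```

Because `unit` has a default value, only `city` appears in `required`; the LLM may therefore omit `unit` when calling the tool.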

Step 4: Building an Interactive Assistant

Now let's combine all the capabilities: streaming output + tool invocation + command-line interaction.

Complete Runnable Code

Create interactive_assistant.py:

import asyncio
import random
import sys
from copilot import CopilotClient
from copilot.tools import define_tool
from copilot.generated.session_events import SessionEventType
from pydantic import BaseModel, Field

# Define tools
class GetWeatherParams(BaseModel):
    city: str = Field(description="City name, e.g., Beijing, Shanghai, Guangzhou")

@define_tool(description="Get current weather for a specified city")
async def get_weather(params: GetWeatherParams) -> dict:
    city = params.city
    conditions = ["sunny", "cloudy", "rainy", "overcast", "hazy"]
    temp = random.randint(5, 35)
    condition = random.choice(conditions)
    humidity = random.randint(30, 90)
    
    return {
        "city": city,
        "temperature": f"{temp}°C",
        "condition": condition,
        "humidity": f"{humidity}%"
    }

async def main():
    client = CopilotClient()
    await client.start()
    
    session = await client.create_session({
        "model": "gpt-4.1",
        "streaming": True,
        "tools": [get_weather],
    })
    
    # Event listeners
    def handle_event(event):
        if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
            sys.stdout.write(event.data.delta_content)
            sys.stdout.flush()
        if event.type == SessionEventType.SESSION_IDLE:
            print()  # Newline when complete
    
    session.on(handle_event)
    
    # Interactive conversation loop
    print("🌤️  Weather Assistant (type 'exit' to quit)")
    print("Try: 'What's the weather in Beijing?' or 'Compare weather in Guangzhou and Shenzhen'\n")
    
    while True:
        try:
            user_input = input("You: ")
        except EOFError:
            break
        
        if user_input.lower() in ["exit", "quit"]:
            break
        
        if not user_input.strip():
            continue
        
        sys.stdout.write("Assistant: ")
        await session.send_and_wait({"prompt": user_input})
        print()  # Extra newline
    
    await client.stop()
    print("Goodbye!")

asyncio.run(main())

Sample Output

python interactive_assistant.py

Example conversation:

🌤️  Weather Assistant (type 'exit' to quit)
Try: 'What's the weather in Beijing?' or 'Compare weather in Guangzhou and Shenzhen'

You: Compare weather in Guangzhou and Shenzhen
Assistant: Guangzhou: 21°C, sunny, 84% humidity.
Shenzhen: 33°C, hazy, 77% humidity.
Shenzhen is significantly warmer and hazier, while Guangzhou is cooler and sunnier with slightly higher humidity.

You: What's the weather in Shenzhen
Assistant: The weather in Shenzhen is 8°C, overcast, with 47% humidity.

You: quit
Goodbye!

Key Design Considerations

1. Session Persistence

Notice that we create the session only once, and reuse it throughout the entire conversation loop. This means:

  • The AI remembers previous conversation content
  • Follow-up questions like "What about tomorrow?" work (the AI knows which city you mean)
  • Tool call history is also retained
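
A toy sketch of why follow-ups work: the session carries the full history, so a vague second turn can be resolved against the first. The `send()` function and its keyword matching are invented stand-ins for the real model:

```python
history = []

def send(prompt: str) -> str:
    history.append({"role": "user", "content": prompt})
    # Toy "model": resolve an unnamed city by scanning earlier turns
    city = "somewhere"
    for turn in history:
        if "Beijing" in turn["content"]:
            city = "Beijing"
    return f"(answering about {city})"

send("What's the weather in Beijing?")
followup = send("What about tomorrow?")
```

The follow-up only resolves to Beijing because the first turn is still in `history`; create a fresh session per turn and that context is gone.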

2. Async I/O Done Right

# Using input() in while True loop
user_input = input("You: ")  # Synchronous blocking, but acceptable here

# send_and_wait() is async
await session.send_and_wait({"prompt": user_input})

Why is blocking on input() acceptable here? Because while we wait for the user, there is nothing else this program needs to do. Be aware that input() does block the asyncio event loop, so any background tasks would stall during the wait; the genuinely asynchronous work (communicating with the CLI) happens inside send_and_wait().
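
If you later add background tasks (a spinner, a periodic refresh), blocking input() would freeze them. The standard fix is `asyncio.to_thread()`; the sketch below uses a stand-in blocking function instead of `input()` so it runs non-interactively:

```python
import asyncio
import time

def blocking_read() -> str:
    """Stand-in for input(): any call that blocks the calling thread."""
    time.sleep(0.05)
    return "What's the weather in Beijing?"

async def main():
    ticks = []

    async def heartbeat():
        # Keeps running while the blocking call sits on a worker thread
        for _ in range(3):
            ticks.append("tick")
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    user_input = await asyncio.to_thread(blocking_read)  # doesn't stall the loop
    await hb
    return user_input, ticks

user_input, ticks = asyncio.run(main())
```

In the weather assistant, replacing `input("You: ")` with `await asyncio.to_thread(input, "You: ")` would apply the same pattern.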

3. Graceful Exit

try:
    user_input = input("You: ")
except EOFError:  # Catch Ctrl+D
    break

Handling EOFError and common exit commands (exit, quit) ensures a smooth user experience.

Extension Ideas

Based on this framework, you can quickly extend functionality:

Add more tools:

@define_tool(description="Query real-time stock price")
async def get_stock_price(params): ...

@define_tool(description="Search information on the web")
async def web_search(params): ...

session = await client.create_session({
    "tools": [get_weather, get_stock_price, web_search],
})

The AI will automatically select the appropriate tool based on the user's question.
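
Under the hood, selection reduces to matching the model's chosen tool name against the registered handlers. Here is a sketch of that dispatch step, with simplified handlers standing in for the real ones:

```python
import asyncio

# Simplified handlers; the real ones take Pydantic-validated params
async def get_weather(params: dict) -> dict:
    return {"tool": "get_weather", **params}

async def get_stock_price(params: dict) -> dict:
    return {"tool": "get_stock_price", **params}

# Registered handlers, keyed by tool name (mirrors the "tools" list above)
REGISTRY = {fn.__name__: fn for fn in [get_weather, get_stock_price]}

async def dispatch(tool_name: str, arguments: dict) -> dict:
    """What a runtime does when the model emits a tool call."""
    handler = REGISTRY.get(tool_name)
    if handler is None:
        raise KeyError(f"model requested unknown tool: {tool_name}")
    return await handler(arguments)

result = asyncio.run(dispatch("get_stock_price", {"symbol": "MSFT"}))
```

The model never calls your function directly; it emits a name plus arguments, and the runtime looks up and invokes the handler, which is why clear tool descriptions matter so much.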

Add a system prompt:

session = await client.create_session({
    "model": "gpt-4.1",
    "tools": [get_weather],
    "system_message": {
        "content": "You are a professional weather assistant. Keep answers concise but informative."
    }
})

Log tool calls:

def handle_event(event):
    if event.type == SessionEventType.TOOL_CALL:
        print(f"\n[Debug] AI called tool: {event.data.tool_name}")
        print(f"[Debug] Arguments: {event.data.arguments}\n")

Debugging Tips

During development, observing CLI logs is crucial for understanding Agent behavior.

Start a standalone CLI server:

# Start CLI server in debug mode
copilot --headless --log-level debug --port 9999

# Optional: specify log directory
copilot --headless --log-level debug --port 9999 --log-dir ./logs

Connect from your code:

client = CopilotClient({
    'cli_url': 'http://localhost:9999',
})
await client.start()  # Connects directly without starting a new process

View logs:

By default, logs are saved in ~/.copilot/logs/, with an independent log file for each server process. Use tail -f to monitor in real time:

tail -f ~/.copilot/logs/process-<timestamp>-<pid>.log

Debug tool calls:

def handle_event(event):
    # Tool call starts
    if event.type == SessionEventType.TOOL_USER_REQUESTED:
        print(f"[Tool Call] {event.data.tool_name}")
        print(f"Arguments: {event.data.arguments}")
    
    # Tool execution result
    if event.type == SessionEventType.TOOL_EXECUTION_COMPLETE:
        print(f"[Tool Result] {event.data.tool_name}")
        print(f"Result: {event.data.result}")
    
    # AI's final response
    if event.type == SessionEventType.ASSISTANT_MESSAGE:
        print(f"[Assistant] {event.data.content[:100]}...")

session.on(handle_event)

This pattern gives you a clear view of the entire tool call chain: AI decision → tool execution → result return → final response.