A Hands-On Tutorial: Local LLM Deployment + MCP + Agent Development

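We need to walk through the whole pipeline once so that it becomes clear how an LLM and MCP fit together and cooperate. Every experiment here can be run on your own computer.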

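Structure first

The figure above shows how the agent, the LLM, and the MCP server relate to one another. After receiving the user's question, the agent queries the MCP server to find out which services it offers, then passes that information to the LLM together with the prompt.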

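Take the scenario of Claude (the agent) calling the Amap (高德地图) map service as an example (the figure comes from 艾逗笔's book 《这就是MCP》; if you are not yet familiar with MCP, it is worth reading).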

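In this example:

1) First, Amap has to expose an MCP service for route planning.

2) Then Claude (the agent) has to be configured with that MCP server.

3) Claude takes the user's question and passes it to the LLM together with the tool list obtained from the MCP server (each tool's name, purpose, and parameters).

4) Based on the question and the tool list, the LLM works out the intent, decides whether a tool should be called and with which arguments, and returns that decision to the agent.

5) The agent calls the MCP service as instructed by the LLM.

6) MCP returns the result, the agent hands it back to the LLM, and the LLM composes the natural-language reply.

That is the whole process. Now we verify it locally.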
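The six steps above can be sketched as a single function with stand-ins for both the model and the MCP server. Everything in it is hypothetical: `fake_llm`, `fake_mcp_list_tools`, `fake_mcp_call`, and the `plan_route` tool are placeholders invented for illustration, not the Amap or Claude APIs, and the returned route text is made up. The only point is the order in which messages flow.

```python
import json
from typing import Any, Dict, List


# Hypothetical stand-in for an MCP server: one tool plus its metadata.
def fake_mcp_list_tools() -> List[Dict[str, Any]]:
    return [{
        "name": "plan_route",
        "description": "Plan a route between two places",
        "parameters": ["origin", "destination"],
    }]


def fake_mcp_call(name: str, arguments: Dict[str, Any]) -> str:
    if name != "plan_route":
        raise ValueError(f"unknown tool: {name}")
    return f"route from {arguments['origin']} to {arguments['destination']}: 12 km"


# Hypothetical stand-in for the model: requests the tool once, then answers.
def fake_llm(messages: List[Dict[str, str]]) -> str:
    last = messages[-1]["content"]
    if last.startswith("[Observation]"):
        return "Final answer based on " + last[len("[Observation]: "):]
    return json.dumps({"tool_name": "plan_route",
                       "arguments": {"origin": "A", "destination": "B"}})


def run_agent(question: str) -> str:
    tools = fake_mcp_list_tools()                # step 3: fetch the tool list
    messages = [
        {"role": "system", "content": "Tools: " + json.dumps(tools)},
        {"role": "user", "content": question},
    ]
    reply = fake_llm(messages)                   # step 4: the model decides
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply                             # plain answer, no tool needed
    result = fake_mcp_call(call["tool_name"], call["arguments"])  # step 5
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": f"[Observation]: {result}"})
    return fake_llm(messages)                    # step 6: final wording


print(run_agent("How do I get from A to B?"))
```

The agent itself never interprets the question; it only relays the tool list, executes what the model asks for, and feeds the result back.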

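1. Deploying the LLM locally

First we need a locally deployed LLM, and for that we rely on ollama. Running a model locally means dealing with compatibility issues and needs a fair amount of code to drive inference; ollama takes care of all of that.

For installing ollama and pulling a model, follow any of the guides available online. In short: start the ollama service and pull a deepseek model.

Once that is done, the LLM is deployed locally and can answer questions.

By default the ollama service listens on port 11434 and provides access to the local deepseek model.

We also need a client that talks to ollama, sends the question to the LLM, and reads back the answer.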

import json
import httpx
from typing import List, Dict, Any, AsyncGenerator, Optional


class OllamaClient:
    """Minimal async client for ollama's /api/chat endpoint."""

    def __init__(self, base_url: str = "http://localhost:11434"):
        self.api_url = f"{base_url.rstrip('/')}/api/chat"

    async def chat_complete(
        self,
        model: str,
        messages: List[Dict[str, str]],
        stream: bool = False,  # unused; this method always requests a full reply
        options: Optional[Dict[str, Any]] = None
    ) -> str:
        """Non-streaming call: return the complete reply as one string."""
        payload = {
            "model": model,
            "messages": messages,
            "stream": False,
            "options": options or {}
        }
        # Local inference can be slow, so allow a generous timeout
        async with httpx.AsyncClient(timeout=300.0) as client:
            resp = await client.post(self.api_url, json=payload)
            resp.raise_for_status()
            return resp.json()["message"]["content"]

    async def chat_stream(
        self,
        model: str,
        messages: List[Dict[str, str]],
        options: Optional[Dict[str, Any]] = None
    ) -> AsyncGenerator[str, None]:
        """Streaming call: yield the reply chunk by chunk."""
        payload = {
            "model": model,
            "messages": messages,
            "stream": True,
            "options": options or {}
        }
        async with httpx.AsyncClient(timeout=300.0) as client:
            async with client.stream("POST", self.api_url, json=payload) as resp:
                resp.raise_for_status()
                async for line in resp.aiter_lines():
                    if line.strip():
                        # Each non-empty line is one JSON object
                        data = json.loads(line)
                        if "message" in data and "content" in data["message"]:
                            yield data["message"]["content"]
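When `"stream": True` is set, the `/api/chat` endpoint sends back one JSON object per line, and each object carries a fragment of the reply in `message.content`. The sketch below shows that assembly step in isolation. The sample lines are hand-written for illustration rather than captured from a real server, and `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the message.content fragments of newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        data = json.loads(line)
        parts.append(data.get("message", {}).get("content", ""))
        if data.get("done"):
            break  # the final chunk is marked with "done": true
    return "".join(parts)


# Hand-written sample chunks (an assumption about a typical response shape)
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '',
    '{"message": {"role": "assistant", "content": "lo"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_stream(sample))
```

Parsing each line with `json.loads` is all that is required; no HTTP response object has to be constructed per line.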

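2. MCP server

We use FastAPI to build a very simple MCP server. It declares the tools it supports and registers them automatically through a decorator, handles each call either synchronously or asynchronously, and listens on port 8080. Note that this is a simplified HTTP/JSON stand-in that mirrors MCP's list-tools and call-tool pattern; the official protocol exchanges JSON-RPC 2.0 messages.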

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Dict, Any, Callable, Optional
import inspect
import asyncio

app = FastAPI(
    title="Generic MCP (Model Context Protocol) Server",
    description="A standard tool/function server for LLMs and agents.",
    version="1.0.0"
)

# Tool registry: name -> metadata + callable
TOOLS: Dict[str, Dict[str, Any]] = {}


class ToolCallRequest(BaseModel):
    name: str
    arguments: Dict[str, Any]


class ToolCallResponse(BaseModel):
    success: bool
    result: Optional[str] = None
    error: Optional[str] = None


def register_tool(name: str, description: str = "", is_async: bool = False):
    """
    Decorator: register a tool function with the MCP server.
    Works for sync and async functions. Parameter names are read from the
    signature, and every parameter is advertised as a string.
    """
    def decorator(func: Callable):
        sig = inspect.signature(func)
        parameters = {}
        for param_name in sig.parameters:
            # Assume string by default (LLMs usually emit strings)
            parameters[param_name] = {"type": "string"}

        TOOLS[name] = {
            "function": func,
            "description": description,
            "parameters": parameters,
            # Coroutine functions are detected even if is_async was not set
            "is_async": is_async or inspect.iscoroutinefunction(func)
        }
        return func
    return decorator


@app.get("/mcp/v1/tools", response_model=List[Dict[str, Any]])
async def list_tools():
    """Return metadata for all available tools (modeled on MCP's tool listing)."""
    tools_meta = []
    for name, meta in TOOLS.items():
        tools_meta.append({
            "name": name,
            "description": meta["description"],
            "parameters": {
                "type": "object",
                "properties": meta["parameters"]
            }
        })
    return tools_meta


@app.post("/mcp/v1/tool_call", response_model=ToolCallResponse)
async def call_tool(request: ToolCallRequest):
    """Execute the requested tool."""
    if request.name not in TOOLS:
        raise HTTPException(status_code=404, detail=f"Tool '{request.name}' not found")

    tool = TOOLS[request.name]
    func = tool["function"]

    try:
        # Run the function (sync or async)
        if tool["is_async"]:
            result = await func(**request.arguments)
        else:
            # Run sync functions in a thread pool so the event loop is not blocked
            loop = asyncio.get_running_loop()
            result = await loop.run_in_executor(None, lambda: func(**request.arguments))
        return ToolCallResponse(success=True, result=str(result))
    except Exception as e:
        return ToolCallResponse(success=False, error=str(e))


@register_tool("get_current_time", "Get current date and time in ISO format")
def get_current_time() -> str:
    from datetime import datetime
    return datetime.now().isoformat()


@register_tool("add_numbers", "Add two numbers together")
def add_numbers(a: float, b: float) -> float:
    # Arguments may arrive as strings (they are advertised as "string"),
    # so convert explicitly; otherwise "123" + "456" would concatenate.
    return float(a) + float(b)


@register_tool("search_document", "Search a document for keywords (mock)")
def search_document(query: str) -> str:
    # Mock search
    return f"Mock result for query: '{query}'"


# Entry point: this simply starts a FastAPI service
if __name__ == "__main__":
    import uvicorn
    print("🚀 Starting Generic MCP Server...")
    print(" - Tools endpoint: GET http://localhost:8080/mcp/v1/tools")
    print(" - Call tool: POST http://localhost:8080/mcp/v1/tool_call")
    uvicorn.run(app, host="0.0.0.0", port=8080)

The server ships with a few simple example tools, such as getting the current time and adding two numbers. You can add whatever services you need.
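One limitation of the server above is that `register_tool` advertises every parameter as a string, so the model gets no hint that `add_numbers` expects numbers. A possible refinement, sketched here under the assumption that tools only use simple annotated types, is to derive JSON Schema type names from the function's type hints. `schema_from_signature` and the `_JSON_TYPES` mapping are hypothetical names, not part of the server code.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-Schema-style parameter description from type hints."""
    properties: Dict[str, Dict[str, str]] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unrecognized types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def add_numbers(a: float, b: float) -> float:
    return a + b


print(schema_from_signature(add_numbers))
```

The resulting dictionary could replace the hand-built `parameters` entry in the registry, at the cost of having to decide how to treat annotations that are not in the mapping.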

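3. MCP client

To reach the server we need a client. The agent uses this same code when it calls the MCP server.

Whenever it needs to, the agent creates a client and connects to the MCP server in question.

The implementation is as follows: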

import httpx
from typing import List, Dict, Any, Optional


class MCPClient:
    def __init__(self, base_url: str = "http://localhost:8080"):
        self.base_url = base_url.rstrip("/")
        self._tools_cache: Optional[List[Dict]] = None

    async def list_tools(self) -> List[Dict]:
        """Fetch all available tools (cached after the first call)."""
        if self._tools_cache is None:
            async with httpx.AsyncClient() as client:
                resp = await client.get(f"{self.base_url}/mcp/v1/tools")
                resp.raise_for_status()
                self._tools_cache = resp.json()
        return self._tools_cache

    async def call_tool(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Call the named tool with the given arguments."""
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"{self.base_url}/mcp/v1/tool_call",
                json={"name": name, "arguments": arguments}
            )
            resp.raise_for_status()
            return resp.json()


# Example usage (optional)
if __name__ == "__main__":
    import asyncio

    async def demo():
        client = MCPClient()
        tools = await client.list_tools()
        print("Available tools:", [t["name"] for t in tools])

        result = await client.call_tool("get_current_time", {})
        print("Time result:", result)

        result = await client.call_tool("add_numbers", {"a": 10, "b": 20})
        print("Add result:", result)

        result = await client.call_tool("search_document", {"query": "databases"})
        print("Search document result:", result)

    asyncio.run(demo())

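Running the client lists the server's tools correctly, and each call returns the expected result.

4. The agent

With the pieces above in place, we can implement an agent. It has to run inference through the LLM and fetch tool information from MCP.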

import asyncio
import json
import re
from typing import List, Dict, Any, Optional
# Assumes the two clients above are saved as mcpclient.py and ollama_client.py
from mcpclient import MCPClient
from ollama_client import OllamaClient


class AIAgent:
    def __init__(
        self,
        model: str = "deepseek-r1",
        ollama_url: str = "http://localhost:11434",
        mcp_url: str = "http://localhost:8080"
    ):
        self.model = model
        self.ollama = OllamaClient(ollama_url)
        self.mcp = MCPClient(mcp_url)
        self.tools_info: List[Dict[str, Any]] = []
        self.system_prompt = ""

    async def initialize(self):
        """On startup, fetch the MCP tool list and build the system prompt."""
        try:
            self.tools_info = await self.mcp.list_tools()
            print(f"✅ Loaded {len(self.tools_info)} tools from MCP Server.")
        except Exception as e:
            print(f"⚠️ Warning: Failed to connect to MCP Server: {e}")
            self.tools_info = []

        # Build a system prompt that steers the model toward using the tools
        if self.tools_info:
            tools_desc = "\n".join([
                f'- {tool["name"]}: {tool["description"]} | parameters: {list(tool["parameters"]["properties"].keys())}'
                for tool in self.tools_info
            ])
            self.system_prompt = (
                "You are a helpful assistant that can use the following tools:\n"
                f"{tools_desc}\n\n"
                "When you need to call a tool, reply strictly in the following JSON format and include nothing else:\n"
                '{"tool_name": "<tool name>", "arguments": {"param1": "value1", ...}}\n'
                "If no tool is needed, answer directly."
            )
        else:
            self.system_prompt = "You are a helpful AI assistant."

    def _is_tool_call(self, text: str) -> Optional[Dict[str, Any]]:
        """Return the parsed tool call if the model output is one, else None."""
        try:
            # Try to extract a JSON block (also handles markdown code blocks)
            json_match = re.search(r"```(?:json)?\s*({.*?})\s*```", text, re.DOTALL)
            raw_json = json_match.group(1) if json_match else text.strip()
            data = json.loads(raw_json)
            if isinstance(data, dict) and "tool_name" in data and "arguments" in data:
                return data
        except (json.JSONDecodeError, AttributeError):
            pass
        return None

    async def run(self, user_input: str, stream: bool = False, max_steps: int = 3) -> str:
        """
        Run one complete inference (supports several rounds of tool calls).
        """
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_input}
        ]

        response = ""
        for step in range(max_steps):
            # Ask the LLM
            response = await self.ollama.chat_complete(self.model, messages)

            # Check whether a tool call is requested
            tool_call = self._is_tool_call(response)
            if tool_call:
                tool_name = tool_call["tool_name"]
                arguments = tool_call["arguments"]

                # Call the MCP tool
                try:
                    tool_result = await self.mcp.call_tool(tool_name, arguments)
                    observation = (
                        f"Tool '{tool_name}' executed. "
                        f"Success: {tool_result['success']}. "
                        f"Result: {tool_result.get('result', '') or tool_result.get('error', '')}"
                    )
                except Exception as e:
                    observation = f"Failed to call tool '{tool_name}': {str(e)}"

                # Add the tool call and its result to the conversation history
                messages.append({"role": "assistant", "content": response})
                messages.append({"role": "user", "content": f"[Observation]: {observation}"})

                if step == max_steps - 1:
                    # Last round: force a final answer
                    messages.append({"role": "user", "content": "Please give the final answer based on the information above."})
                    response = await self.ollama.chat_complete(self.model, messages)
                    break
            else:
                # No tool needed, return the answer directly
                break

        return response

    async def run_stream(self, user_input: str, max_steps: int = 3):
        """
        Stream the final answer (note: the tool-calling phase is not streamed;
        the finished answer is split into words to simulate streaming).
        """
        final_answer = await self.run(user_input, stream=False, max_steps=max_steps)
        for chunk in final_answer.split(" "):
            yield chunk + " "
            await asyncio.sleep(0.01)


async def main():
    # Initialize the agent
    agent = AIAgent(model="deepseek-r1")
    await agent.initialize()

    # Example query
    query = "What time is it now? Then, what is 123 + 456?"
    print(f"👤 User: {query}\n")

    # Non-streaming run
    answer = await agent.run(query)
    print(f"🤖 Agent: {answer}\n")

    # Streaming run (simulated typing effect)
    print("🤖 Agent (streaming): ", end="", flush=True)
    async for chunk in agent.run_stream("What is the weather like in Beijing today?"):
        print(chunk, end="", flush=True)
    print("\n")


if __name__ == "__main__":
    asyncio.run(main())
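The tool-call detection in `_is_tool_call` works only when the model returns nothing but JSON, optionally inside a code fence. Reasoning models may prefix the reply with a reasoning block wrapped in think tags, and small models often add prose around the JSON. A more forgiving parser, sketched below, strips such a block and then tries to decode a JSON object starting at each opening brace. `extract_tool_call` and `REASONING_TAG` are hypothetical names, and whether your model emits such a block at all depends on the model and the ollama version.

```python
import json
import re
from typing import Any, Dict, Optional

# Assumed name of the tag that wraps the model's reasoning text.
REASONING_TAG = "think"


def extract_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return the first {"tool_name", "arguments"} object found in text, if any."""
    # Drop the reasoning block, if the model produced one.
    pattern = rf"<{REASONING_TAG}>.*?</{REASONING_TAG}>"
    text = re.sub(pattern, "", text, flags=re.DOTALL)

    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses one JSON value and ignores any trailing text.
            data, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if (isinstance(data, dict) and "tool_name" in data
                and isinstance(data.get("arguments"), dict)):
            return data
    return None


reply = (
    f"<{REASONING_TAG}>The user wants a sum {{not json}}</{REASONING_TAG}>\n"
    'Sure: {"tool_name": "add_numbers", "arguments": {"a": 123, "b": 456}} done'
)
print(extract_tool_call(reply))
print(extract_tool_call("The answer is 579."))
```

Because the check also requires `arguments` to be a dictionary, a stray JSON object in ordinary prose is less likely to be mistaken for a tool call.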

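5. Results

Running the agent above produced the following results.

For the first question, the agent successfully called the MCP tools get_current_time and add_numbers and returned the results.

(Note: the line "需要调用外部的MCP" ("need to call the external MCP") in the output is something I added for debugging, to prove that the functions really were called.)

The second question asks about the weather in Beijing. Since no tool provides that information, the LLM cannot give an answer.

6. Summary

We have verified end to end how the agent, the LLM, and MCP relate to one another, and implemented both an MCP server and an AI agent in code. We have also shown how a locally deployed LLM can be combined with MCP tools to give more precise answers.

I hope this helps. Comments and discussion are welcome.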