Deploying DeepSeek-R1-32B Locally from Scratch on CentOS 7.9


Environment

  • OS: CentOS 7.9
  • GPU: 4 × NVIDIA L20
  • CUDA version: 12.2

Step 1: Modify the local yum sources

1. Back up the existing yum repo config

mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo_bak

2. Fetch Aliyun's yum repo config file

curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

3. Rebuild the yum cache

yum makecache

Step 2: Upgrade gcc

sudo yum install centos-release-scl -y
sudo yum install devtoolset-11-gcc devtoolset-11-gcc-c++ -y
scl enable devtoolset-11 bash

On CentOS 7, installing centos-release-scl adds SCL repos that still point at the retired CentOS mirrors, so any later yum command that touches them fails. For example, installing devtoolset right after the commands above produces an error like this:

[root@localhost ~]# sudo yum install devtoolset-11
Could not retrieve mirrorlist http://mirrorlist.centos.org?arch=x86_64&release=7&repo=sclo-rh error was
14: curl#7 - "Failed to connect to 2a05:d012:8b5:6503:9efb:5cad:348f:e826: Network is unreachable"


 One of the configured repositories failed (unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: centos-sclo-rh/x86_64

Yum sources are configured in three files: CentOS-Base.repo, CentOS-SCLo-scl-rh.repo, and CentOS-SCLo-scl.repo. CentOS-Base.repo already points at the Aliyun mirror, so only the other two (in /etc/yum.repos.d/) need editing. Start with CentOS-SCLo-scl-rh.repo.

vi /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo

After editing:

[centos-sclo-rh]
name=CentOS-7 - SCLo rh
baseurl=http://vault.centos.org/centos/7/sclo/$basearch/rh/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo
If yum still fails after this change, edit the CentOS-SCLo-scl.repo source as well.

vi /etc/yum.repos.d/CentOS-SCLo-scl.repo

After editing:

[centos-sclo-sclo]
name=CentOS-7 - SCLo sclo
baseurl=http://vault.centos.org/centos/7/sclo/$basearch/sclo/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo

With both files updated, yum works again; clean and rebuild the cache:

yum clean all
yum makecache

Step 3: Install conda

Download the install script (Linux x86_64 shown here)

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O Miniconda3.sh

Run the installer

bash Miniconda3.sh

Follow the prompts; by default it installs to ~/miniconda3.

After installation, reload the shell or run:

source ~/.bashrc

Verify the installation

conda --version

Step 4: Install vLLM

conda create -n vllm_env python=3.10 -y # create the environment
conda activate vllm_env # activate the new environment
pip install vllm -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com # install vllm

If the install fails, install a few dependencies first. (Note that pip's -i is an alias of --index-url, so don't pass both flags for torch; the later one silently overrides the cu121 index.)

pip install numpy ninja
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.29.post2 --no-build-isolation -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com

Then install vLLM again.

Step 5: Download the model

git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.git # the 70B distill can be deployed the same way

For Git downloads, make sure git-lfs is correctly installed and initialized, otherwise the weight files come down as pointer stubs:

git lfs install


Start the service

source /root/.bashrc
export CUDA_VISIBLE_DEVICES=0,1,2,3  # make all four GPUs visible
conda activate vllm_env # activate the environment
nohup vllm serve /data/vllm/models/DeepSeek-R1-Distill-Qwen-32B \
 --served-model-name deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
 --tensor-parallel-size 4 \
 --max-model-len 32768 \
 --port 8000 \
 --max-num-seqs 300 \
 --gpu-memory-utilization 0.8 \
 --api-key 123 \
 > deepseek.log 2>&1 &

List the served models. Because the server was started with --api-key 123, the request must carry the key:

curl http://localhost:8000/v1/models -H "Authorization: Bearer 123"
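The same key applies to every OpenAI-compatible endpoint the server exposes. As a minimal sketch, the helper below (an illustrative function, not part of vLLM) builds an authenticated chat-completions request using only the standard library; the model name and API key match the serve command above:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request for the vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # must match --api-key above
        },
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000", "123",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", "Hello!",
)
# To actually send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI SDK client works the same way: point `base_url` at `http://localhost:8000/v1` and pass `123` as the API key.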

Step 6: Install the embedding models

Embedding model

git clone https://www.modelscope.cn/BAAI/bge-m3.git

Rerank model

git clone https://www.modelscope.cn/BAAI/bge-reranker-v2-m3.git

API service

from fastapi import FastAPI, Request, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import numpy as np
from sentence_transformers import SentenceTransformer
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import logging

# configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="LocalAI Embedding Server")

# load the local embedding model (replace with your actual path)
model = SentenceTransformer("/data/vllm/models/BAAI/bge-m3", device="cuda")

class EmbeddingRequest(BaseModel):
    input: str | List[str]  # a single text or a batch
    model: str = "bge-m3"   # mimics OpenAI's model parameter

class EmbeddingResponse(BaseModel):
    data: List[dict]
    model: str
    usage: dict

@app.post("/v1/embeddings", response_model=EmbeddingResponse)
async def get_embeddings(request: EmbeddingRequest):
    try:
        # encode the texts
        texts = [request.input] if isinstance(request.input, str) else request.input
        embeddings = model.encode(texts, normalize_embeddings=True)  # normalized vectors

        # build an OpenAI-compatible response
        response = {
            "data": [
                {
                    "embedding": emb.tolist(),
                    "index": idx,
                    "object": "embedding"
                } for idx, emb in enumerate(embeddings)
            ],
            "model": request.model,
            "usage": {
                "prompt_tokens": sum(len(t.split()) for t in texts),  # rough token estimate
                "total_tokens": sum(len(t.split()) for t in texts)
            }
        }
        return response
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/health")
async def health_check():
    return {"status": "healthy", "model": "bge-m3"}

# reranker model config
MODEL_PATH = "/data/vllm/models/BAAI/bge-reranker-v2-m3"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# load the local reranker
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
rerankerModel = AutoModelForSequenceClassification.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32
).to(DEVICE)
rerankerModel.eval()

class RerankRequest(BaseModel):
    query: str
    documents: List[str]
    top_n: Optional[int] = None
    return_documents: Optional[bool] = False
    model: Optional[str] = "bge-reranker-v2-m3"

class RerankResult(BaseModel):
    index: int
    score: float
    relevance_score: float
    document: Optional[dict] = None

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class TokenUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class RerankResponse(BaseModel):
    object: str = "list"
    results: List[RerankResult]
    model: str = "bge-reranker-v2-m3"
    usage: TokenUsage

def count_tokens(text: str) -> int:
    """Count the tokens in a single text."""
    return len(tokenizer.encode(text))

def calculate_usage(query: str, documents: List[str]) -> TokenUsage:
    """Compute token usage for the request."""
    prompt_tokens = count_tokens(query) + sum(count_tokens(doc) for doc in documents)
    completion_tokens = len(documents)  # assume one score per document
    return TokenUsage(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens
    )

@app.post("/v1/rerank", response_model=RerankResponse)
async def rerank(request: RerankRequest):
    try:
        # compute token usage
        usage = calculate_usage(request.query, request.documents)

        # build query-document pairs
        pairs = [[request.query, doc] for doc in request.documents]
        
        # batch tokenize
        inputs = tokenizer(
            pairs,
            padding=True,
            truncation=True,
            max_length=512,
            return_tensors="pt"
        ).to(DEVICE)
        
        # inference
        with torch.no_grad():
            scores = rerankerModel(**inputs).logits.squeeze(-1).cpu().numpy()
            scores = sigmoid(scores)  # normalize scores to 0-1
        
        # sort by descending score
        sorted_indices = np.argsort(scores)[::-1]
        top_n = request.top_n if request.top_n else len(scores)
        results = [
            {
                "index": int(idx),
                "score": float(scores[idx]),
                "relevance_score": float(scores[idx]),
                "document": {
                    "text": request.documents[idx]  # always returned; gate on request.return_documents if desired
                }
            }
            for idx in sorted_indices[:top_n]
        ]
        
        return {
            "object": "list", 
            "results": results,
            "model": request.model,
            "usage": usage
        }
    
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.middleware("http")
async def log_request(request: Request, call_next):
    # log basic request info
    logger.info(f"Request: {request.method} {request.url}")
    
    # log query parameters
    query_params = dict(request.query_params)
    if query_params:
        logger.info(f"Query parameters: {query_params}")
    
    # log path parameters
    path_params = request.path_params
    if path_params:
        logger.info(f"Path parameters: {path_params}")
    
    # for POST/PUT/PATCH requests, log the body
    if request.method in ["POST", "PUT", "PATCH"]:
        try:
            body = await request.json()
            logger.info(f"Request body: {body}")
        except:
            body = await request.body()
            logger.info(f"Request body (raw): {body.decode()}")
    
    response = await call_next(request)
    return response

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8001)
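The scoring core of the rerank endpoint above is just sigmoid normalization followed by a descending sort. As a standalone sketch of that post-processing with made-up logits (no model required; `rank_scores` is an illustrative helper, not part of the service):

```python
import numpy as np

def rank_scores(logits, top_n=None):
    """Mirror the endpoint's post-processing: sigmoid-normalize raw logits,
    then return (index, score) pairs sorted by descending relevance."""
    scores = 1 / (1 + np.exp(-np.asarray(logits, dtype=float)))  # sigmoid -> (0, 1)
    order = np.argsort(scores)[::-1]                             # best first
    top_n = top_n or len(scores)
    return [(int(i), float(scores[i])) for i in order[:top_n]]

# hypothetical raw logits for three candidate documents
ranked = rank_scores([2.0, -1.0, 0.5], top_n=2)
```

Because sigmoid is monotonic it never changes the ordering; it only maps the raw logits into a 0-1 range so callers like Dify can treat them as relevance scores.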

Step 7: Install the Docker environment

Installation

GitHub release link: github.com/1Panel-dev/…

1. Run the following install script and follow the command-line prompts.

bash -c "$(curl -sSL https://resource.fit2cloud.com/1panel/package/v2/quick_start.sh)"

If Docker itself fails to install, try running this script instead:

bash <(curl -sSL https://linuxmirrors.cn/docker.sh)

For more information, visit the official site: linuxmirrors.cn

After installation, the console prints the panel's access info, and you can open 1Panel in a browser:

http://<server IP>:<port>/<security entrance>

  • If you use a cloud server, open the target port in its security group.
  • After logging into the 1Panel server over ssh, run `1pctl user-info` to retrieve the security entrance.
  • Once installed, the `1pctl` command-line tool can be used to maintain 1Panel.

Step 8: Install Dify

1. Download Dify

git clone git@github.com:langgenius/dify.git # SSH clone needs a configured GitHub key; https://github.com/langgenius/dify.git also works

2. Install Dify

cd ./dify/docker
docker-compose up -d

Wait for the containers to come up, then visit http://<your server IP>.


3. Configure the model


4. Install the plugins


5. Set the parameters


Step 9: Create your first application


Now start building your AI application! Questions and discussion are welcome!
