简单几步让你的本地大模型拥有舒适的交互界面

496 阅读3分钟

前言

平日使用自己本文教大家如何部署 text-generation-webui 开源项目,简单几步让你的本地大模型拥有舒适的交互界面

conda 创建虚拟环境

创建虚拟环境,python 需要使用 3.11 的版本

conda create -n tg-webui python=3.11

进入虚拟环境

conda activate tg-webui

pytorch 官网找到适合自己 cuda 版本的 pytorch 安装命令,前提是要配置好适合自己显卡的 CUDA 环境,我这里 CUDA 是 11.8 ,已经提前安装好了,如果还不会可以参考我的安装教程,我使用的命令如下:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

不同的操作系统的安装命令可以参考下面的图表内容,也可以直接去官网 image.png

clone 项目安装 requirements

从官网克隆整个项目

git clone https://github.com/oobabooga/text-generation-webui

然后进入这个文件夹中安装 requirements.txt ,各个操作系统的安装文件不一样,请自行选择。

image.png

我这里是 windows 操作系统,Nvidia 显卡,所以使用下面的安装命令。

pip install -r requirements.txt

这里需要注意 requirements.txt 最后一些无关的包(如下所示)删除即可,否则会引发不必要的错误

# llama-cpp-python (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.81+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.81+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.81+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.81+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"

# llama-cpp-python (CUDA, no tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.81+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.81+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.81+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.81+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

# llama-cpp-python (CUDA, tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.81+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.81+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.81+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.81+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

# CUDA wheels
https://github.com/oobabooga/exllamav2/releases/download/v0.1.6/exllamav2-0.1.6+cu121.torch2.2.2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.1.6/exllamav2-0.1.6+cu121.torch2.2.2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.1.6/exllamav2-0.1.6+cu121.torch2.2.2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav2/releases/download/v0.1.6/exllamav2-0.1.6+cu121.torch2.2.2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/exllamav2/releases/download/v0.1.6/exllamav2-0.1.6-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
https://github.com/oobabooga/flash-attention/releases/download/v2.5.9.post1/flash_attn-2.5.9.post1+cu122torch2.2.2cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/flash-attention/releases/download/v2.5.9.post1/flash_attn-2.5.9.post1+cu122torch2.2.2cxx11abiFALSE-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.9.post1/flash_attn-2.5.9.post1+cu122torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.9.post1/flash_attn-2.5.9.post1+cu122torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
autoawq==0.2.5; platform_system == "Linux" or platform_system == "Windows"

看到下面的日志内容说明成功安装。

image.png

启动服务

控制台运行 server.py 启动服务

 python .\server.py

当你看到下面的日志打印,就说明启动成功了,访问给出来的 url : http://127.0.0.1:7860 即刻开始使用。

image.png

我们这里虽然可以启动成功,但是还没有配置可用的模型,我们将下载好的完整的 Qwen2-7B 模型,放到 text-generation-webui\models 目录下面,然后切换到 Model 页面,在 Model 框中会显示出来你放的各种模型名字,然后点击 Load 等待加载即可,加载完成即可切换到其他页面进行愉快的玩耍了。

image.png

如果在此步遇到如下问题 DLL load failed while importing flash_attn_2_cuda: The specified module could not be found,不要紧张,你只需要检查两个地方:

  • 你本机的 CUDA 和你的 pytorch 版本是不是一致
  • 上文提到的 requirements.txt 中文件末尾无用的行有没有删掉

如果完成了上述的两部检查,那么按照本教程重新快速部署一次环境,即可正常使用。

体验

下面我们就可以愉快的玩耍了。里面还有很多好玩的内容,可以参考官方的 WIKI 教程进一步定制化。下面是我改过名字等配置之后的效果。

image.png

参考