LocalAI: A Unified LLM Inference Service Behind a REST API


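LocalAI is a free, open-source alternative to OpenAI (and Anthropic, etc.) that works as a drop-in replacement REST API for local inference. It lets you run large language models (LLMs) and generate images and audio locally or on-premises on consumer-grade hardware, and it supports multiple model families and architectures.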


For local use, we deploy LocalAI from its Docker image:

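# Host port 9781 -> LocalAI port 8080; --gpus=5 exposes five GPUs (--gpus all would expose every GPU).
# The -v mounts replace the AIO entrypoint and model config folders with trimmed copies under /tmp
# and keep /build/models in /tmp/localmodels so downloaded models survive container restarts.
docker run -ti --name local-ai-test \
  -p 9781:8080 \
  --gpus=5 \
  -v /tmp/entrypoint.sh:/aio/entrypoint.sh \
  -v /tmp/gallery:/build/gallery \
  -v /tmp/gpu-8g:/aio/gpu-8g \
  -v /tmp/cpu:/aio/cpu \
  -v /tmp/intel:/aio/intel \
  -v /tmp/localmodels:/build/models \
  localai/localai:v2.22.0-aio-gpu-nvidia-cuda-11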
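On first start the AIO image may still be downloading its models, so the API can take a while to answer. The sketch below polls the service until it is ready. It assumes LocalAI's /readyz health endpoint (the one its images use for their health check) and the host port 9781 from the command above; the function name wait_for_localai is hypothetical. Because the container is started with -ti, run this from a second terminal (or start the container with -d instead).

```shell
#!/bin/sh
# Hypothetical helper: poll LocalAI's /readyz endpoint until it responds.
# Assumes /readyz exists in your LocalAI version and that port 9781 is mapped.
wait_for_localai() {
  base_url=${1:-http://localhost:9781}
  max_tries=${2:-60}
  i=1
  while [ "$i" -le "$max_tries" ]; do
    # -f: treat HTTP errors as failures; --max-time bounds each attempt.
    if curl -fsS --max-time 5 "$base_url/readyz" >/dev/null 2>&1; then
      echo "LocalAI is ready at $base_url"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$max_tries" ]; then
      sleep 5
    fi
    i=$((i + 1))
  done
  echo "LocalAI not ready after $max_tries attempts" >&2
  return 1
}

# Usage: wait_for_localai http://localhost:9781 120
```

With the defaults it waits up to about five minutes (60 attempts, 5 seconds apart); raise the second argument if the first model download is slow.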

To avoid loading a large number of models, I trimmed each model folder down to only the models I actually need.
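One way to do the trimming is a small helper that keeps only the model config files you list and deletes the other .yaml files in a folder. This is a sketch: the function name trim_model_configs and the file names in the example are hypothetical, so check what your copy of each folder actually contains. The command above also mounts its own entrypoint.sh; if that script lists model files explicitly, keep it in sync with what you delete.

```shell
#!/bin/sh
# Hypothetical helper: keep only the named .yaml files in a directory.
# Usage: trim_model_configs DIR KEEP_FILE...
trim_model_configs() {
  dir=$1
  shift
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue            # glob matched nothing
    name=$(basename "$f")
    keep=no
    for k in "$@"; do
      if [ "$name" = "$k" ]; then
        keep=yes
      fi
    done
    if [ "$keep" = no ]; then
      rm -f "$f"
      echo "removed $name"
    fi
  done
}
```

To get the default folders out of the image first, you can copy them from a stopped container, for example docker create --name localai-tmp localai/localai:v2.22.0-aio-gpu-nvidia-cuda-11, then docker cp localai-tmp:/aio/gpu-8g /tmp/gpu-8g, then docker rm localai-tmp, and finally run something like trim_model_configs /tmp/gpu-8g embeddings.yaml rerank.yaml.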

Speeding up model downloads

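You can download the models in advance by following a separate tutorial, or pass a proxy to the container at startup to speed up access. To do so, add an -e option right after docker run, followed by the rest of the options: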

docker run -e HTTPS_PROXY=http://user:password@proxyhost:proxyport ...
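Besides HTTPS_PROXY, it is common to also set HTTP_PROXY and to exclude local addresses with NO_PROXY, since processes inside the container may talk to each other over localhost and should not go through the proxy. The sketch below builds those -e flags from a single proxy URL; the function name proxy_env_flags is hypothetical, and the NO_PROXY value is an assumption you may need to extend.

```shell
#!/bin/sh
# Hypothetical helper: print docker "-e" flags for a proxy URL
# (format: http://user:password@host:port). Prints nothing if no URL is given.
proxy_env_flags() {
  proxy=$1
  if [ -z "$proxy" ]; then
    return 0
  fi
  # NO_PROXY keeps container-local traffic (localhost/127.0.0.1) off the proxy.
  printf '%s ' -e "HTTPS_PROXY=$proxy" -e "HTTP_PROXY=$proxy" \
    -e "NO_PROXY=localhost,127.0.0.1"
}

# Usage (the expansion is left unquoted on purpose so it splits into flags):
# docker run $(proxy_env_flags "http://user:password@proxyhost:proxyport") \
#   -ti --name local-ai-test -p 9781:8080 ... localai/localai:v2.22.0-aio-gpu-nvidia-cuda-11
```

The unquoted expansion only works if the proxy URL contains no spaces; for anything more complex, write the -e options out by hand.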

Using the API

Once LocalAI is deployed, every model is accessed through the same unified REST API. Some examples:

Embedding

curl http://localhost:9781/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "Your text string goes here",
  "model": "text-embedding-ada-002"
}' | jq "."
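When calling the endpoint from a script, it is safer to build the JSON body with jq than to paste text into a quoted string, because jq escapes quotes and newlines in the input. The sketch below sends one text and prints the length of the returned vector. It assumes the OpenAI-style response shape {"data":[{"embedding":[...]}]} and that jq is installed (the example above already uses it); the function name localai_embedding_dim is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: embed one text with LocalAI and print the vector length.
# Assumes an OpenAI-style response: {"data":[{"embedding":[...]}]}.
localai_embedding_dim() {
  text=$1
  base_url=${2:-http://localhost:9781}
  # jq --arg escapes the text correctly for JSON.
  body=$(jq -n --arg input "$text" --arg model "text-embedding-ada-002" \
    '{input: $input, model: $model}') || return 1
  # -f makes curl fail on HTTP errors instead of returning the error body.
  resp=$(curl -fsS --max-time 60 "$base_url/embeddings" \
    -H "Content-Type: application/json" -d "$body") || return 1
  printf '%s\n' "$resp" | jq '.data[0].embedding | length'
}

# Usage: localai_embedding_dim "Your text string goes here"
```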

Stable Diffusion text-to-image

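curl http://localhost:9781/v1/images/generations -H "Content-Type: application/json" -d '{ "prompt": "<positive prompt>|<negative prompt>", "step": 25, "size": "512x512" }'

The | in the prompt separates the positive prompt from the negative prompt.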
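A scripted version of the request is sketched below. It builds the body with jq (so quotes in the prompt are escaped) and prints the URL of the generated image. It assumes, as in LocalAI's Stable Diffusion examples, that text after | is used as the negative prompt, and that the default response contains data[0].url; the function name localai_generate_image is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: generate an image and print its URL.
# Assumes the response looks like {"data":[{"url":"..."}]}.
localai_generate_image() {
  prompt=$1                      # "<positive prompt>|<negative prompt>"
  base_url=${2:-http://localhost:9781}
  body=$(jq -n --arg prompt "$prompt" \
    '{prompt: $prompt, step: 25, size: "512x512"}') || return 1
  # Image generation can be slow, so allow a generous timeout.
  resp=$(curl -fsS --max-time 600 "$base_url/v1/images/generations" \
    -H "Content-Type: application/json" -d "$body") || return 1
  printf '%s\n' "$resp" | jq -r '.data[0].url'
}

# Usage: localai_generate_image "a cat on a windowsill|blurry, low quality"
```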

Rerank

curl http://localhost:9781/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
  "model": "jina-reranker-v1-base-en",
  "query": "Organic skincare products for sensitive skin",
  "documents": [
    "Eco-friendly kitchenware for modern homes",
    "Biodegradable cleaning supplies for eco-conscious consumers",
    "Organic cotton baby clothes for sensitive skin",
    "Natural organic skincare range for sensitive skin",
    "Tech gadgets for smart homes: 2024 edition",
    "Sustainable gardening tools and compost solutions",
    "Sensitive skin-friendly facial cleansers and toners",
    "Organic food wraps and storage solutions",
    "All-natural pet food for dogs with allergies",
    "Yoga mats made from recycled materials"
  ],
  "top_n": 3
}'
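For scripted use, the same request can take the query and documents as shell arguments. The sketch below builds the body with jq (version 1.6 or later, for --args and $ARGS.positional) and prints the index and relevance score of each returned document. It assumes a Jina-style response of the form {"results":[{"index":...,"relevance_score":...}]}; the function name localai_rerank and the LOCALAI_URL variable are hypothetical, with the base URL defaulting to the port mapped in the docker run command.

```shell
#!/bin/sh
# Hypothetical helper: rerank documents against a query via /v1/rerank.
# Usage: localai_rerank QUERY TOP_N DOC...
# Prints "index<TAB>relevance_score" per result (Jina-style response assumed).
localai_rerank() {
  query=$1
  top_n=$2
  shift 2                        # remaining arguments are the documents
  base_url=${LOCALAI_URL:-http://localhost:9781}
  body=$(jq -n --arg q "$query" --argjson n "$top_n" \
    '{model: "jina-reranker-v1-base-en", query: $q, documents: $ARGS.positional, top_n: $n}' \
    --args "$@") || return 1
  resp=$(curl -fsS --max-time 120 "$base_url/v1/rerank" \
    -H "Content-Type: application/json" -d "$body") || return 1
  printf '%s\n' "$resp" | jq -r '.results[] | "\(.index)\t\(.relevance_score)"'
}

# Usage:
# localai_rerank "Organic skincare products for sensitive skin" 2 \
#   "Eco-friendly kitchenware for modern homes" \
#   "Natural organic skincare range for sensitive skin" \
#   "Sensitive skin-friendly facial cleansers and toners"
```

The index refers to the position of the document in the arguments you passed (starting at 0), so you can map scores back to the original texts.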

Possible optimizations

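  1. Replace every occurrence of huggingface.co with hf-mirror.com and rebuild the container image.
  2. Deploy a proxy service that forwards only huggingface.co traffic to hf-mirror.com and leaves all other traffic untouched, to speed up model downloads.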
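Optimization 1 mostly comes down to a text replacement over the model config and gallery files. The sketch below rewrites huggingface.co to hf-mirror.com in every text file under a directory; you could run it on the bind-mounted folders (such as /tmp/gallery or /tmp/gpu-8g) or in a RUN step when rebuilding the image. It assumes the download URLs appear as plain text in those files and that the mirror serves the same paths, which is how hf-mirror.com is meant to be used. URIs written in a scheme form such as huggingface://... contain no huggingface.co text and are left unchanged. The function name rewrite_hf_urls is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace huggingface.co with hf-mirror.com in every
# text file under a directory (binary files are skipped by grep -I).
rewrite_hf_urls() {
  dir=$1
  tmp=$(mktemp) || return 1
  grep -rlIF 'huggingface.co' "$dir" | while IFS= read -r f; do
    # Write the result to a temp file, then copy it back with cat so the
    # original file keeps its permissions (e.g. the executable bit).
    sed 's#huggingface\.co#hf-mirror.com#g' "$f" > "$tmp" && cat "$tmp" > "$f"
    echo "rewrote $f"
  done
  rm -f "$tmp"
}

# Usage: rewrite_hf_urls /tmp/gallery
```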