AI Workflow Tool Notes

Overview

These notes record how to install a set of local utilities enhanced by LLMs, and what each one is used for.

Model inference services

Ollama

Ollama is straightforward to use. First install it: download and install the latest release for your operating system from the Ollama website.

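Usage

Start Ollama: open a terminal or command prompt and run ollama serve to start the Ollama server.
Download a model: find the model you want in the model library, then fetch it with ollama pull, for example ollama pull deepseek-coder-v2:16b .
Run the model: start it with ollama run, for example ollama run deepseek-coder-v2:16b .
Start chatting: type a question or instruction in the terminal, and Ollama generates a reply from the model.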
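Besides the interactive terminal, a running Ollama server can also be called over HTTP. The following is a minimal sketch, assuming the server is on the default address 127.0.0.1:11434 and that deepseek-coder-v2:16b has already been pulled. It uses Ollama's /api/generate endpoint with streaming turned off. The helper names build_generate_request and generate are illustrative and not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default address; adjust if OLLAMA_HOST was changed


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, generate("deepseek-coder-v2:16b", "Write a hello world in Python") returns the reply as a single string.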

Custom IP:port

By default Ollama listens on 127.0.0.1:11434 , which is reachable only from the local machine. Set the OLLAMA_HOST environment variable to change the IP:port. To listen on all local interfaces, follow the steps below.

If you run Ollama directly from the command line, start it with OLLAMA_HOST=0.0.0.0 ollama serve so that it listens on all local interfaces.

If Ollama runs as a systemd service, edit the service file: open /etc/systemd/system/ollama.service and add the following line inside the [Service] section:

Environment="OLLAMA_HOST=0.0.0.0"

After making the change, reload the systemd configuration with
sudo systemctl daemon-reload
and then restart the service with
sudo systemctl restart ollama
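After changing OLLAMA_HOST, it is worth checking from another machine that the server is actually reachable. The sketch below is one way to do that, under these assumptions: the value is a plain IPv4 address or hostname with an optional port, the port falls back to 11434 when omitted, and the helper names base_url_from_host and is_reachable are made up for this example. Note that 0.0.0.0 is a bind address, so a remote client should use the server's real IP instead.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

DEFAULT_PORT = 11434  # Ollama's default port


def base_url_from_host(value: str) -> str:
    """Turn an OLLAMA_HOST-style value ('host', 'host:port' or a URL) into a base URL."""
    value = value.strip() or "127.0.0.1"
    if "://" not in value:
        value = "http://" + value
    parts = urlsplit(value)
    host = parts.hostname or "127.0.0.1"
    port = parts.port or DEFAULT_PORT
    return f"{parts.scheme}://{host}:{port}"


def is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on the server root succeeds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, is_reachable(base_url_from_host("192.168.1.10")) checks a server at the placeholder address 192.168.1.10 on the default port.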

If a client runs in a Docker container while Ollama runs on the host, add the following to that client's service in your docker-compose.yml file:

extra_hosts:
  - "host.docker.internal:host-gateway"

This maps host.docker.internal to the host's gateway address inside the container, so the container can reach an Ollama instance listening on the host (with OLLAMA_HOST set as described above). Once the container is running, you can check from inside it that Ollama is reachable:
curl http://host.docker.internal:11434

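LM Studio

Download the installer from the official site, lmstudio.ai/ . After installation, LM Studio provides a GUI that can also download models, but the models come from Hugging Face, so a proxy is needed on networks where Hugging Face is blocked.

Alternatively, download the model files with a separate download tool and place them in the directory LM Studio expects, for example from the Hugging Face mirror hf-mirror.com.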
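As a rough sketch of the manual download route, the snippet below builds a direct file URL on the mirror and saves the file locally. It assumes hf-mirror.com follows the same /resolve/<revision>/<filename> URL layout as huggingface.co. The repository id and file name in the usage note are hypothetical placeholders, and mirror_file_url and download are names invented for this example.

```python
import urllib.request
from urllib.parse import quote

MIRROR = "https://hf-mirror.com"  # mirror endpoint mentioned above


def mirror_file_url(repo_id: str, filename: str, revision: str = "main",
                    endpoint: str = MIRROR) -> str:
    """Build a direct download URL, assuming the Hugging Face 'resolve' URL layout."""
    return f"{endpoint}/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


def download(repo_id: str, filename: str, target_path: str) -> None:
    """Download one file from the mirror to target_path."""
    urllib.request.urlretrieve(mirror_file_url(repo_id, filename), target_path)
```

A call such as download("some-org/some-model", "model.gguf", "model.gguf") would then fetch a single model file. Replace the placeholders with a real repository and file.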

Xinference

Xinference can be installed with pip on Linux, Windows, and macOS. When using Xinference for model inference, you can select a different inference engine depending on the model.

Hugging Face Transformers library

Models are loaded with the Transformers library from Hugging Face. The main entry points are:

AutoConfig
AutoModel
AutoTokenizer

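ChatGLM model

from transformers import AutoModel, AutoTokenizer

# Inside the wrapper class: load the tokenizer, then the model (8-bit quantized, half precision, on GPU, eval mode)
self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True)
self.model = AutoModel.from_pretrained(self.model_path, trust_remote_code=True).quantize(8).half().cuda().eval()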

Qwen model

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True)
self.model = AutoModelForCausalLM.from_pretrained(self.model_path, device_map="auto", trust_remote_code=True).eval()
# Load the generation hyperparameters; generation length, top_p, and similar settings can be overridden here
self.model.generation_config = GenerationConfig.from_pretrained(self.model_path, trust_remote_code=True)

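Chatbot + knowledge base + search

Page Assist

A browser extension that uses Ollama as the model service for local model deployment and adds search through the browser. It makes it very easy to set up a chatbot with a knowledge base and search.

GitHub: github.com/n4ze3m/page…

Chrome Web Store: chromewebstore.google.com/detail/Page…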

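Open WebUI + SearXNG

The GUI is a web front end. Ollama provides the model service for local model deployment and SearXNG provides search, which together give a chatbot with a knowledge base and search.

Run the Open WebUI image with Docker: create a docker-compose.yml file that uses the image ghcr.io/open-webui/open-webui:main and maps host port 3000 to container port 8080.

  # YAML configuration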
    version: '3'
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        ports:
          - "3000:8080"
        volumes:
          - ./data:/app/backend/data
        environment:
          - WEBUI_AUTH=False

Start command

docker compose up -d

Once it is running, open it in a browser at http://localhost:3000

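Model service configuration

External connections: the first user to register becomes the administrator. In the admin panel, select External connections to configure the connection addresses for OpenAI and Ollama.

Click the small wrench icon to the right of Ollama to open the Ollama model management dialog, where you can download, delete, and import models.

Models: this page lists the models that Ollama has finished downloading.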
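The same list of downloaded models can be read directly from Ollama, which is handy when the web UI shows nothing and you want to rule out a connection problem. This is a minimal sketch that uses Ollama's /api/tags endpoint and assumes the default address. The helper names are illustrative.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract sorted model names from an /api/tags response body."""
    return sorted(model["name"] for model in tags_payload.get("models", []))


def list_local_models(base_url: str = "http://127.0.0.1:11434") -> list[str]:
    """Ask a running Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return model_names(json.loads(response.read().decode("utf-8")))
```

If list_local_models() returns the expected names but Open WebUI does not show them, the Ollama address configured in the admin panel is the first thing to check.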

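Documents: an LLM lacks knowledge in some domains, which makes its answers less precise. RAG (retrieval-augmented generation) addresses this by attaching domain-relevant information to the prompt, improving the accuracy of the LLM's answers in that domain.

How is domain knowledge passed to the LLM? Typically the domain documents are split into chunks and each chunk is converted into an embedding vector, which in turn requires a model to compute the embeddings. The model used for this can be chosen in the Documents settings.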
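The split, embed, and retrieve flow described above can be sketched in a few lines. This is a toy illustration, not how Open WebUI implements it: embed here is a bag-of-words counter standing in for a real embedding model, and all function names are invented for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Attach the retrieved chunks to the question before sending it to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

In a real setup the embed step is replaced by the embedding model selected in the settings, and the vectors are stored in a vector database instead of being recomputed for every query.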

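Web search

SearXNG

Installing SearXNG with Docker is fairly simple: clone the github.com/searxng/sea… repository and run docker-compose up. In addition, its settings.yml file needs two changes: 1. enable the json result format; 2. disable the request limiter. See docs.searxng.org/admin/setti…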

# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true
server:
  # base_url is defined in the SEARXNG_BASE_URL environment variable, see .env and docker-compose.yml
  secret_key: "c7fd14aaf57df6155eebc352219702547b93d064cc92acbb0501bca9fec14167"  # change this!
  limiter: false  # can be disabled for a private instance
  image_proxy: true
search:
  formats:
    - html
    - json
ui:
  static_use_hash: true
redis:
  url: redis://redis:6379/0

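3. In Open WebUI, open the admin settings panel, go to the Web Search section, enable web search, and select searxng as the search engine: docs.openwebui.com/tutorials/i…

Fill in Searxng Query URL with one of the following examples: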

http://searxng:8080/search?q=<query> (using the container name and exposed port, suitable for Docker-based setups)
http://host.docker.internal:8080/search?q=<query> (using the host.docker.internal DNS name and the host port, suitable for Docker-based setups)
http://<searxng.local>/search?q=<query> (using a local domain name, suitable for local network access)
https://<search.domain.com>/search?q=<query> (using a custom domain name for a self-hosted SearXNG instance, suitable for public or private access)

Note that the /search?q=<query> part is fixed.
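Before pointing Open WebUI at SearXNG, it can help to confirm that the instance really returns JSON. The sketch below queries the /search endpoint with format=json. It assumes the instance is reachable at the base URL you pass in (http://localhost:8080 in the usage note is only an example) and that the json format has been enabled in settings.yml as shown earlier. The function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON results."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def search(base_url: str, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (title, url) pairs for the first few results."""
    with urllib.request.urlopen(build_search_url(base_url, query), timeout=15) as response:
        data = json.loads(response.read().decode("utf-8"))
    return [(item.get("title", ""), item.get("url", "")) for item in data.get("results", [])[:limit]]
```

For example, search("http://localhost:8080", "open webui") should return a short list of results. If the json format is not enabled, the request typically fails with an HTTP error instead.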

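Notes: Obsidian + Copilot

Obsidian is a Markdown-based note-taking app. It is completely free for local use and supports plugin extensions.
Copilot is a third-party plugin that brings LLM capabilities into Obsidian.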

Custom workflows and agents

Dify (an open-source counterpart to Coze)

docs.dify.ai/zh-hans/get…

Clone the Dify repository

git clone https://github.com/langgenius/dify.git

Start Dify

1. Enter the docker directory of the Dify source code

cd dify/docker

2. Copy the environment configuration file

cp .env.example .env

3. Start the Docker containers

docker-compose up -d
