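Elasticsearch: Building RAG with Elasticsearch and Cohere

In my previous article, “Using Cohere with Elasticsearch”, I explained in detail how to build a RAG application with Cohere and the inference API. Now that the semantic_text field type is available, we will redo that exercise using semantic_text and semantic search.

Elasticsearch has all the tools developers need to build next-generation search experiences with generative AI, and it supports native integration with Cohere through its inference API.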

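Use Elastic if you want to build with:

A vector database
Multiple deployed ML models
Text, vector, and hybrid search
Search with filters, facets, and aggregations
Document- and field-level security
On-premises, cloud, or serverless deployment (preview)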

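This guide uses a dataset of Wikipedia articles to set up a semantic search pipeline. It covers:

Creating an Elastic inference processor that uses Cohere embeddings
Creating an Elasticsearch index with embeddings
Running hybrid search on the Elasticsearch index and reranking the results
Performing basic RAG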

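To see the full code example, refer to this notebook. You can also find the integration guide here.

Note: The notebook referenced above does not use the semantic_text field. In the walkthrough below, I use the newer semantic_text field instead of the dense_vector field used previously.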

Requirements

A Cohere account. You can sign up and request an API key at Login | Cohere.
A locally installed cluster. Installation instructions follow below.
Python 3.7 or later

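Installation

Elasticsearch and Kibana

If you have not installed Elasticsearch and Kibana yet, refer to the following links:

When installing, choose Elastic Stack 8.x. Note in particular that ES|QL is only available in Elastic Stack 8.11 and later, so you need to download and install version 8.11 or later. This walkthrough uses 8.16.0.

When Elasticsearch starts for the first time, we see output like the following:

Above, we can see the password of the elastic superuser. Note it down; we will use it in the code below.

We can also find the Elasticsearch access certificates in the Elasticsearch installation directory: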



$ pwd
/Users/liuxg/elastic/elasticsearch-8.16.0/config/certs
$ ls
http.p12      http_ca.crt   transport.p12

Above, http_ca.crt is the certificate we need to access Elasticsearch.

First, clone the prepared code:

git clone https://github.com/liu-xiao-guo/elasticsearch-labs

Then change into the notebooks/cohere directory of the project:



$ pwd
/Users/liuxg/python/elasticsearch-labs/notebooks/cohere
$ ls
cohere-elasticsearch.ipynb           inference-cohere.ipynb
inference-cohere-semantic-text.ipynb

As shown above, inference-cohere-semantic-text.ipynb is the notebook we will work with today.

Copy the required certificate with the following command:



$ cp ~/elastic/elasticsearch-8.16.0/config/certs/http_ca.crt .
$ ls
cohere-elasticsearch.ipynb           inference-cohere-semantic-text.ipynb
http_ca.crt                          inference-cohere.ipynb

Install the required Python dependencies:

pip3 install elasticsearch==8.16.0 python-dotenv cohere

Check the installed versions of the Cohere and Elasticsearch clients with the following commands:



$ pip3 list | grep cohere
cohere                                  5.5.8
$ pip3 list | grep elasticsearch
elasticsearch                           8.16.0

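Start the Platinum trial

Some of the features used below are not available with the basic license, so we start the Platinum trial with the following steps:

This completes the activation of the trial.

Create environment variables

So that the application below runs smoothly, create a file named .env in the current project directory with the following content: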

.env



export ES_ENDPOINT="localhost"
export ES_USER="elastic"
export ES_PASSWORD="uK+7WbkeXMzwk9YvP-H3"
export COHERE_API_KEY="YourCohereAPIkey"

Adjust these values to match your own Elasticsearch configuration. You can obtain your COHERE_API_KEY from your Cohere account.
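Because every later cell depends on these variables, it can help to fail early when one of them is missing. The sketch below uses only the standard library; the helper name check_required_env is hypothetical (it is not part of the original notebook), and the variable names are simply those from the .env file above. Call it right after load_dotenv().

```python
import os

# Variable names taken from the .env file above; adjust if yours differ.
REQUIRED_VARS = ("ES_ENDPOINT", "ES_USER", "ES_PASSWORD", "COHERE_API_KEY")


def check_required_env(names=REQUIRED_VARS):
    """Return the required variables that are unset or empty.

    Hypothetical helper for illustration; an empty list means all are set.
    """
    return [name for name in names if not os.getenv(name)]


# Example use in the notebook, after load_dotenv():
# missing = check_required_env()
# if missing:
#     raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```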

Then, in the terminal where you ran the commands above, type the following:



$ pwd
/Users/liuxg/python/elasticsearch-labs/notebooks/cohere
$ ls
cohere-elasticsearch.ipynb           inference-cohere-semantic-text.ipynb
http_ca.crt                          inference-cohere.ipynb
$ jupyter notebook inference-cohere-semantic-text.ipynb

Prepare the data

Download the dataset with the following command:

wget https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl



$ wget https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl
--2024-11-21 18:41:18--  https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1545639 (1.5M) [text/plain]
Saving to: ‘embed_jobs_sample_data.jsonl’

embed_jobs_sample_data.jso 100%[=====================================>]   1.47M  1.23MB/s    in 1.2s    

2024-11-21 18:41:21 (1.23 MB/s) - ‘embed_jobs_sample_data.jsonl’ saved [1545639/1545639]

$ ls
cohere-elasticsearch.ipynb           inference-cohere-semantic-text.ipynb
embed_jobs_sample_data.jsonl         inference-cohere.ipynb
http_ca.crt

The embed_jobs_sample_data.jsonl file above has the following format:
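JSONL means one JSON object per line, which is why the loading code later parses the file line by line. The sketch below parses a hypothetical line: its keys mirror the fields mapped in the index later in this article (id, title, text, url, wiki_id, views, paragraph_id, langs), but the values are invented for illustration and are not taken from the real file.

```python
import json

# One hypothetical JSONL record; the values are made up, only the keys
# mirror the index mapping used later in this article.
SAMPLE_LINE = (
    '{"id": 0, "title": "Example", "text": "Example paragraph.", '
    '"url": "https://en.wikipedia.org/wiki/Example", "wiki_id": 1, '
    '"views": 100.0, "paragraph_id": 0, "langs": 5}'
)


def parse_jsonl(content):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in content.splitlines() if line.strip()]


# Blank lines (for example a trailing newline) are skipped.
records = parse_jsonl(SAMPLE_LINE + "\n\n" + SAMPLE_LINE + "\n")
for record in records:
    print(record["title"], "->", record["text"])
```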

Walkthrough

Read the variables and connect to Elasticsearch

Now we can instantiate the Python Elasticsearch client.



from elasticsearch import Elasticsearch, helpers
import cohere
import json
import requests
from dotenv import load_dotenv
import os



load_dotenv()

ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")
COHERE_API_KEY = os.getenv("COHERE_API_KEY")

url = f"https://{ES_USER}:{ES_PASSWORD}@{ES_ENDPOINT}:9200"
print(url)

client = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)
print(client.info())
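The URL above embeds the password directly. That works for the sample password, but characters such as @, : or / in a password would break the URL. The following sketch percent-encodes the credentials with urllib.parse.quote; build_es_url is a hypothetical helper, and it assumes your client version decodes percent-encoded credentials from the URL, which is worth verifying. Passing the credentials separately, for example Elasticsearch(f"https://{ES_ENDPOINT}:9200", ca_certs="./http_ca.crt", basic_auth=(ES_USER, ES_PASSWORD)), avoids the question entirely.

```python
from urllib.parse import quote


def build_es_url(user, password, host, port=9200):
    """Build an https URL with percent-encoded credentials in the userinfo part.

    Hypothetical helper: quote(..., safe="") encodes every reserved
    character, so '@', ':' and '/' in the password cannot break the URL.
    """
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


print(build_es_url("elastic", "p@ss:w/rd", "localhost"))
```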

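Create the inference endpoint

One of the biggest pain points of building a vector search index is computing embeddings for a large amount of data. Fortunately, Elastic provides inference endpoints that can be used in ingest pipelines to compute embeddings automatically during bulk indexing.

To set up an inference pipeline for ingestion, we first need to create an inference endpoint that uses Cohere embeddings. You need a Cohere API key, which you can find under the API Keys section of your Cohere account. We already read it from the .env file above.

We will create an inference endpoint that uses embed-english-v3.0 with int8 (byte) compression to save storage space.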



from elasticsearch import BadRequestError

# Delete the inference endpoint if it already exists; ignore any error
# (for example, when it does not exist yet)
try:
    client.inference.delete(inference_id="cohere_embeddings")
except Exception:
    pass

try:
    client.inference.put(
        task_type="text_embedding",
        inference_id="cohere_embeddings",
        body={
            "service": "cohere",
            "service_settings": {
                "api_key": COHERE_API_KEY,
                "model_id": "embed-english-v3.0",
                "embedding_type": "int8",
                "similarity": "cosine"
            },
        },
    )
except BadRequestError as e:
    print(e)
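To make the storage saving concrete: embed-english-v3.0 returns 1024-dimensional vectors, so a raw float32 vector takes 4 bytes per dimension while an int8 (byte) vector takes 1. The sketch below does that arithmetic (ignoring index overhead such as the HNSW graph) and adds a toy symmetric scalar quantizer to show the general idea; the quantizer is a generic illustration and is not how Cohere computes its int8 embeddings.

```python
# Raw per-vector storage, ignoring index structures and other overhead.
BYTES_PER_VALUE = {"float32": 4, "int8": 1}
DIMS = 1024  # output dimension of embed-english-v3.0


def vector_bytes(dims, dtype):
    """Raw bytes needed to store one vector with the given element type."""
    return dims * BYTES_PER_VALUE[dtype]


def quantize_int8(vector):
    """Toy symmetric scalar quantization of values in [-1, 1] to int8.

    Values are scaled by 127, rounded, and clamped to [-127, 127].
    Generic illustration only, NOT Cohere's method.
    """
    return [max(-127, min(127, round(v * 127))) for v in vector]


float_size = vector_bytes(DIMS, "float32")
int8_size = vector_bytes(DIMS, "int8")
print(f"float32: {float_size} bytes, int8: {int8_size} bytes, "
      f"{float_size // int8_size}x smaller")
print(quantize_int8([1.0, -1.0, 0.0]))
```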

We can check the endpoint in Kibana:

GET _inference/_all

Or:

GET _inference/cohere_embeddings

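Create the index

You must create the mapping of the destination index, that is, the index that will hold the embeddings the model generates from your input text. The destination index must have a field of type semantic_text to index the output of the Cohere model.

Let's create an index named cohere-wiki-embeddings with the mappings we need: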



index_name = "cohere-wiki-embeddings"

# Delete the index if it already exists
try:
    client.indices.delete(index=index_name, ignore_unavailable=True)
except Exception:
    pass

if not client.indices.exists(index=index_name):
    client.indices.create(
        index=index_name,
        settings={"index": {"default_pipeline": "cohere_embeddings"}},
        mappings={
            "properties": {
                "text_semantic": {
                    "type": "semantic_text",
                    "inference_id": "cohere_embeddings"
                },
                "text": {"type": "text", "copy_to": "text_semantic"},
                "wiki_id": {"type": "integer"},
                "url": {"type": "text"},
                "views": {"type": "float"},
                "langs": {"type": "integer"},
                "title": {"type": "text"},
                "paragraph_id": {"type": "integer"},
                "id": {"type": "integer"}
            }
        },
    )

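We can view the newly created index in Kibana:

GET cohere-wiki-embeddings

Let's note a few important parameters in this API call:

semantic_text: a field type that automatically generates embeddings for text content using an inference endpoint.
inference_id: the ID of the inference endpoint to use. In this example, it is set to cohere_embeddings.
copy_to: copies the value of the text field into the text_semantic field, which then generates the embeddings.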

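Create the ingest pipeline

Now you have an inference endpoint and an index that can store embeddings. The next step is to create an ingest pipeline that uses the inference endpoint to create embeddings and store them in the index. Note that the semantic_text field already calls the cohere_embeddings endpoint at index time (the text field is copied into it via copy_to), so this pipeline is not strictly required for semantic search; keeping it means each document is sent to Cohere twice, once by the pipeline (into text_embedding) and once by the semantic_text field.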



client.ingest.put_pipeline(
    id="cohere_embeddings",
    description="Ingest pipeline for Cohere inference.",
    processors=[
        {
            "inference": {
                "model_id": "cohere_embeddings",
                "input_output": {
                    "input_field": "text",
                    "output_field": "text_embedding",
                },
            }
        }
    ],
)
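Conceptually, the inference processor reads input_field from each incoming document, sends it to the inference endpoint, and writes the result to output_field. The toy sketch below models that flow for a single document in plain Python; embed is a hypothetical stand-in for the call to the cohere_embeddings endpoint and is not an Elasticsearch API.

```python
def apply_inference_processor(doc, embed, input_field="text",
                              output_field="text_embedding"):
    """Toy model of the inference processor's input_output behavior.

    Returns a copy of doc with output_field set to embed(doc[input_field]).
    A missing input field raises KeyError in this sketch.
    """
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[output_field] = embed(enriched[input_field])
    return enriched


def fake_embed(text):
    """Hypothetical stand-in embedder: two toy features per text."""
    return [float(len(text)), float(text.count(" "))]


doc = {"title": "YouTube", "text": "Video categories on YouTube"}
print(apply_inference_processor(doc, fake_embed))
```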

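Insert documents

Let's insert the sample wiki dataset. You need a production Cohere account to complete this step; otherwise, document ingestion will time out because of API rate limits.

Note: In this example, we use a trial account for the demonstration, so we ingest only a small amount of data to avoid being throttled.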



# url = "https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl"
# response = requests.get(url)

# Load the response data into a JSON object
# jsonl_data = response.content.decode('utf-8').splitlines()

import json
from elasticsearch.helpers import BulkIndexError

with open('./embed_jobs_sample_data.jsonl', 'r') as file:
    content = file.read()

# Split the content by new lines and parse each line as JSON
data = [json.loads(line) for line in content.strip().split("\n") if line]

# We just take the very first 10 documents
data = data[:10]
print(f"Successfully loaded {len(data)} documents")

# Prepare the documents to be indexed
documents = []
for line in data:
    documents.append({
        "_index": index_name,
        "_source": line,
    })

# Use the bulk endpoint to index
try:
    helpers.bulk(client, documents)
except BulkIndexError as exc:
    print(f"Failed to index {len(exc.errors)} documents:")
    for error in exc.errors:
        print(error)

print("Done indexing documents into `cohere-wiki-embeddings` index!")
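If you want to ingest more than 10 documents on a trial key, one option is to send them in small batches with a pause in between. The sketch below is a hypothetical helper, not part of the original notebook: send_batch is a placeholder for something like lambda batch: helpers.bulk(client, batch), and the batch size and pause are arbitrary assumptions that you would tune to your Cohere plan's rate limits.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_in_batches(send_batch, docs, batch_size=5, pause_seconds=1.0):
    """Send docs in small batches, sleeping between consecutive batches.

    send_batch is a placeholder callable (hypothetical); returns the
    number of batches sent.
    """
    sent = 0
    for batch in batched(docs, batch_size):
        if sent:
            time.sleep(pause_seconds)  # crude throttle between API-heavy requests
        send_batch(batch)
        sent += 1
    return sent


# Example (notebook): index_in_batches(lambda b: helpers.bulk(client, b), documents)
```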

We can check the indexed documents in Kibana:

GET cohere-wiki-embeddings/_search

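Semantic search

After enriching the dataset with embeddings, you can query the data with the semantic query that Elasticsearch provides. The semantic_text field type greatly simplifies semantic search in Elasticsearch. Learn more about how semantic_text lets you focus on the model and the results instead of the technical details.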



query = "What are the Video categories on YouTube?"

response = client.search(
    index="cohere-wiki-embeddings",
    size=100,
    query={
        "semantic": {
            "query": query,
            "field": "text_semantic"
        }
    }
)

raw_documents = response["hits"]["hits"]

# Display the first 10 results
for document in raw_documents[0:10]:
    print(f'Title: {document["_source"]["title"]}\nText: {document["_source"]["text"]}\n')

# Format the documents for ranking
documents = []
for hit in response["hits"]["hits"]:
    documents.append(hit["_source"]["text"])
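The semantic query needs only the query text and the name of the semantic_text field, because the field mapping already names the inference endpoint (cohere_embeddings). The small sketch below factors the query body and the result formatting into hypothetical helpers (build_semantic_query and format_hits are not in the original notebook), so they can be checked without a running cluster and passed as query=build_semantic_query(query) to client.search.

```python
def build_semantic_query(query_text, field="text_semantic"):
    """Return the query DSL for a `semantic` query on a semantic_text field."""
    return {"semantic": {"field": field, "query": query_text}}


def format_hits(hits, limit=10):
    """Render the top hits as 'Title: ...\\nText: ...' strings."""
    return [
        f'Title: {hit["_source"]["title"]}\nText: {hit["_source"]["text"]}'
        for hit in hits[:limit]
    ]


# Hypothetical hits in the shape returned by client.search(...)["hits"]["hits"]
sample_hits = [
    {"_source": {"title": "A", "text": "first"}},
    {"_source": {"title": "B", "text": "second"}},
]
print(build_semantic_query("What are the Video categories on YouTube?"))
print("\n\n".join(format_hits(sample_hits)))
```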

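Hybrid search

After enriching the dataset with embeddings, you can also query it with hybrid search. Here a lexical multi_match query on the text and title fields is combined with a semantic query; the semantic query only needs the query text and the semantic_text field, because the field mapping already specifies the inference endpoint used to create the embeddings.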



query = "What are the Video categories on YouTube?"

response = client.search(
    index="cohere-wiki-embeddings",
    size=100,
    query={
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    "fields": ["text", "title"]
                }
            },
            "should": {
                "semantic": {
                    "query": query,
                    "field": "text_semantic"
                }
            },
        }
    }
)

raw_documents = response["hits"]["hits"]

# Display the first 10 results
for document in raw_documents[0:10]:
    print(f'Title: {document["_source"]["title"]}\nText: {document["_source"]["text"]}\n')

# Format the documents for ranking
documents = []
for hit in response["hits"]["hits"]:
    documents.append(hit["_source"]["text"])
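In this bool query, the multi_match clause is a must, so only documents with a lexical match are returned; the semantic clause is a should, so it only adds to the score of documents that already matched. The toy model below illustrates that combination with made-up clause scores (in Elasticsearch, the scores of matching clauses are summed); it is a simplification, not Elasticsearch's scoring code.

```python
def bool_must_should_score(must_score, should_score):
    """Toy model of a bool query with one `must` and one `should` clause.

    Each argument is the clause score, or None when the clause does not
    match. Returns None when the document is not a hit.
    """
    if must_score is None:
        return None  # no lexical match: the document is excluded
    return must_score + (should_score or 0.0)  # should only boosts


# Made-up clause scores for three kinds of documents
candidates = {
    "lexical + semantic": (2.0, 0.5),
    "lexical only": (2.0, None),
    "semantic only": (None, 0.9),
}
for name, (lexical, semantic) in candidates.items():
    print(f"{name}: {bool_must_should_score(lexical, semantic)}")
```

If you also want purely semantic matches in the results, one option is to put both clauses under should, so that a document matching either clause is returned.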

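Reranking

To combine the results of vector and BM25 retrieval effectively, we can use Cohere's Rerank 3 model through the inference API to perform a final, more precise semantic reranking of the results.

First, create an inference endpoint with your Cohere API key. Make sure to give the endpoint a name and to specify the model_id of one of the reranking models. In this example, we use Rerank 3 (rerank-english-v3.0).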



# Delete the inference model if it already exists
client.options(ignore_status=[404]).inference.delete(inference_id="cohere_rerank")

client.inference.put(
    task_type="rerank",
    inference_id="cohere_rerank",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "rerank-english-v3.0"
        },
        "task_settings": {
            "top_n": 10,
        },
    }
)

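Now you can use this inference endpoint to rerank the results. Here we pass in the query used for retrieval, together with the documents we just retrieved with hybrid search.

The inference service responds with a list of documents sorted by relevance in descending order. Each document has a corresponding index (reflecting the order in which the documents were sent to the inference endpoint), and if the return_documents task setting is True, the document text is included as well.

In this case, we set return_documents to False and reconstruct the input documents from the indices returned in the response.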



response = client.inference.inference(
    inference_id="cohere_rerank",
    body={
        "query": query,
        "input": documents,
        "task_settings": {
            "return_documents": False
        }
    }
)

# Reconstruct the input documents based on the index provided in the rerank response
ranked_documents = []
for document in response.body["rerank"]:
    ranked_documents.append({
        "title": raw_documents[int(document["index"])]["_source"]["title"],
        "text": raw_documents[int(document["index"])]["_source"]["text"]
    })

# Print the top 10 results
for document in ranked_documents[0:10]:
    print(f"Title: {document['title']}\nText: {document['text']}\n")
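The reconstruction works because documents and raw_documents are both built from the same response["hits"]["hits"] list in the same order, so a rerank index into one is also a valid index into the other. The sketch below isolates that step in a hypothetical helper, reorder_by_rerank, and runs it on made-up data; the relevance_score values in the sample are invented for illustration, and only "index" is actually used.

```python
def reorder_by_rerank(rerank_results, hits, top_n=10):
    """Rebuild title/text pairs in the order returned by the reranker.

    rerank_results: entries already sorted by relevance, each with an
    "index" pointing into the list of documents sent for reranking
    (here, the search hits, which share the same order).
    """
    ranked = []
    for result in rerank_results[:top_n]:
        source = hits[int(result["index"])]["_source"]
        ranked.append({"title": source["title"], "text": source["text"]})
    return ranked


# Made-up hits and rerank output for illustration
hits = [
    {"_source": {"title": "A", "text": "a"}},
    {"_source": {"title": "B", "text": "b"}},
    {"_source": {"title": "C", "text": "c"}},
]
rerank_results = [{"index": 2, "relevance_score": 0.9}, {"index": 0, "relevance_score": 0.4}]
print(reorder_by_rerank(rerank_results, hits))
```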

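Retrieval-augmented generation (RAG)

Now that the results are ranked, we can easily turn this into a RAG system with Cohere's Chat API. We pass in the retrieved documents along with the query and look at the grounded response from Cohere's latest generation model, Command R+.

First, we create the Cohere client.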

co = cohere.Client(COHERE_API_KEY)

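Next, we can easily get a grounded generation with citations from the Cohere Chat API. We simply pass the user query and the documents retrieved from Elastic to the API, then print the grounded response.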



response = co.chat(
    message=query,
    documents=ranked_documents,
    model='command-r-plus-08-2024'
)

# Collect the IDs of the documents cited in the answer (citations may be None)
source_documents = []
for citation in response.citations or []:
    for document_id in citation.document_ids:
        if document_id not in source_documents:
            source_documents.append(document_id)

print(f"Query: {query}")
print(f"Response: {response.text}")
print("Sources:")
for document in response.documents or []:
    if document['id'] in source_documents:
        print(f"{document['title']}: {document['text']}")
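The source-listing step above boils down to two operations: deduplicate the cited document IDs while keeping first-cited order, then keep only the documents with those IDs. The sketch below expresses both as hypothetical helpers on plain dicts; in the real Cohere response the citations are objects with a document_ids attribute rather than dicts, and the doc_0/doc_1 IDs in the test data are made up.

```python
def cited_document_ids(citations):
    """Unique cited document IDs in first-cited order.

    citations: list of dicts with a "document_ids" list (None means no
    citations). dict.fromkeys-style dedupe preserves insertion order.
    """
    ordered = {}
    for citation in citations or []:
        for doc_id in citation["document_ids"]:
            ordered.setdefault(doc_id, None)
    return list(ordered)


def filter_cited(documents, cited_ids):
    """Keep only the documents whose "id" appears among the cited IDs."""
    cited = set(cited_ids)
    return [doc for doc in documents if doc["id"] in cited]
```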

That's it! Hybrid search and RAG, implemented quickly and easily with Cohere and Elastic.

The full source code of the notebook can be downloaded from github.com/liu-xiao-gu…