13. LangChain: From Getting Started to Giving Up (Part 6): Vector store-backed retriever


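  A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class that makes it conform to the retriever interface. It uses the search methods implemented by the vector store, such as similarity search and MMR (maximal marginal relevance), to query the texts in the vector store. The search_type parameter accepts "similarity" (the default), "mmr", or "similarity_score_threshold".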
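To make the "lightweight wrapper" idea concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: ToyVectorStore, ToyRetriever, and the two-dimensional vectors are hypothetical, and the toy retriever takes a query vector directly, whereas a real retriever would first embed the query text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: parallel lists of texts and vectors."""

    def __init__(self, texts, vectors):
        self.texts = texts
        self.vectors = vectors

    def similarity_search(self, query_vec, k=4):
        # Rank every stored vector by similarity to the query, keep the top k
        ranked = sorted(
            range(len(self.texts)),
            key=lambda i: cosine(query_vec, self.vectors[i]),
            reverse=True,
        )
        return [self.texts[i] for i in ranked[:k]]

    def as_retriever(self, search_type="similarity", search_kwargs=None):
        return ToyRetriever(self, search_type, search_kwargs or {})


class ToyRetriever:
    """Thin wrapper: exposes one invoke() method and delegates to the store."""

    def __init__(self, store, search_type, search_kwargs):
        self.store = store
        self.search_type = search_type
        self.search_kwargs = search_kwargs

    def invoke(self, query_vec):
        if self.search_type == "similarity":
            return self.store.similarity_search(query_vec, **self.search_kwargs)
        raise ValueError(f"unsupported search_type: {self.search_type}")


# Made-up 2-D "embeddings" for three product names
store = ToyVectorStore(
    texts=["earbuds", "yoga mat", "headphones"],
    vectors=[[1.0, 0.1], [0.0, 1.0], [0.9, 0.3]],
)
retriever = store.as_retriever(search_kwargs={"k": 2})
print(retriever.invoke([1.0, 0.0]))  # ['earbuds', 'headphones']
```

The retriever holds no data of its own. It only remembers which search method to call and with which arguments, which is why building one from an existing store is cheap.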

  Once a vector store has been built, constructing a retriever is straightforward. Let's look at an example.

  Contents of bluetooth.txt (Chinese product descriptions for a pair of wireless Bluetooth earbuds and a yoga mat)

无线蓝牙耳机,规格:单个耳机尺寸:1.5'' x 1.3''。为什么我们热爱它:这款无线蓝物美价廉
瑜伽垫,规格:尺寸:24'' x 68''。为什么我们热爱它:我们的瑜伽垫拥有出色的...

Similarity search (default)

  search_type="similarity"

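from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the sample file and split it into chunks of at most 50 characters
raw_documents = TextLoader('../source/bluetooth.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# Embed with a locally stored all-MiniLM-L6-v2 model, running on the CPU
embeddings = HuggingFaceEmbeddings(model_name="../localLLM/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})

# Build the vector store from the chunks
db = Chroma.from_documents(documents, embeddings)

# With no arguments, the retriever uses search_type="similarity"
retriever = db.as_retriever()

# Query: "蓝牙耳机" means "Bluetooth earbuds"; invoke() returns a list of documents
docs = retriever.invoke("蓝牙耳机")

# Print the best-matching chunk
print(docs[0].page_content)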

Maximal marginal relevance (MMR) retrieval

  search_type="mmr"

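from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the sample file and split it into chunks of at most 50 characters
raw_documents = TextLoader('../source/bluetooth.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# Embed with a locally stored all-MiniLM-L6-v2 model, running on the CPU
embeddings = HuggingFaceEmbeddings(model_name="../localLLM/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})

# Build the vector store from the chunks
db = Chroma.from_documents(documents, embeddings)

# MMR balances relevance to the query against diversity among the results
retriever = db.as_retriever(search_type="mmr")

# Query: "蓝牙耳机" means "Bluetooth earbuds"; invoke() returns a list of documents
docs = retriever.invoke("蓝牙耳机")

# Print the first selected chunk
print(docs[0].page_content)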
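What MMR does differently from plain similarity search is easier to see on toy data. The sketch below is a plain-Python illustration of the usual greedy MMR selection, not LangChain's code. The function mmr, the vectors, and the scoring weights are made up for the example; the parameter name lambda_mult mirrors the option that, as far as I know, can be passed through search_kwargs together with k and fetch_k.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: return the indices of k documents, in selection order.

    lambda_mult=1.0 ranks by relevance only; lower values penalise
    candidates that resemble documents already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is slightly less
# relevant to the query but points in a different direction.
query = [1.0, 0.0]
docs = [[1.0, 0.3], [1.0, 0.31], [1.0, -0.4]]

print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]  relevance only
print(mmr(query, docs, k=2, lambda_mult=0.5))  # [0, 2]  duplicate skipped
```

With lambda_mult=0.5 the near-duplicate of the first hit is skipped in favour of a less similar but more diverse document, which is the behaviour search_type="mmr" is meant to provide.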

Similarity score threshold retrieval

  search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.1}

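from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the sample file and split it into chunks of at most 50 characters
raw_documents = TextLoader('../source/bluetooth.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# Embed with a locally stored all-MiniLM-L6-v2 model, running on the CPU
embeddings = HuggingFaceEmbeddings(model_name="../localLLM/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})

# Build the vector store from the chunks
db = Chroma.from_documents(documents, embeddings)

# Return only chunks whose relevance score is at least 0.1
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.1})

# Query: "瑜伽垫" means "yoga mat"; invoke() returns a list of documents
docs = retriever.invoke("瑜伽垫")

# The threshold can filter out every chunk, so check before indexing
if docs:
    print(docs[0].page_content)
else:
    print("No chunk scored at or above the threshold")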
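The thresholding step itself is simple, and sketching it shows why the result list can be empty. The snippet below is an illustration only: filter_by_threshold is a hypothetical helper, and the scores are invented rather than produced by the model above.

```python
def filter_by_threshold(scored_chunks, score_threshold):
    """Keep (text, score) pairs whose score is at least score_threshold."""
    return [
        (text, score)
        for text, score in scored_chunks
        if score >= score_threshold
    ]


# Hypothetical relevance scores in [0, 1]; higher means more similar
scored_chunks = [
    ("yoga mat, size 24'' x 68''", 0.42),
    ("wireless Bluetooth earbuds", 0.08),
]

# A loose threshold keeps the yoga mat chunk only
kept = filter_by_threshold(scored_chunks, 0.1)
print([text for text, _ in kept])

# A strict threshold removes everything, so guard before indexing
strict = filter_by_threshold(scored_chunks, 0.5)
print(strict[0][0] if strict else "no chunk passed the threshold")
```

How high a threshold makes sense depends on the embedding model and on how the vector store converts distances into scores, so it is worth printing a few real scores before settling on a value.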