13. LangChain: From Getting Started to Giving Up (6): Vector store-backed retriever
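A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class that makes it conform to the retriever interface. It queries the texts in the vector store using the search methods the store implements, such as similarity search and MMR.
search_type: "similarity" (default), "mmr", or "similarity_score_threshold"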
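To make the "lightweight wrapper" idea concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `ToyVectorStore`, `ToyRetriever`, and the character-overlap score are all hypothetical stand-ins, and only two of the three search types are modelled. The point is only that the retriever holds a store plus a `search_type` and forwards `invoke()` to the store's search method.

```python
from dataclasses import dataclass, field


class ToyVectorStore:
    """Hypothetical in-memory store; scores texts by character overlap, not embeddings."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search_with_score(self, query, k=4):
        # Jaccard overlap of character sets stands in for embedding similarity.
        def score(text):
            q, t = set(query), set(text)
            return len(q & t) / len(q | t) if (q | t) else 0.0

        ranked = sorted(((t, score(t)) for t in self.texts),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:k]


@dataclass
class ToyRetriever:
    """Thin wrapper exposing a single invoke() method over the store."""

    store: ToyVectorStore
    search_type: str = "similarity"
    search_kwargs: dict = field(default_factory=dict)

    def invoke(self, query):
        kwargs = dict(self.search_kwargs)
        threshold = kwargs.pop("score_threshold", None)
        scored = self.store.similarity_search_with_score(query, **kwargs)
        if self.search_type == "similarity":
            return [text for text, _ in scored]
        if self.search_type == "similarity_score_threshold":
            if threshold is None:
                raise ValueError("score_threshold is required for this search_type")
            return [text for text, s in scored if s >= threshold]
        raise ValueError(f"unsupported search_type: {self.search_type}")


store = ToyVectorStore(["wireless bluetooth earphones", "yoga mat"])
retriever = ToyRetriever(store, search_kwargs={"k": 1})
print(retriever.invoke("bluetooth"))
```

The wrapper adds no search logic of its own, so swapping the store or the search type does not change how callers use `invoke()`.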
Once the vector store has been built, constructing a retriever from it is very easy. Let's look at an example.
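Contents of bluetooth.txt (sample product descriptions in Chinese: one entry for wireless Bluetooth earphones and one for a yoga mat; the queries in the code below are in Chinese to match):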
无线蓝牙耳机,规格:单个耳机尺寸:1.5'' x 1.3''。为什么我们热爱它:这款无线蓝物美价廉
瑜伽垫,规格:尺寸:24'' x 68''。为什么我们热爱它:我们的瑜伽垫拥有出色的...
Similarity search (the default)
search_type="similarity"
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
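# Load the sample file and split it into small chunks.
raw_documents = TextLoader('../source/bluetooth.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
# Embed the chunks with a local copy of all-MiniLM-L6-v2, running on the CPU.
embeddings = HuggingFaceEmbeddings(model_name="../localLLM/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})
# Build the vector store, then wrap it as a retriever (similarity search by default).
db = Chroma.from_documents(documents, embeddings)
retriever = db.as_retriever()
# Query in Chinese ("Bluetooth earphones") and print the best-matching chunk.
docs = retriever.invoke("蓝牙耳机")
print(docs[0].page_content)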
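As a rough illustration of what a similarity search does once texts have been turned into vectors, the sketch below ranks a few hand-made 2-D vectors against a query vector and returns the indices of the closest ones. Cosine similarity is an assumption here (the metric a real store uses depends on how it is configured), and `cosine_similarity`, `top_k_similar`, and the vectors themselves are hypothetical names and values chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Made-up 2-D "embeddings": documents 0 and 2 point roughly the same way as the query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k_similar([1.0, 0.0], doc_vecs, k=2))
```

Real embeddings have hundreds of dimensions, but the ranking step works the same way.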
Maximal marginal relevance (MMR) retrieval
search_type="mmr"
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
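# Load the sample file and split it into small chunks.
raw_documents = TextLoader('../source/bluetooth.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
# Embed the chunks with a local copy of all-MiniLM-L6-v2, running on the CPU.
embeddings = HuggingFaceEmbeddings(model_name="../localLLM/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})
# Build the vector store, then ask for MMR instead of plain similarity search.
db = Chroma.from_documents(documents, embeddings)
retriever = db.as_retriever(search_type="mmr")
# Query in Chinese ("Bluetooth earphones") and print the first returned chunk.
docs = retriever.invoke("蓝牙耳机")
print(docs[0].page_content)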
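MMR trades relevance to the query against redundancy with results already picked. The following is a sketch of the general MMR idea in plain Python, not the code that LangChain or Chroma actually runs; `mmr_select`, the `lambda_mult` weighting shown, and the 2-D vectors are assumptions made for illustration. At each step it picks the candidate with the highest `lambda_mult * relevance - (1 - lambda_mult) * redundancy`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k indices, balancing query relevance against similarity to earlier picks."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in candidates:
            relevance = cosine_similarity(query_vec, doc_vecs[idx])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine_similarity(doc_vecs[idx], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Made-up vectors: documents 0 and 1 are near-duplicates, document 2 is different.
doc_vecs = [[1.0, 0.45], [1.0, 0.4], [0.3, 1.0]]
query_vec = [1.0, 0.5]
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # favours diversity
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # pure relevance
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to ordinary similarity ranking; lower values push the second pick away from the near-duplicate.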
Similarity score threshold retrieval
search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.1}
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
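# Load the sample file and split it into small chunks.
raw_documents = TextLoader('../source/bluetooth.txt').load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
# Embed the chunks with a local copy of all-MiniLM-L6-v2, running on the CPU.
embeddings = HuggingFaceEmbeddings(model_name="../localLLM/all-MiniLM-L6-v2",
                                   model_kwargs={'device': 'cpu'})
db = Chroma.from_documents(documents, embeddings)
# Keep only chunks whose relevance score is at least 0.1.
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.1})
# Query in Chinese ("yoga mat"); the result can be empty if no chunk passes the threshold.
docs = retriever.invoke("瑜伽垫")
print(docs[0].page_content if docs else "No chunk passed the score threshold.")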
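The threshold step itself is a simple filter over scored results. Below is a minimal sketch, assuming scores are relevance values where higher means more similar; `filter_by_threshold` and the scores are made up for illustration and are not output from the code above. It also shows why a high threshold can leave nothing to index into.

```python
def filter_by_threshold(scored_docs, score_threshold):
    """Keep (text, score) pairs whose score is at least score_threshold."""
    return [(text, score) for text, score in scored_docs if score >= score_threshold]


# Hypothetical relevance scores for two chunks.
scored_docs = [("yoga mat chunk", 0.62), ("earphone chunk", 0.08)]

print(filter_by_threshold(scored_docs, 0.1))  # only the first chunk survives
print(filter_by_threshold(scored_docs, 0.9))  # nothing survives: an empty list
```

Because the filtered list may be empty, checking it before reading the first element avoids an IndexError.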