An Introduction to RAG (Retrieval-Augmented Generation)


Survey: Retrieval-Augmented Generation for Large Language Models: A Survey — arxiv.org/abs/2312.10997

GitHub project: GitHub - Tongji-KGLLM/RAG-Survey

The RAGFlow Project

RAGFlow repository: GitHub - infiniflow/ragflow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Environment Setup

Installing WSL

WSL installation docs: Install WSL | Microsoft Learn

Basic WSL commands: Basic commands for WSL | Microsoft Learn

1. Open PowerShell with administrator privileges and run the install command.

You can also select a distribution manually:


Installing Docker

Download and installation guide: docs.docker.com/engine/inst…

Setting the local image storage path

By default, pulled images are stored on the C: drive; we move them to a different path.


Configuring registry mirrors

Pulling from the default overseas registry is slow, so switch to a domestic (China) mirror.
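As a reference (the original screenshot showed this step), registry mirrors can be configured in Docker's daemon settings (Docker Desktop: Settings → Docker Engine). The mirror URLs below are only examples and may not be reachable from every network; substitute mirrors that work for you:

```json
{
  "registry-mirrors": [
    "https://docker.m.daocloud.io",
    "https://mirror.ccs.tencentyun.com"
  ]
}
```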


Test that Docker works, e.g. by running docker run hello-world


Installing and Running RAGFlow

RAGFlow repository: GitHub - infiniflow/ragflow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. Clone the code with git.

After cloning the code:

Note: running the commands above automatically pulls the development-version RAGFlow docker image. If you want to pull and run a specific release instead, open the docker/.env file, find the RAGFLOW_VERSION variable, and change it to the desired version, e.g. RAGFLOW_VERSION=v0.10.0 (matching a release tag on GitHub), then rerun the commands.
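As a reference, the typical sequence looks roughly like this (based on the repo's README at the time of writing; the exact commands change between versions, so check the current README):

```shell
# Clone the repository
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
# Optionally pin a release first: edit RAGFLOW_VERSION in .env
# Start the services in the background
docker compose -f docker-compose.yml up -d
# Follow the server log to confirm startup
docker logs -f ragflow-server
```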

The core image is roughly 9 GB, so the pull can take a while; please be patient.


Startup

After the server has started, confirm its status again:

The following output indicates the server started successfully:

If you skip this confirmation step and log in to RAGFlow right away, your browser may report a network error, because RAGFlow may not have fully started.

Logging in

The system can be opened at the address and port below.

Register and log in as prompted.

Building a Knowledge Base and Chatting

Model configuration

Normally we choose a large language model deployed locally with Ollama.

The UI is shown below:


Creating a knowledge base

Upload a local file, and make sure the file is parsed properly.


When creating an assistant, select the corresponding knowledge base.


Select the large language model configured in the model settings.


Naive RAG

Building a simple RAG with LangChain

Introduction | 🦜️🔗 LangChain Chroma

Environment preparation

Install Chroma (e.g. pip install langchain-chroma chromadb)


Using OpenAI models

If you use OpenAI models for generation, set the relevant environment variables in a .env file in the project root. To get an OpenAI API key, register an OpenAI account and choose "Create new secret key" on the platform.openai.com/account/api… page.

After this setup, run the command below to load the environment variables you set.

Tongyi Qianwen (Qwen)

Sign-up: help.aliyun.com/zh/dashscop… on-of-api-key

OpenAI-compatible API docs: help.aliyun.com/zh/dashscop… openai-with-dashscope

After this setup, run the command below to load the environment variables you set.

Using Ollama

Set up the Ollama service as described in the earlier model fine-tuning section, start the service, and download the model used for inference.

Preparing the data: full code

from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import OllamaLLM
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Prepare the knowledge-base data and build the index
def prepare_data():
    loader = WebBaseLoader("https://baike.baidu.com/item/AIGC?fromModule=lemma_search-box")
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    print(chunks[0].page_content)
    return chunks

# Embed the knowledge base and save it to the vector database
def embedding_data(chunks):
    # OpenAI embedding
    # rag_embeddings = OpenAIEmbeddings()
    # Create the BAAI embedding
    rag_embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-zh-v1.5")
    # Persist the embedded chunks to the vector store
    vector_store = Chroma.from_documents(
        documents=chunks,
        embedding=rag_embeddings,
        persist_directory="./chroma_langchain_db",
    )
    retriever = vector_store.as_retriever()
    return vector_store, retriever

# Use the Ollama service
llm = OllamaLLM(model="qwen2:7b-instruct-q4_0")
template = """您是问答任务的助理。
使用以下检索到的上下文来回答问题。
如果你不知道答案,就说你不知道。
最多使用三句话,不超过100字,保持答案简洁。
Question: {question} Context: {context} Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
chunks = prepare_data()
vector_store, retriever = embedding_data(chunks)

# Generate the answer
def generate_answer(question):
    rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    resp = rag_chain.invoke(question)
    print(resp)

Advanced RAG

Data Extraction

Extraction methods

LlamaIndex reader docs: Index - LlamaIndex

Adding metadata

LlamaIndex metadata extraction docs: Index - LlamaIndex

Example: docs.llamaindex.ai/en/stable/e… e/

Knowledge graphs

To organize multiple documents with a knowledge graph, see the KGP research paper: Knowledge Graph Prompting for Multi-Document Question Answering — [2308.11730] Knowledge Graph Prompting for Multi-Document Question Answering

LlamaIndex knowledge-graph docs: docs.llamaindex.ai/en/stable/a… graph/

Hierarchical Index Structures

Summary-to-detail hierarchical index (coarse → fine; improves search efficiency)

Multi-level filtering with metadata: docs.llamaindex.ai/en/stable/e… c_auto_retrieval/multi_doc_auto_retrieval/

from llama_index.core import SummaryIndex
from llama_index.core.schema import IndexNode
from llama_index.core.vector_stores import (
    FilterOperator, MetadataFilter, MetadataFilters,
)
from llama_index.llms.openai import OpenAI

async def aprocess_doc(doc, include_summary: bool = True):
    """Process doc."""
    metadata = doc.metadata
    date_tokens = metadata["created_at"].split("T")[0].split("-")
    year = int(date_tokens[0])
    month = int(date_tokens[1])
    day = int(date_tokens[2])
    assignee = (
        "" if "assignee" not in doc.metadata else doc.metadata["assignee"]
    )
    size = ""
    if len(doc.metadata["labels"]) > 0:
        size_arr = [l for l in doc.metadata["labels"] if "size:" in l]
        size = size_arr[0].split(":")[1] if len(size_arr) > 0 else ""
    new_metadata = {
        "state": metadata["state"],
        "year": year,
        "month": month,
        "day": day,
        "assignee": assignee,
        "size": size,
    }

    # Extract a concise summary of the document
    summary_index = SummaryIndex.from_documents([doc])
    query_str = "Give a one-sentence concise summary of this issue."
    query_engine = summary_index.as_query_engine(
        llm=OpenAI(model="gpt-3.5-turbo")
    )
    summary_txt = await query_engine.aquery(query_str)
    summary_txt = str(summary_txt)

    index_id = doc.metadata["index_id"]
    # Filter out the corresponding document by doc id
    filters = MetadataFilters(
        filters=[
            MetadataFilter(
                key="index_id", operator=FilterOperator.EQ, value=int(index_id)
            ),
        ]
    )
    # Create an index node carrying the metadata and the summary text
    index_node = IndexNode(
        text=summary_txt,                             # the summary
        metadata=new_metadata,                        # year/month/day, state, etc.
        obj=doc_index.as_retriever(filters=filters),  # doc_index is built elsewhere in the full example
        index_id=doc.id_,                             # document id
    )

Parent-child hierarchical index (fine → coarse; improves accuracy on precise questions)

TreeIndex

Sentence-window index
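As intuition for what a sentence-window index does (this is plain Python, not LlamaIndex's implementation; the helper names are made up, and keyword overlap stands in for embedding similarity): match at sentence granularity, then hand the generator a window of neighboring sentences around the hit.

```python
# Toy sketch of sentence-window retrieval: match a single sentence,
# then return it together with its neighbors as context.
def split_sentences(text):
    # Naive sentence splitter on Chinese/English full stops.
    return [s.strip() for s in text.replace("。", ".").split(".") if s.strip()]

def sentence_window_retrieve(text, query_terms, window=1):
    sentences = split_sentences(text)
    # Score each sentence by how many query terms it contains
    # (a stand-in for embedding similarity).
    scores = [sum(term in s for term in query_terms) for s in sentences]
    best = max(range(len(sentences)), key=lambda i: scores[i])
    # Expand the hit to a window of neighboring sentences.
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

text = ("RAG combines retrieval and generation. The retriever finds relevant chunks. "
        "The generator writes the answer. Evaluation measures quality.")
print(sentence_window_retrieve(text, ["retriever"], window=1))
```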

Parallel querying over multiple chunking strategies

Parallelization example:

Parallelizing Ingestion Pipeline - LlamaIndex

Indexing and querying over multiple chunk sizes:

Ensemble Retrieval Guide - LlamaIndex

Pre-Retrieval Process

Prompt optimization

Advanced Prompt Techniques (Variable Mappings, Functions) - LlamaIndex

Query rewriting

See this paper: Query Rewriting for Retrieval-Augmented Large Language Models — arxiv.org/pdf/2305.14283

Sub-queries

LlamaIndex query engine example: docs.llamaindex.ai/en/stable/e… _auto_retrieval/multi_doc_auto_retrieval/

HyDE (hypothetical answers)

See the paper Precise Zero-Shot Dense Retrieval without Relevance Labels: arxiv.org/pdf/2212.10496
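The HyDE flow is easy to sketch: instead of embedding the raw question, ask the LLM for a hypothetical answer, embed that, and retrieve the real chunks nearest to it. The sketch below uses a stub in place of the LLM and a bag-of-words "embedding"; every name here is illustrative, not any library's API.

```python
from collections import Counter
import math
import re

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(question, llm_fn, docs, top_k=1):
    hypothetical = llm_fn(question)   # 1. hypothetical answer from the LLM
    q_vec = embed(hypothetical)       # 2. embed the answer, not the question
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    return ranked[:top_k]             # 3. nearest real chunks

docs = [
    "Turing proposed the imitation game in his 1950 paper.",
    "Chroma is a lightweight vector database.",
]
# Stub standing in for a real LLM call.
fake_llm = lambda q: "Alan Turing wrote a famous 1950 paper proposing the imitation game."
print(hyde_retrieve("Which paper did Turing write?", fake_llm, docs))
```

Reverse HyDE flips the same idea: generate hypothetical questions for each chunk at index time and match the user's question against those.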

Reverse HyDE (hypothetical questions)

CoVe

Use Chain-of-Verification, proposed by Meta AI: [2309.11495] Chain-of-Verification Reduces Hallucination in Large Language Models

Retrieval

Where cost permits, fine-tuning can improve retrieval quality in vertical domains. Below are some of the fine-tuning approaches covered in the docs; we will go through them in detail in a later lesson.


LlamaIndex embedding fine-tuning: github.com/run-llama/f… ain/evaluate.ipynb


Retrieval-augmentation fine-tuning: docs.llamaindex.ai/en/stable/e… edge/finetune_retrieval_aug.html#fine-tuning-with-retrieval-augmentation


Cross-encoder fine-tuning: docs.llamaindex.ai/en/latest/e… _encoder_finetuning/cross_encoder_finetuning.html#

Post-Retrieval Process

Prompt Compression

LlamaIndex docs on prompts: docs.llamaindex.ai/en/stable/e… prompts/

Microsoft's LLMLingua project: GitHub - microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

LLMLingua

Paper: aclanthology.org/2023.emnlp-…

LongLLMLingua

Paper: aclanthology.org/2024.acl-lo…

LongLLMLingua2

Paper: aclanthology.org/2024.findin…
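As intuition only: LLMLingua scores tokens with a small language model's perplexity and drops the least informative ones. The toy sketch below is not LLMLingua's algorithm; it just shows the shape of the idea by keeping the retrieved sentences most lexically related to the question (all names invented).

```python
import re

def compress_context(context, question, keep_ratio=0.5):
    """Toy prompt compression: keep the sentences most related to the question.
    (LLMLingua instead uses a small LM's perplexity to score tokens.)"""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    q_words = set(re.findall(r"\w+", question.lower()))
    # Order sentence indices by word overlap with the question, highest first.
    scored = sorted(
        range(len(sentences)),
        key=lambda i: -len(q_words & set(re.findall(r"\w+", sentences[i].lower()))),
    )
    # Keep the top fraction, restoring the original sentence order.
    keep = sorted(scored[: max(1, int(len(sentences) * keep_ratio))])
    return " ".join(sentences[i] for i in keep)

context = ("RAG retrieves chunks from a knowledge base. "
           "The weather was nice that day. "
           "Retrieved chunks are passed to the LLM as context.")
print(compress_context(context, "How does RAG pass retrieved chunks to the LLM?"))
```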

A LlamaIndex example

API Reference - LlamaIndex

Environment setup

We keep using the Python virtual environment from the earlier LangChain example; there is no need to create a new one, just activate it.

Configuration for different LLMs

API key setup

Set the API key environment variables the same way as in the LangChain project.

Install Chroma

Install the lightweight Chroma vector database; it runs natively on Windows, with no WSL and no Docker required.

Prepare data

In this and the following examples, the corpus is the Baidu Baike entry on AIGC: baike.baidu.com/item/AIGC?fromModule=lemma_search-box

Full code

from llama_index.readers.web import TrafilaturaWebReader
from llama_index.core.node_parser import SimpleNodeParser, SentenceSplitter
from llama_index.core.schema import IndexNode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, StorageContext, Settings
# from llama_index.llms.openai import OpenAI
# from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.ollama import Ollama
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

def prepare_data():
    url = "https://baike.baidu.com/item/AIGC?fromModule=lemma_search-box"
    docs = TrafilaturaWebReader().load_data([url])
    return docs

# Embed the knowledge and save it to the vector database
def embedding_data(docs):
    # Vector database client
    chroma_client = chromadb.EphemeralClient()
    chroma_collection = chroma_client.create_collection("quickstart")
    # Vector store with a persistence location
    vector_store = ChromaVectorStore(
        chroma_collection=chroma_collection,
        persist_dir="./chroma_langchain_db",
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # Create the document splitter
    node_parser = SimpleNodeParser.from_defaults(chunk_size=500, chunk_overlap=50)
    # Create the BAAI embedding
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")
    # Create the index
    base_index = VectorStoreIndex.from_documents(
        documents=docs,
        transformations=[node_parser],
        storage_context=storage_context,
        embed_model=embed_model,
    )
    return base_index, embed_model

def get_llm():
    # OpenAI llm
    # llm = OpenAI(model="gpt-3.5-turbo")
    # Tongyi Qianwen
    '''
    from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
    llm = DashScope(model_name=DashScopeGenerationModels.QWEN_MAX)
    '''
    # Local Ollama model
    llm = Ollama(model="qwen2:7b-instruct-q4_0", request_timeout=120.0)
    # Google Gemini llm
    # llm = Gemini()
    return llm

def retrieve_data(question):
    # Create the retriever
    base_retriever = base_index.as_retriever(similarity_top_k=2)
    # Retrieve the relevant documents
    retrievals = base_retriever.retrieve(question)
    # print(retrievals)
    # docs.llamaindex.ai/en/stable/e…
    from llama_index.core.response.notebook_utils import display_source_node
    for n in retrievals:
        display_source_node(n, source_length=1500)
    return retrievals

def generate_answer(question):
    query_engine = base_index.as_query_engine()
    # The large language model's answer
    response = query_engine.query(question)
    print(str(response))

Modular RAG

Paper: Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks — https://arxiv.org/pdf/2407.21059

The inference stage

NetEase's open-source QAnything

For example, NetEase's open-source QAnything: GitHub - netease-youdao/QAnything: Question and Answer based on Anything.

QAnything's retrieval component, BCEmbedding (GitHub - netease-youdao/BCEmbedding: Netease Youdao's open-source embedding and reranker models for RAG products.), has very strong bilingual and cross-lingual capability.

The combination of bce-embedding-base_v1 and bce-reranker-base_v1 is state of the art.

Rewrite-Retrieve-Read (RRR)

Rewrite-Retrieve-Read (RRR) is another typical sequential structure (arxiv.org/pdf/2305.14…).

Conditional pattern

A classic implementation of conditional RAG is the semantic-router project: GitHub - aurelio-labs/semantic-router: Superfast AI decision making and intelligent processing of multi-modal data.
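The conditional pattern routes each query down a different branch before any retrieval happens. semantic-router does this with embedding similarity over example utterances; the sketch below shows the same shape with plain keyword rules (all names are invented, not semantic-router's API).

```python
# Toy conditional router: pick a pipeline branch per query before retrieval.
ROUTES = {
    "chitchat": ["hello", "hi there", "how are you"],
    "rag": ["paper", "document", "knowledge", "aigc"],
}

def route(query):
    q = query.lower()
    for name, triggers in ROUTES.items():
        if any(t in q for t in triggers):
            return name
    return "rag"  # default branch: the full retrieval pipeline

def answer(query):
    branch = route(query)
    if branch == "chitchat":
        return "chitchat: reply directly, no retrieval"
    return "rag: retrieve context, then generate"

print(answer("hello there"))
print(answer("what does the AIGC paper say?"))
```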

LlamaIndex example code

Iterative retrieval

A typical example of iterative retrieval is ITER-RETGEN ([2305.15294] Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy), which alternates retrieval-augmented generation and generation-augmented retrieval.
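The ITER-RETGEN loop can be sketched with stubs: each round retrieves with the previous round's answer appended to the query, then regenerates, so later rounds can reach documents the original query missed. The retriever and generator below are deliberately trivial stand-ins; none of this is the paper's actual implementation.

```python
import re

DOCS = [
    "Turing's 1950 paper is Computing Machinery and Intelligence.",
    "The paper proposed the imitation game, later called the Turing test.",
]

def _tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query):
    # Stub retriever: return docs sharing at least one token with the query.
    return [d for d in DOCS if _tokens(query) & _tokens(d)]

def generate(question, context):
    # Stub generator: echo the retrieved context as the "answer".
    return " ".join(context) if context else ""

def iter_retgen(question, rounds=2):
    answer = ""
    for _ in range(rounds):
        # Each round retrieves with the previous answer appended to the query.
        context = retrieve(question + " " + answer)
        answer = generate(question, context)
    return answer
```

With the query "imitation game", round one only reaches the second document; round two's enriched query also pulls in the first, showing the synergy the paper describes.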

Recursive retrieval

1. Tree of Clarifications (TOC)

Paper: Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models — arxiv.org/pdf/2310.14…

LlamaIndex docs: docs.llamaindex.ai/en/stable/e… e_retriever/

Adaptive (active) retrieval

1. Prompt-based methods:

A typical implementation is FLARE (Forward-Looking Active Retrieval Augmented Generation):

blog.lancedb.com/better-rag-… 9f/

Paper: Active Retrieval Augmented Generation — arxiv.org/pdf/2305.06…

LlamaIndex implements FLARE Instruct.

LangChain implements FlareChain.

Below is example code for FLARE Direct:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import LanceDB
from langchain.document_loaders import ArxivLoader
from langchain.chains import FlareChain
from langchain.llms import OpenAI
import os
import gradio as gr
import lancedb

# pass your api key
os.environ["OPENAI_API_KEY"] = "sk-yourapikeyforopenai"
llm = OpenAI()

model_name = "BAAI/bge-large-en"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)

# here is an example: arxiv.org/pdf/2305.06… — you need to pass this number to query: 2305.06983
# fetch docs from arxiv, in this case it's the FLARE paper
docs = ArxivLoader(query="2305.06983", load_max_docs=2).load()

# instantiate text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
# split the document into chunks
doc_chunks = text_splitter.split_documents(docs)

# lancedb vectordb
db = lancedb.connect('/tmp/lancedb')
table = db.create_table("documentsai", data=[
    {"vector": embeddings.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")
vector_store = LanceDB.from_documents(doc_chunks, embeddings, connection=table)
vector_store_retriever = vector_store.as_retriever()

flare = FlareChain.from_llm(
    llm=llm,
    retriever=vector_store_retriever,
    max_generation_len=300,
    min_prob=0.45,  # any token generated below this probability is treated as uncertain
)

# Define a function to generate FLARE output based on user input
def generate_flare_output(input_text):
    output = flare.run(input_text)
    return output

input = gr.Text(
    label="Prompt",
    show_label=False,
    max_lines=1,
    placeholder="Enter your prompt",
    container=False,
)
iface = gr.Interface(
    fn=generate_flare_output,
    inputs=input,
    outputs="text",
    title="My AI bot",
    description="FLARE implementation with lancedb & bge embedding.",
    allow_screenshot=False,
    allow_flagging=False,
)
iface.launch(debug=True)

2. Tuning-based methods:

A typical example is Self-RAG (SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection).

Paper: arxiv.org/pdf/2310.11…

GitHub: Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection

The fine-tuned model:

huggingface.co/selfrag/sel…

LlamaIndex self_rag example code: the SelfRAGPack needs to be downloaded on first run.

Going deeper with a LangChain example

Have the LLM generate several more queries on top of the user's query; these LLM-generated queries complement the user's query from different angles and perspectives:

On that basis, the LLM generated five more questions:

Multi-query: full code

import operator
from langchain.load import dumps, loads
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import OllamaLLM
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Prepare the knowledge-base data and build the index
def prepare_data():
    loader = WebBaseLoader("https://baike.baidu.com/item/AIGC?fromModule=lemma_search-box")
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    print(chunks[0].page_content)
    return chunks

# Embed the knowledge base and save it to the vector database
def embedding_data(chunks):
    # OpenAI embedding
    # rag_embeddings = OpenAIEmbeddings()
    # Create the BAAI embedding
    rag_embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-zh-v1.5")
    # Persist the embedded chunks to the vector store
    vector_store = Chroma.from_documents(
        documents=chunks,
        embedding=rag_embeddings,
        persist_directory="./chroma_langchain_db",
    )
    retriever = vector_store.as_retriever()
    return vector_store, retriever

# Build multiple queries, retrieve knowledge for each, then fuse the rankings with RRF
def get_multiple_queries(question):
    # Multi Query: Different Perspectives
    template = """You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines. Original question: {question}"""
    prompt_perspectives = ChatPromptTemplate.from_template(template)
    generate_queries = (
        prompt_perspectives
        | llm
        | StrOutputParser()
        | (lambda x: x.split("\n"))
    )
    # Generate the queries
    response = generate_queries.invoke({"question": question})
    print(response)
    # Retrieve knowledge for each query
    all_results = retrieval_and_rank(response)
    # Fuse with RRF
    reranked_results = reciprocal_rank_fusion(all_results)
    return generate_queries, reranked_results

def get_unique_union(documents: list[list]):
    """Unique union of retrieved docs."""
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Generate the answer
def generate_answer(question):
    '''
    # Tongyi Qianwen
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # or put your API key here if the env var is not set
        base_url="dashscope.aliyuncs.com/compatible-…",  # the DashScope base_url
        model="qwen-plus"
    )
    '''
    # OpenAI
    # llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    resp = rag_chain.invoke(question)
    print(resp)

# Generate the answer from the multi-query retrieval
def multi_query_generate_answer(question):
    # RAG
    template = """Answer the following question based on this context:

{context}

Question: {question}"""
    prompt = ChatPromptTemplate.from_template(template)
    retrieval_chain = generate_queries | retriever.map() | get_unique_union
    final_rag_chain = (
        {"context": retrieval_chain,
         "question": operator.itemgetter("question")}
        | prompt
        | llm
        | StrOutputParser()
    )
    response = final_rag_chain.invoke({"question": question})
    print(response)

# Retrieve the relevant knowledge for each query; each query's results carry
# similarity scores, so first rank the results within each query
def retrieval_and_rank(queries):
    all_results = {}
    for query in queries:
        if query:
            search_results = vector_store.similarity_search_with_score(query)
            results = []
            for res in search_results:
                content = res[0].page_content
                score = res[1]
                results.append((content, score))
            all_results[query] = results
    document_ranks = []
    for query, doc_score_list in all_results.items():
        # Rank the list within each query
        ranking_list = [doc for doc, _ in sorted(doc_score_list, key=lambda x: x[1], reverse=True)]
        document_ranks.append(ranking_list)
    return document_ranks

# Fuse all query results with reciprocal rank fusion (RRF)
def reciprocal_rank_fusion(document_ranks, k=60):
    fused_scores = {}
    for docs in document_ranks:
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            fused_scores[doc_str] += 1 / (rank + k)
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results

# Use the Ollama service
llm = OllamaLLM(model="qwen2:7b-instruct-q4_0")
template = """您是问答任务的助理。
使用以下检索到的上下文来回答问题。
如果你不知道答案,就说你不知道。
最多使用三句话,不超过100字,保持答案简洁。
Question: {question} Context: {context} Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
chunks = prepare_data()
vector_store, retriever = embedding_data(chunks)

query = "艾伦•图灵的论文叫什么"
generate_queries, querys = get_multiple_queries(query)
multi_query_generate_answer(query)
# generate_answer(query)

# An alternative prompt template
'''
template = """你是一名智能助手,根据上下文回答用户的问题,不需要回答额外的信息或捏造事实。
已知内容:
{context}
问题:
{question}"""
'''

Evaluating RAG

RAG evaluation metrics

LangChain's Criteria Evaluation: python.langchain.com/v0.1/docs/g… hain/

Aspect Critique in Ragas:

docs.ragas.io/en/latest/c…

Common evaluation tools

Ragas

🚀 Get Started - Ragas

You can also monitor each evaluation run with LangSmith, which helps analyze the reasons behind each result and track API-key consumption.

TruLens

Llama-Index

See the LlamaIndex evaluation documentation.

phoenix

Open-source GitHub project: GitHub - Arize-ai/phoenix: AI Observability & Evaluation

Docs: Arize Phoenix | Phoenix

deepeval

GitHub - confident-ai/deepeval: The LLM Evaluation Framework

OpenAI Evals

GitHub - openai/evals: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

LangSmith and Langfuse:

Evaluation Overview | 🦜️🛠️ LangSmith

使用 RAGAs 评估

github项目地址:GitHub - explodinggradients/ragas: Supercharge Your LLM Application Evaluations 🚀

文档地址:Ragas

环境准备

准备评估数据

在这个例子以及后面的例子中,语料库我们都将使用百度百科关于aigc的知识:baike.baidu.co m/item/AIGC?fromModule=lemma_search-box

操作如下:

评估 RAG

评估指标,可以参考文档:docs.ragas.io/en/stable/c… ics
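As intuition for what two of these metrics measure (this is not Ragas's implementation, which uses LLM judgments rather than word overlap): context recall asks how much of the ground truth is supported by the retrieved contexts, and context precision asks how much of what was retrieved is actually relevant. A toy lexical version:

```python
import re

def _tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def toy_context_recall(ground_truth, contexts):
    """Fraction of ground-truth tokens that appear somewhere in the retrieved contexts."""
    truth = _tokens(ground_truth)
    retrieved = set().union(*(_tokens(c) for c in contexts)) if contexts else set()
    return len(truth & retrieved) / len(truth) if truth else 0.0

def toy_context_precision(question, contexts):
    """Fraction of retrieved chunks sharing at least one token with the question."""
    q = _tokens(question)
    if not contexts:
        return 0.0
    relevant = sum(1 for c in contexts if q & _tokens(c))
    return relevant / len(contexts)

contexts = [
    "Turing wrote Computing Machinery and Intelligence",
    "Chroma is a vector database",
]
print(toy_context_recall("Computing Machinery and Intelligence", contexts))  # 1.0
print(toy_context_precision("What paper did Turing write?", contexts))      # 0.5
```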

Full code

from datasets import Dataset
from ragas.run_config import RunConfig
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import OllamaLLM
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Prepare the knowledge-base data and build the index
def prepare_data():
    loader = WebBaseLoader("https://baike.baidu.com/item/AIGC?fromModule=lemma_search-box")
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    print(chunks[0].page_content)
    return chunks

# Embed the knowledge base and save it to the vector database
def embedding_data(chunks):
    # OpenAI embedding
    # rag_embeddings = OpenAIEmbeddings()
    # Create the BAAI embedding
    rag_embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-zh-v1.5")
    # Persist the embedded chunks to the vector store
    vector_store = Chroma.from_documents(
        documents=chunks,
        embedding=rag_embeddings,
        persist_directory="./chroma_langchain_db",
    )
    retriever = vector_store.as_retriever()
    return vector_store, retriever, rag_embeddings

# Use the Ollama service
llm = OllamaLLM(model="qwen2:7b-instruct-q4_0")
template = """您是问答任务的助理。
使用以下检索到的上下文来回答问题。
如果你不知道答案,就说你不知道。
最多使用三句话,不超过100字,保持答案简洁。
Question: {question} Context: {context} Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
chunks = prepare_data()
vector_store, retriever, embedding = embedding_data(chunks)

# Run the RAG chain over the evaluation questions and build the dataset
def ragas_eval():
    '''
    # Tongyi Qianwen
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # or put your API key here if the env var is not set
        base_url="dashscope.aliyuncs.com/compatible-…",  # the DashScope base_url
        model="qwen-plus"
    )
    '''
    # OpenAI
    # llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    questions = [
        "艾伦•图灵的论文叫什么?",
        "人工智能生成的画作在佳士得拍卖行卖了什么价格?",
        "目前企业在使用相关的AIGC能力时,主要有哪五种方式?",
    ]
    ground_truths = [
        "计算机器与智能(Computing Machinery and Intelligence )",
        "2018年,人工智能生成的一幅画作在佳士得拍卖行以43.25万美元的价格成交",
        "企业在使用AIGC能力时的五种主要方式包括:直接使用、Prompt、LoRA、Finetune、Train",
    ]
    answers = []
    contexts = []
    # Inference
    for query in questions:
        answers.append(rag_chain.invoke(query))
        contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])
    # To dict
    data = {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
        "ground_truth": ground_truths,
    }
    # Convert dict to dataset
    dataset = Dataset.from_dict(data)
    return dataset

from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)

run_config = RunConfig(

Evaluating with TruLens

Project site: www.trulens.org

GitHub: GitHub - truera/trulens: Evaluation and Tracking for LLM Experiments

Environment preparation

Local Ollama model support

Providers supported by TruLens: www.trulens.org/reference/t… r/

Using local Ollama models: github.com/truera/trul… ls/ollama_quickstart.ipynb

Use LiteLLM to access the locally deployed Ollama model.

Getting started

In this and the following examples, the corpus is the Baidu Baike entry on AIGC: baike.baidu.com/item/AIGC?fromModule=lemma_search-box

Full code

from llama_index.readers.web import TrafilaturaWebReader
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.llms.ollama import Ollama
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

def prepare_data():
    url = "https://baike.baidu.com/item/AIGC?fromModule=lemma_search-box"
    docs = TrafilaturaWebReader().load_data([url])
    return docs

# Embed the knowledge and save it to the vector database
def embedding_data(docs):
    chroma_client = chromadb.EphemeralClient()
    chroma_collection = chroma_client.create_collection("quickstart")
    vector_store = ChromaVectorStore(
        chroma_collection=chroma_collection,
        persist_dir="./chroma_langchain_db",
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # Create the document splitter
    node_parser = SimpleNodeParser.from_defaults(chunk_size=500, chunk_overlap=50)
    # Create the BAAI embedding
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")
    # Create the index
    base_index = VectorStoreIndex.from_documents(
        documents=docs,
        transformations=[node_parser],
        storage_context=storage_context,
        embed_model=embed_model,
    )
    return base_index, embed_model

def get_llm():
    # OpenAI llm
    # llm = OpenAI(model="gpt-3.5-turbo")
    # Tongyi Qianwen
    '''
    from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
    llm = DashScope(model_name=DashScopeGenerationModels.QWEN_MAX)
    '''
    # Local Ollama model
    llm = Ollama(model="qwen2:7b-instruct-q4_0", request_timeout=120.0)
    # Google Gemini llm
    # llm = Gemini()
    return llm

def retrieve_data(question):
    # Create the retriever
    base_retriever = base_index.as_retriever(similarity_top_k=2)
    # Retrieve the relevant documents
    retrievals = base_retriever.retrieve(question)
    # print(retrievals)
    # docs.llamaindex.ai/en/stable/e…
    from llama_index.core.response.notebook_utils import display_source_node
    for n in retrievals:
        display_source_node(n, source_length=1500)
    return retrievals

def generate_answer(question):
    query_engine = base_index.as_query_engine()
    # The large language model's answer
    response = query_engine.query(question)
    print(str(response))
    return query_engine, response

question = "艾伦•图灵的论文叫什么"
docs = prepare_data()
llm = get_llm()
base_index, embed_model = embedding_data(docs)

# Configure the llm, embedding, etc. through Settings
Settings.llm = llm
Settings.embed_model = embed_model
# Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
Settings.num_output = 512
Settings.context_window = 3000
query_engine = base_index.as_query_engine()

from trulens.core import TruSession, Feedback
# from trulens_eval import OpenAI as fOpenAI
import nest_asyncio
import numpy as np
from trulens.apps.llamaindex import TruLlama
import litellm
from trulens.providers.litellm import LiteLLM
from trulens.dashboard import run_dashboard

def prepare_tru():
    # Allow nested event loops
    nest_asyncio.apply()
    # Initialize the database that stores prompts, responses, intermediate results, etc.
    session = TruSession()
    session.reset_database()
    return session

# Define a provider to run the feedback functions
# provider = fOpenAI()
def prepare_feedback():
    litellm.set_verbose = False
    provider = LiteLLM(
        model_engine="ollama/qwen2:7b-instruct-q4_0",
        api_base="http://localhost:11434",
    )
    f_answer_relevance = Feedback(
        provider.relevance_with_cot_reasons,  # the feedback function
        name="Answer Relevance",              # label shown on the dashboard
    ).on_input_output()

    context_selection = TruLlama.select_context(query_engine)
    f_context_relevance = (
        Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
        .on_input()             # the user query
        .on(context_selection)  # the retrieval results
        .aggregate(np.mean)     # aggregate over all retrieval results
    )
    f_groundedness = (
        Feedback(
            provider.groundedness_measure_with_cot_reasons, name="Groundedness"
        )
        .on(context_selection.collect())  # collect context chunks into a list
        .on_output()
    )
    tru_recorder = TruLlama(
        app=query_engine,
        app_id="App_longe",
        feedbacks=[f_context_relevance, f_answer_relevance, f_groundedness],
    )
    return tru_recorder

Viewing the evaluation results in the dashboard

After running the full code, the dashboard is served at the default address below:

Open the dashboard page locally to view the evaluation results:


Llama-Index Evaluation

Docs: see the LlamaIndex evaluation documentation.