Agentic App Tech Stack Summary - llama-index

LlamaIndex

Documentation

Agents - LlamaIndex

Installation

Installation and Setup - LlamaIndex

pip install llama-index

Defining a Workflow

Workflow in LlamaIndex is an event-driven abstraction used to chain together several events. Workflows are made up of steps, with each step responsible for handling certain event types and emitting new events.

Workflows in LlamaIndex work by decorating functions with a @step decorator. The decorator is used to infer the input and output types of each step for validation, and it ensures each step only runs when an accepted event is ready.

You can create a Workflow to do anything! Build an agent, a RAG flow, an extraction flow, or anything else you want.

Hello World

import asyncio

from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class MyWorkflow(Workflow):
    @step
    async def my_step(self, ev: StartEvent) -> StopEvent:
        # do something here
        return StopEvent(result="Hello, world!")

async def main():
    w = MyWorkflow(timeout=10, verbose=False)
    result = await w.run()
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

Basic workflow patterns

Ollama Workflow

import asyncio
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
)
from llama_index.llms.ollama import Ollama

class OllamaGenerator(Workflow):
    @step
    async def generate(self, ev: StartEvent) -> StopEvent:
        llm = Ollama(model="deepseek-r1:8b")
        response = await llm.acomplete(ev.query)
        return StopEvent(result=str(response))

async def ollama():
    w = OllamaGenerator(timeout=100, verbose=False)
    result = await w.run(query="Who are you?")
    print(result)

if __name__ == "__main__":
    asyncio.run(ollama())

为什么 event-driven

Other frameworks and LlamaIndex itself have attempted to solve this problem previously with directed acyclic graphs (DAGs) but these have a number of limitations that workflows do not:

  • Logic like loops and branches needed to be encoded into the edges of graphs, which made them hard to read and understand.
  • Passing data between nodes in a DAG created complexity around optional and default values and which parameters should be passed.
  • DAGs did not feel natural to developers trying to develop complex, looping, branching AI applications.

The event-based pattern and vanilla Python approach of Workflows resolve these problems.

For simple RAG pipelines and linear demos we do not expect you will need Workflows, but as your application grows in complexity, we hope you will reach for them.

A key feature of Workflows is that they enable branching and looping logic more simply and flexibly than graph-based approaches. docs.llamaindex.ai/en/stable/u…
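For example, a loop falls out of the event types themselves: a step that accepts the same event type it emits gets re-triggered, with no graph edges to maintain. Below is a minimal sketch of that pattern (the RetryEvent name and the coin-flip failure condition are illustrative, not from the docs):

import asyncio
import random

from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class RetryEvent(Event):
    """Emitted when a step wants the workflow to loop back to it."""

class LoopingWorkflow(Workflow):
    @step
    async def flaky_step(self, ev: StartEvent | RetryEvent) -> RetryEvent | StopEvent:
        # Emitting RetryEvent routes control back to this same step,
        # because the step accepts RetryEvent as an input type.
        if random.random() < 0.5:
            print("Transient failure, looping...")
            return RetryEvent()
        return StopEvent(result="succeeded")

async def main():
    w = LoopingWorkflow(timeout=10, verbose=False)
    print(await w.run())

if __name__ == "__main__":
    asyncio.run(main())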

Examples

Defining an Agent

Agents - LlamaIndex

An "agent" is an automated reasoning and decision engine. It takes in a user input/query and can make internal decisions for executing that query in order to return the correct result. The key agent components can include, but are not limited to:

  • Breaking down a complex question into smaller ones
  • Choosing an external Tool to use + coming up with parameters for calling the Tool
  • Planning out a set of tasks
  • Storing previously completed tasks in a memory module

Agents - LlamaIndex

Building a data agent requires the following core components:

  • A reasoning loop
  • Tool abstractions

The reasoning loop depends on the type of agent; LlamaIndex ships reasoning loops for several agent types, including function-calling agents and ReAct agents.

Building an Agent

Building a basic agent - LlamaIndex
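The basic-agent guide boils down to handing an LLM a set of tools plus a reasoning loop. Here is a minimal sketch using the ReActAgent abstraction with a local Ollama model (the multiply tool is illustrative, and the exact agent API varies across llama-index versions):

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b

# Wrap the plain Python function as a tool the agent may choose to call.
multiply_tool = FunctionTool.from_defaults(fn=multiply)

llm = Ollama(model="deepseek-r1:8b", request_timeout=300)
agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)

# The ReAct loop alternates tool calls and reasoning until the LLM
# decides it can return a final answer.
response = agent.chat("What is 12.3 multiplied by 4.56?")
print(response)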

Examples

docs.llamaindex.ai/en/stable/e…

This notebook walks through setting up a Workflow to construct a ReAct agent from (mostly) scratch.

ReAct agents work by prompting an LLM to either invoke tools/functions, or return a final response.

Our workflow will be stateful with memory, and will be able to call the LLM to select tools and process incoming user messages.
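The full notebook is too long to reproduce here, but the "stateful" part rests on the workflow Context object, which steps can read and write across events. A minimal sketch, assuming the async ctx.get/ctx.set API (the counter itself is illustrative):

from llama_index.core.workflow import (
    Context,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class CountingWorkflow(Workflow):
    @step
    async def count(self, ctx: Context, ev: StartEvent) -> StopEvent:
        # Context state persists across steps within a run (and across
        # runs, if you reuse the same Context between w.run() calls).
        runs = await ctx.get("runs", default=0)
        await ctx.set("runs", runs + 1)
        return StopEvent(result=f"run number {runs + 1}")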

Agents - LlamaIndex

RAG

Introduction to RAG - LlamaIndex

A quick way to confirm that a local embedding model (bge-m3, used later in this post) is reachable through Ollama's embed API:

curl http://localhost:11434/api/embed -d '{ "model": "bge-m3", "input": ["Why is the sky blue?", "Why is the grass green?"] }'

In RAG, your data is loaded and prepared for queries or "indexed". User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.

Stages within RAG

There are five key stages within RAG, which in turn will be a part of most larger applications you build. These are:

  • Loading: getting your data from where it lives (text files, PDFs, another website, a database, or an API) into your workflow. [LlamaHub](llamahub.ai/) provides hundreds of connectors to choose from.
  • Indexing: creating a data structure that allows querying the data. For LLMs this nearly always means creating vector embeddings (numerical representations of the meaning of your data), along with numerous other metadata strategies, to make it easy to accurately find contextually relevant data.
  • Storing: once your data is indexed, you will almost always want to store the index, along with other metadata, to avoid having to re-index it.
  • Querying: for any given indexing strategy there are many ways you can utilize LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries, and hybrid strategies.
  • Evaluation: a critical step in any flow is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful, and fast your responses to queries are. A minimal evaluation sketch follows this list.
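Loading, indexing, storing, and querying are all shown concretely in the RAG example later in this post; evaluation is not, so here is a minimal sketch using one of LlamaIndex's built-in LLM-based evaluators (it assumes the llm and query_engine objects built in that example):

from llama_index.core.evaluation import FaithfulnessEvaluator

# Use an LLM to judge whether the response is grounded in the
# context that was actually retrieved for it.
evaluator = FaithfulnessEvaluator(llm=llm)
response = query_engine.query("What is this document about?")
result = evaluator.evaluate_response(response=response)
print(result.passing, result.feedback)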

Important concepts within RAG

Each stage introduces a few terms of its own:

Loading stage

Nodes and Documents: a Document is a container around any data source, such as a PDF, an API output, or data retrieved from a database. A Node is the atomic unit of data in LlamaIndex and represents a "chunk" of a source Document. Nodes have metadata that relate them to the document they are in and to other nodes.

Connectors: a data connector (often called a Reader) ingests data from different data sources and formats into Documents and Nodes.
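To make the Document/Node distinction concrete, here is a small sketch that chunks one Document into Nodes (the splitter settings and text are illustrative):

from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

doc = Document(
    text="LlamaIndex is a data framework for LLM applications. " * 50,
    metadata={"source": "example.txt"},
)

# Each Node is a chunk of the source Document and inherits metadata
# linking it back to that Document.
splitter = SentenceSplitter(chunk_size=128, chunk_overlap=16)
nodes = splitter.get_nodes_from_documents([doc])
print(len(nodes), nodes[0].metadata)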

Indexing Stage

Indexes: once you've ingested your data, LlamaIndex helps you index it into a structure that's easy to retrieve. This usually involves generating vector embeddings, which are stored in a specialized database called a vector store. Indexes can also store a variety of metadata about your data.

Embeddings: LLMs generate numerical representations of data called embeddings. When filtering your data for relevance, LlamaIndex converts queries into embeddings, and your vector store finds data that is numerically similar to the embedding of your query.

Querying Stage

Retrievers: a retriever defines how to efficiently retrieve relevant context from an index given a query. Your retrieval strategy is key to both the relevancy of the retrieved data and the efficiency of retrieval.

Routers: a router determines which retriever will be used to fetch relevant context from the knowledge base. More specifically, the RouterRetriever class is responsible for selecting one or more candidate retrievers to execute a query, using a selector to choose the best option based on each candidate's metadata and the query.

Node Postprocessors: a node postprocessor takes in a set of retrieved nodes and applies transformation, filtering, or re-ranking logic to them.

Response Synthesizers: a response synthesizer generates a response from an LLM, using the user query and a given set of retrieved text chunks.
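These pieces compose directly. The sketch below wires a retriever, a node postprocessor, and a response synthesizer into a query engine by hand (it assumes the index object from the RAG example below; the top-k and similarity cutoff are illustrative):

from llama_index.core import get_response_synthesizer
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

# Retriever: how candidate nodes are fetched from the index.
retriever = VectorIndexRetriever(index=index, similarity_top_k=5)

# Postprocessor: drop weakly matching nodes before synthesis.
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

# Synthesizer: turn the query plus surviving chunks into an LLM response.
synthesizer = get_response_synthesizer()

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
    node_postprocessors=[postprocessor],
)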

RAG Example

Build a QA bot on top of a PDF document, using the document's content as context for its answers.

  1. Load the data. llama-index abstracts a Reader class that reads from different data sources and formats and produces Document instances. A Document holds the document content plus metadata; supported content formats will include text, audio, video, and images.

SimpleDirectoryReader does this in one line: it walks a directory and turns each document it finds into a Document instance.

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()

DatabaseReader executes a SQL statement against a database and turns the rows it reads into Document instances.

import os

from llama_index.readers.database import DatabaseReader

reader = DatabaseReader(
    scheme=os.getenv("DB_SCHEME"),
    host=os.getenv("DB_HOST"),
    port=os.getenv("DB_PORT"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASS"),
    dbname=os.getenv("DB_NAME"),
)

query = "SELECT * FROM users"
documents = reader.load_data(query=query)

2. Build the index. There are many indexing strategies. VectorStoreIndex wraps the most common one, vector indexing: an embedding model vectorizes the Document instances so that data relevant to a question can be retrieved semantically.

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

Once the documents are vectorized, the vectors can be persisted to a vector database so that later queries load the vectorized documents from there instead of re-indexing. chromadb is one of the common vector databases; below, llama-index persists its vectors through chromadb.

import chromadb
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

embed_model = OllamaEmbedding(
    model_name="bge-m3",
    base_url="http://localhost:11434",
    ollama_additional_kwargs={"mirostat": 0},
)
# load some documents
documents = SimpleDirectoryReader("./data").load_data()

# initialize client, setting path to save data
db = chromadb.PersistentClient(path="./chroma_db")

# create collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create your index, passing the Ollama embedding model explicitly
# (otherwise llama-index falls back to its default OpenAI embeddings)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

3. Generate the QA response. With the documents vectorized and persisted, you can build a basic QA bot. First load an LLM through ollama, then call the index's as_query_engine method to get a BaseQueryEngine instance. It encapsulates retrieving relevant data from the vector store and combining it with the question into a prompt from which the LLM generates an answer.

from llama_index.llms.ollama import Ollama

llm = Ollama(model="deepseek-r1:8b")
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query(
    "Write an email to the user given their background information."
)
print(response)

Embeddings

docs.llamaindex.ai/en/stable/m…

Embeddings are used in LlamaIndex to represent your documents using a sophisticated numerical representation. Embedding models take text as input, and return a long list of numbers used to capture the semantics of the text. These embedding models have been trained to represent text this way, and help enable many applications, including search!

At a high level, if a user asks a question about dogs, then the embedding for that question will be highly similar to text that talks about dogs.

When calculating the similarity between embeddings, there are many methods to use (dot product, cosine similarity, etc.). By default, LlamaIndex uses cosine similarity when comparing embeddings.

There are many embedding models to pick from. By default, LlamaIndex uses text-embedding-ada-002 from OpenAI. It also supports any embedding model offered by Langchain, and provides an easy-to-extend base class for implementing your own embeddings.
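As a concrete check, you can embed a query and a passage yourself and compute the cosine similarity directly (reusing the bge-m3 Ollama model from earlier; the texts are illustrative):

import math

from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(model_name="bge-m3", base_url="http://localhost:11434")

query_vec = embed_model.get_query_embedding("Why is the sky blue?")
text_vec = embed_model.get_text_embedding(
    "Rayleigh scattering of sunlight makes the sky appear blue."
)

# Cosine similarity: dot product over the product of the vector norms.
dot = sum(q * t for q, t in zip(query_vec, text_vec))
norm = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(sum(t * t for t in text_vec))
print(dot / norm)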

docs.llamaindex.ai/en/stable/m…

Graph RAG

docs.llamaindex.ai/en/stable/e…

docs.llamaindex.ai/en/stable/m…

GraphRAG (Graphs + Retrieval Augmented Generation) combines the strengths of Retrieval Augmented Generation (RAG) and Query-Focused Summarization (QFS) to effectively handle complex queries over large text datasets. While RAG excels in fetching precise information, it struggles with broader queries that require thematic understanding, a challenge that QFS addresses but cannot scale well. GraphRAG integrates these approaches to offer responsive and thorough querying capabilities across extensive, diverse text corpora.

This notebook provides guidance on constructing the GraphRAG pipeline using the LlamaIndex PropertyGraph abstractions.

NOTE: This is an approximate implementation of GraphRAG. We are currently developing a series of cookbooks that will detail the exact implementation of GraphRAG.
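For orientation, the PropertyGraph abstraction that the notebook builds on can be driven end to end in a few lines. A minimal sketch with local Ollama models (this is not the full GraphRAG pipeline; by default the configured LLM extracts entities and relations from the documents):

from llama_index.core import PropertyGraphIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Assumed local models; swap in whatever you run.
Settings.llm = Ollama(model="deepseek-r1:8b", request_timeout=300)
Settings.embed_model = OllamaEmbedding(model_name="bge-m3")

documents = SimpleDirectoryReader("./data").load_data()

# Extract entities/relations into a property graph, then index it.
index = PropertyGraphIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What themes connect the main entities?"))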

Examples

docs.llamaindex.ai/en/stable/e…