一、LangChain模型调用API概述
LangChain是一个用于开发由语言模型驱动的应用程序的框架,它提供了一系列API接口,使得开发者可以轻松地与各种语言模型进行交互。这些API接口不仅简化了模型调用的过程,还提供了诸如提示管理、记忆、代理等高级功能,帮助开发者构建更复杂、更智能的应用程序。
LangChain的模型调用API主要分为以下几个核心组件:
- LLM(大语言模型)接口:提供与各种大语言模型(如OpenAI GPT、Hugging Face Transformers等)交互的统一接口。
- 提示模板(Prompt Template):管理和格式化向模型提供的提示文本。
- 输出解析器(Output Parser):处理和解析模型的输出结果。
- 链(Chain):将多个组件组合成一个工作流程。
- 记忆(Memory):在链的执行过程中保持状态和历史信息。
- 代理(Agent):使模型能够自主决策并调用外部工具。
通过这些组件,LangChain提供了一个灵活且强大的框架,使得开发者可以快速构建和部署基于语言模型的应用程序。
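在深入各个组件的实现之前,先看一个最小化的使用示意,直观感受提示模板与LLM接口如何配合(以早期版本LangChain库的常见接口为例,具体导入路径以所用版本为准):
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# 用提示模板组织输入,再交给LLM生成结果
prompt = PromptTemplate(
    input_variables=["topic"],
    template="用一句话介绍{topic}。"
)
llm = OpenAI(temperature=0.7)
print(llm(prompt.format(topic="LangChain")))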
二、LLM接口设计与实现
2.1 LLM基类设计
LangChain的LLM接口是与各种大语言模型交互的基础,其设计基于面向对象的原则,提供了统一的抽象接口。LLM基类定义了与模型交互的基本方法和属性:
class LLM(ABC):
"""Base class for all language models."""
@abstractmethod
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Run the LLM on the given prompt and return the output."""
pass
@property
@abstractmethod
def _llm_type(self) -> str:
"""Return type of llm."""
pass
def generate(self, prompts: List[str], stop: Optional[List[str]] = None) -> LLMResult:
"""Generate text from a list of prompts."""
        generations = []
        for prompt in prompts:
            text = self._call(prompt, stop=stop)
            # LLMResult.generations是嵌套列表:外层对应每个提示,内层是该提示的候选结果
            generations.append([Generation(text=text)])
        return LLMResult(generations=generations)
def get_num_tokens(self, text: str) -> int:
"""Get the number of tokens in the text."""
# 默认实现,具体模型可能需要重写
return len(text.split())
# 其他辅助方法...
这个基类定义了与模型交互的核心方法:
- _call方法:这是一个抽象方法,具体的LLM实现需要重写这个方法,实现与特定模型的交互。
- _llm_type属性:返回LLM的类型,如"openai"、"huggingface"等。
- generate方法:处理多个提示的生成,默认实现是循环调用_call方法。
- get_num_tokens方法:计算文本中的令牌数量,默认实现是简单的字符串分割,具体模型可能需要重写这个方法。
2.2 具体LLM实现
基于LLM基类,LangChain为不同的模型提供商实现了具体的子类。以OpenAI LLM实现为例:
class OpenAI(LLM):
"""Wrapper around OpenAI large language models."""
openai_api_key: Optional[str] = None
model_name: str = "text-davinci-003"
temperature: float = 0.7
max_tokens: int = 256
top_p: float = 1
frequency_penalty: float = 0
presence_penalty: float = 0
n: int = 1
best_of: int = 1
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
logit_bias: Optional[Dict[str, float]] = None
def __init__(self, **kwargs: Any):
"""Initialize the OpenAI LLM."""
super().__init__(**kwargs)
# 导入OpenAI库并进行初始化
try:
import openai
except ImportError:
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
self.client = openai.Completion
if self.openai_api_key is None:
self.openai_api_key = os.environ.get("OPENAI_API_KEY")
if self.openai_api_key is None:
raise ValueError(
"Did not find OpenAI API key, please add an environment variable "
"`OPENAI_API_KEY` which contains it, or pass "
"`openai_api_key` as a named parameter."
)
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "openai"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call the OpenAI API."""
try:
response = self.client.create(
model=self.model_name,
prompt=prompt,
temperature=self.temperature,
max_tokens=self.max_tokens,
top_p=self.top_p,
frequency_penalty=self.frequency_penalty,
presence_penalty=self.presence_penalty,
n=self.n,
best_of=self.best_of,
stop=stop,
timeout=self.request_timeout,
logit_bias=self.logit_bias or {},
)
return response.choices[0].text.strip()
except Exception as e:
raise ValueError(f"Error calling OpenAI API: {e}") from e
def get_num_tokens(self, text: str) -> int:
"""Get the number of tokens in the text."""
try:
import tiktoken
except ImportError:
return super().get_num_tokens(text)
# 使用tiktoken库计算OpenAI模型的令牌数量
encoding = tiktoken.encoding_for_model(self.model_name)
return len(encoding.encode(text))
这个实现展示了如何与OpenAI API进行交互:
- 初始化过程中,检查并获取API密钥,确保可以访问OpenAI服务。
- _llm_type属性返回"openai",标识这是一个OpenAI模型。
- _call方法实现了与OpenAI API的具体交互,处理参数设置和API调用。
- get_num_tokens方法使用tiktoken库准确计算OpenAI模型的令牌数量。
2.3 LLM接口的扩展性
LLM接口的设计具有很强的扩展性,允许开发者轻松添加对新模型的支持。要添加一个新的LLM实现,只需继承LLM基类,并重写_call和_llm_type方法:
class CustomLLM(LLM):
"""Wrapper around a custom large language model."""
model_path: str
api_endpoint: str
def __init__(self, model_path: str, api_endpoint: str, **kwargs: Any):
"""Initialize the custom LLM."""
super().__init__(**kwargs)
self.model_path = model_path
self.api_endpoint = api_endpoint
# 初始化自定义模型
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "custom"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call the custom LLM API."""
        # 实现与自定义模型的交互(假设已安装requests库)
        import requests
response = requests.post(
self.api_endpoint,
json={"prompt": prompt, "stop": stop},
headers={"Content-Type": "application/json"}
)
if response.status_code != 200:
raise ValueError(f"Error calling custom LLM API: {response.text}")
return response.json()["text"]
这种设计使得LangChain可以轻松支持各种语言模型,包括开源模型和私有部署的模型。
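按照这种方式定义的CustomLLM,其使用方式与内置实现完全一致。下面是一个简单的调用示意(模型路径与接口地址均为假设值):
custom_llm = CustomLLM(
    model_path="/models/my-model",  # 假设的本地模型路径
    api_endpoint="http://localhost:8000/generate"  # 假设的推理服务地址
)
result = custom_llm.generate(["用一句话介绍LangChain。"])
print(result.generations[0][0].text)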
三、提示模板(Prompt Template)实现原理
3.1 提示模板的基本概念
提示模板是LangChain中的一个重要组件,用于管理和格式化向模型提供的提示文本。提示模板允许开发者定义带有变量的提示文本,并在运行时填充这些变量的值。
提示模板的基本接口定义如下:
class BasePromptTemplate(ABC):
"""Base class for all prompt templates."""
input_variables: List[str]
output_parser: Optional[BaseOutputParser] = None
@abstractmethod
def format(self, **kwargs: Any) -> str:
"""Format the prompt with the given parameters."""
pass
@abstractmethod
def format_prompt(self, **kwargs: Any) -> PromptValue:
"""Format the prompt with the given parameters and return a PromptValue."""
pass
@property
@abstractmethod
def _get_prompt_dict(self) -> Dict:
"""Return a dictionary of the prompt."""
pass
def validate_input_variables(self) -> None:
"""Validate that all input variables are present."""
# 检查输入变量是否合法
if self.input_variables is None:
raise ValueError("input_variables cannot be None")
for var in self.input_variables:
if not isinstance(var, str):
raise ValueError(f"Input variable {var} must be a string")
3.2 字符串提示模板实现
最常见的提示模板实现是基于字符串的模板,它使用Python的字符串格式化语法:
class PromptTemplate(BasePromptTemplate):
"""A prompt template for a language model."""
template: str
template_format: str = "f-string"
validate_template: bool = True
def __init__(
self,
input_variables: List[str],
template: str,
template_format: str = "f-string",
validate_template: bool = True,
**kwargs: Any
):
"""Initialize the prompt template."""
super().__init__(input_variables=input_variables, **kwargs)
self.template = template
self.template_format = template_format
self.validate_template = validate_template
if self.validate_template:
self._validate_template()
def _validate_template(self) -> None:
"""Validate that the template is valid."""
        if self.template_format == "f-string":
            # 验证f-string风格模板:检查占位符是否都在input_variables中声明
            # (直接用ast解析f"{template}"在多行模板上会抛出语法错误)
            from string import Formatter
            template_vars = {
                field for _, field, _, _ in Formatter().parse(self.template) if field
            }
            undeclared = template_vars - set(self.input_variables)
            if undeclared:
                raise ValueError(f"Template uses undeclared variables: {undeclared}")
elif self.template_format == "jinja2":
# 验证Jinja2格式
try:
from jinja2 import Template
Template(self.template)
except Exception as e:
raise ValueError(f"Invalid Jinja2 template: {e}")
else:
raise ValueError(f"Unsupported template format: {self.template_format}")
def format(self, **kwargs: Any) -> str:
"""Format the prompt with the given parameters."""
# 检查所有必需的输入变量是否都已提供
missing_vars = set(self.input_variables) - set(kwargs.keys())
if missing_vars:
raise ValueError(f"Missing variables: {missing_vars}")
        if self.template_format == "f-string":
            # f-string风格模板用str.format填充变量,避免eval带来的安全与多行模板问题
            return self.template.format(**kwargs)
elif self.template_format == "jinja2":
# 使用Jinja2格式化
from jinja2 import Template
return Template(self.template).render(**kwargs)
else:
raise ValueError(f"Unsupported template format: {self.template_format}")
def format_prompt(self, **kwargs: Any) -> PromptValue:
"""Format the prompt with the given parameters and return a PromptValue."""
text = self.format(**kwargs)
return StringPromptValue(text=text)
@property
def _get_prompt_dict(self) -> Dict:
"""Return a dictionary of the prompt."""
return {
"input_variables": self.input_variables,
"template": self.template,
"template_format": self.template_format,
}
这个实现支持两种常见的模板格式:f-string和Jinja2。它提供了以下功能:
- 模板验证:确保模板格式正确,避免运行时错误。
- 变量检查:确保所有必需的输入变量都已提供。
- 格式化:根据指定的模板格式,将变量值填充到模板中。
3.3 示例:使用提示模板
下面是一个使用提示模板的简单示例:
# 创建一个提示模板
prompt_template = PromptTemplate(
input_variables=["question", "context"],
template="""
Context: {context}
Question: {question}
Answer:
"""
)
# 填充变量
formatted_prompt = prompt_template.format(
question="What is the capital of France?",
context="France is a country in Western Europe."
)
# 输出格式化后的提示
print(formatted_prompt)
这个示例展示了如何创建一个简单的提示模板,并使用变量值填充它。提示模板的设计使得提示文本的管理更加灵活和可维护。
四、输出解析器(Output Parser)实现原理
4.1 输出解析器的作用
输出解析器是LangChain中的一个重要组件,用于处理和解析模型的输出结果。模型返回的原始文本可能需要进行格式化、提取特定信息或转换为特定的数据结构,输出解析器就是为了完成这些任务而设计的。
输出解析器的基本接口定义如下:
class BaseOutputParser(ABC):
"""Abstract base class for output parsers."""
@abstractmethod
def parse(self, text: str) -> Any:
"""Parse the output text and return a structured result."""
pass
@abstractmethod
def get_format_instructions(self) -> str:
"""Get format instructions for the output."""
pass
def __str__(self) -> str:
"""Return a string representation of the output parser."""
return self.__class__.__name__
4.2 常见输出解析器实现
LangChain提供了多种内置的输出解析器,以满足不同的需求。以下是一些常见的输出解析器实现:
4.2.1 逗号分隔值解析器
class CommaSeparatedListOutputParser(BaseOutputParser):
"""Parse the output of an LLM call to a comma-separated list."""
def parse(self, text: str) -> List[str]:
"""Parse the output text as a comma-separated list."""
return [item.strip() for item in text.split(",")]
def get_format_instructions(self) -> str:
"""Get format instructions for the output."""
return "Your answer should be a list of comma-separated values."
4.2.2 JSON解析器
class StructuredOutputParser(BaseOutputParser):
"""Parse the output of an LLM call to a structured object."""
schema: Dict[str, Any]
def __init__(self, schema: Dict[str, Any]):
"""Initialize the parser with a schema."""
self.schema = schema
def parse(self, text: str) -> Dict[str, Any]:
"""Parse the output text as a JSON object."""
try:
return json.loads(text)
except json.JSONDecodeError as e:
raise ValueError(f"Failed to parse JSON: {e}") from e
def get_format_instructions(self) -> str:
"""Get format instructions for the output."""
schema_str = json.dumps(self.schema, indent=2)
return f"""
Your answer should be a JSON object formatted according to the following schema:
{schema_str}
Example:
{{
"name": "John Doe",
"age": 30,
"hobbies": ["reading", "swimming"]
}}
"""
4.2.3 正则表达式解析器
class RegexParser(BaseOutputParser):
"""Parse the output using a regular expression."""
regex: str
output_keys: List[str]
default_value: Optional[Any] = None
def __init__(self, regex: str, output_keys: List[str], default_value: Optional[Any] = None):
"""Initialize the parser with a regex and output keys."""
self.regex = regex
self.output_keys = output_keys
self.default_value = default_value
def parse(self, text: str) -> Dict[str, Any]:
"""Parse the output text using the regex."""
match = re.search(self.regex, text)
if not match:
if self.default_value is not None:
return {key: self.default_value for key in self.output_keys}
raise ValueError(f"Could not parse output with regex: {self.regex}")
groups = match.groups()
if len(groups) != len(self.output_keys):
raise ValueError(f"Regex produced {len(groups)} groups, but expected {len(self.output_keys)}")
return {key: value for key, value in zip(self.output_keys, groups)}
def get_format_instructions(self) -> str:
"""Get format instructions for the output."""
return f"Your answer should match the regex: {self.regex}"
4.3 输出解析器与提示模板的结合
输出解析器通常与提示模板结合使用,以确保模型生成的输出符合预期格式。以下是一个结合使用的示例:
# 定义一个结构化输出解析器
parser = StructuredOutputParser(
schema={
"answer": "The answer to the user's question",
"confidence": "A number between 0 and 1 indicating the confidence in the answer"
}
)
# 创建一个提示模板,包含输出格式说明
prompt_template = PromptTemplate(
input_variables=["question"],
template="""
Answer the following question as best you can.
{format_instructions}
Question: {question}
""",
partial_variables={"format_instructions": parser.get_format_instructions()}
)
# 格式化提示
formatted_prompt = prompt_template.format(question="What is the capital of France?")
# 调用模型
model = OpenAI()
output = model(formatted_prompt)
# 解析输出
result = parser.parse(output)
print(result) # 输出: {"answer": "Paris", "confidence": 0.9}
这个示例展示了如何将输出解析器与提示模板结合使用,以指导模型生成特定格式的输出,并将其解析为结构化数据。
五、链(Chain)的设计与实现
5.1 链的基本概念
链是LangChain中的核心组件之一,它允许将多个组件组合成一个工作流程。链可以包含提示模板、LLM、输出解析器等组件,并定义它们之间的交互方式。
链的基本接口定义如下:
class Chain(ABC, Serializable):
"""Base class for all chains."""
@property
@abstractmethod
def input_keys(self) -> List[str]:
"""Return the input keys this chain expects."""
pass
@property
@abstractmethod
def output_keys(self) -> List[str]:
"""Return the output keys this chain produces."""
pass
@abstractmethod
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
"""Run the core logic of this chain."""
pass
def call(self, inputs: Dict[str, Any], return_only_outputs: bool = False) -> Dict[str, Any]:
"""Run the chain as text in and text out."""
# 预处理输入
inputs = self.prep_inputs(inputs)
# 调用链的核心逻辑
outputs = self._call(inputs)
# 后处理输出
return self.prep_outputs(inputs, outputs, return_only_outputs)
def prep_inputs(self, inputs: Union[Dict[str, Any], Any]) -> Dict[str, Any]:
"""Prepare inputs before calling the chain."""
# 处理输入格式
if not isinstance(inputs, dict):
return {self.input_keys[0]: inputs}
else:
return inputs
def prep_outputs(
self, inputs: Dict[str, Any], outputs: Dict[str, Any], return_only_outputs: bool = False
) -> Dict[str, Any]:
"""Prepare outputs after calling the chain."""
# 处理输出格式
if return_only_outputs:
return outputs
else:
return {**inputs, **outputs}
def run(self, *args: Any, **kwargs: Any) -> str:
"""Run the chain as text in and text out."""
# 处理位置参数
if len(args) == 1 and len(kwargs) == 0 and isinstance(args[0], dict):
inputs = args[0]
elif len(args) == 0:
inputs = kwargs
else:
raise ValueError(
"`run` supports either a single dictionary of "
"arguments or keyword arguments. Got both."
)
# 调用链
        outputs = self.call(inputs)
# 确保只有一个输出键
if len(self.output_keys) != 1:
raise ValueError(f"`run` not supported when there is not exactly "
f"one output key. Got {self.output_keys}.")
return outputs[self.output_keys[0]]
5.2 简单链实现示例
下面是一个简单的链实现示例,它将提示模板、LLM和输出解析器组合在一起:
class SimpleSequentialChain(Chain):
"""A simple sequential chain that runs multiple chains in sequence."""
chains: List[Chain]
input_key: str = "input"
output_key: str = "output"
@property
def input_keys(self) -> List[str]:
"""Return the input keys this chain expects."""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return the output keys this chain produces."""
return [self.output_key]
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
"""Run the chains in sequence."""
current_input = inputs
for chain in self.chains:
# 确保当前输入包含链所需的所有输入键
for key in chain.input_keys:
if key not in current_input:
raise ValueError(f"Missing input key: {key}")
# 调用链
            current_output = chain.call(current_input)
# 更新当前输入为链的输出
current_input = current_output
return current_input
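作为对照,真实LangChain库中SimpleSequentialChain的典型用法如下:每条子链只有一个输入和一个输出,前一条链的输出文本自动作为后一条链的输入(示例中的llm为任意LLM实例,LLMChain将在下一小节介绍,提示内容为示例假设):
from langchain.chains import SimpleSequentialChain

title_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(input_variables=["topic"], template="为主题“{topic}”拟一个文章标题:"),
)
summary_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(input_variables=["title"], template="用一句话概括标题为“{title}”的文章:"),
)
overall_chain = SimpleSequentialChain(chains=[title_chain, summary_chain], verbose=True)
print(overall_chain.run("LangChain的链机制"))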
5.3 常见链类型
LangChain提供了多种内置的链类型,以满足不同的需求:
5.3.1 LLMChain
LLMChain是最基本的链类型,它将提示模板和LLM组合在一起:
class LLMChain(Chain):
"""Chain to run queries against LLMs."""
prompt: BasePromptTemplate
llm: LLM
output_key: str = "text" #: :meta private:
@property
def input_keys(self) -> List[str]:
"""Will be whatever keys the prompt expects.
:meta private:
"""
return self.prompt.input_variables
@property
def output_keys(self) -> List[str]:
"""Will always return text key.
:meta private:
"""
return [self.output_key]
def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Format prompt with kwargs and pass to LLM."""
prompt_value = self.prompt.format_prompt(**inputs)
# Get the LLM to generate
response = self.llm.generate_prompt([prompt_value])
# Format the response
text = self._process_response(response)
return {self.output_key: text}
def _process_response(self, response: LLMResult) -> str:
"""Process response from LLM."""
return response.generations[0][0].text
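LLMChain的典型用法如下(以真实LangChain库的接口为例,思路与上面的简化实现一致):
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="为一家生产{product}的公司起一个名字。"
)
chain = LLMChain(llm=OpenAI(temperature=0.9), prompt=prompt)
print(chain.run(product="彩色袜子"))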
5.3.2 SequentialChain
SequentialChain允许按顺序运行多个链,前一个链的输出作为后一个链的输入:
class SequentialChain(Chain):
"""A sequential chain that runs multiple chains in sequence."""
chains: List[Chain]
input_variables: List[str]
output_variables: List[str]
verbose: bool = False
@property
def input_keys(self) -> List[str]:
"""Return the input keys this chain expects."""
return self.input_variables
@property
def output_keys(self) -> List[str]:
"""Return the output keys this chain produces."""
return self.output_variables
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
"""Run the chains in sequence."""
current_input = inputs.copy()
for chain in self.chains:
# 确保当前输入包含链所需的所有输入键
for key in chain.input_keys:
if key not in current_input:
raise ValueError(f"Missing input key: {key}")
# 调用链
if self.verbose:
print(f"\n\nRunning chain: {chain.__class__.__name__}")
print(f"Input: {current_input}")
            current_output = chain.call(current_input)
if self.verbose:
print(f"Output: {current_output}")
# 更新当前输入为链的输出
current_input.update(current_output)
# 只返回指定的输出变量
return {key: current_input[key] for key in self.output_variables}
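与SimpleSequentialChain不同,SequentialChain需要显式声明各链的输入输出变量名。下面是真实LangChain库中的典型用法(示例中的llm为任意LLM实例,提示内容为示例假设):
from langchain.chains import SequentialChain

synopsis_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(input_variables=["title"], template="为剧目《{title}》写一段剧情简介:"),
    output_key="synopsis",
)
review_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(input_variables=["synopsis"], template="根据以下简介写一则剧评:\n{synopsis}"),
    output_key="review",
)
overall_chain = SequentialChain(
    chains=[synopsis_chain, review_chain],
    input_variables=["title"],
    output_variables=["synopsis", "review"],
)
print(overall_chain({"title": "日落下的悲剧"}))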
六、记忆(Memory)组件实现
6.1 记忆的作用
在LangChain中,记忆组件用于在链的执行过程中保持状态和历史信息。这对于需要上下文感知的应用程序(如聊天机器人)尤为重要。
记忆组件的基本接口定义如下:
class BaseMemory(ABC):
"""Base interface for memory in chains."""
@property
@abstractmethod
def memory_variables(self) -> List[str]:
"""Input keys this memory class will load dynamically."""
pass
@abstractmethod
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
"""Return key-value pairs given the text input to the chain."""
pass
@abstractmethod
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
"""Save the context of this chain run to memory."""
pass
@abstractmethod
def clear(self) -> None:
"""Clear memory contents."""
pass
6.2 常见记忆实现
LangChain提供了多种记忆实现,以满足不同的需求:
6.2.1 简单聊天记忆
class ConversationBufferMemory(BaseMemory):
"""Buffer for storing conversation memory."""
memory_key: str = "history"
input_key: Optional[str] = None
output_key: Optional[str] = None
human_prefix: str = "Human"
ai_prefix: str = "AI"
    chat_memory: BaseChatMessageHistory = Field(default_factory=ChatMessageHistory)
@property
def memory_variables(self) -> List[str]:
"""Will always return list of memory variables.
:meta private:
"""
return [self.memory_key]
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Return history buffer."""
        messages = list(self.chat_memory.messages)  # 复制一份,避免把临时消息写回聊天历史
if self.input_key is not None and self.input_key in inputs:
# 如果提供了输入键,并且输入中包含该键,则将其添加到历史记录中
human_message = HumanMessage(content=inputs[self.input_key])
messages.append(human_message)
history = self._get_buffer_string(messages)
return {self.memory_key: history}
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save context from this conversation to buffer."""
if self.input_key is None:
input_str = list(inputs.values())[0]
else:
input_str = inputs[self.input_key]
if self.output_key is None:
output_str = list(outputs.values())[0]
else:
output_str = outputs[self.output_key]
self.chat_memory.add_user_message(input_str)
self.chat_memory.add_ai_message(output_str)
def clear(self) -> None:
"""Clear memory contents."""
self.chat_memory.clear()
def _get_buffer_string(self, messages: List[BaseMessage]) -> str:
"""Get buffer string of messages."""
string_messages = []
for m in messages:
if isinstance(m, HumanMessage):
role = self.human_prefix
elif isinstance(m, AIMessage):
role = self.ai_prefix
else:
continue
string_messages.append(f"{role}: {m.content}")
return "\n".join(string_messages)
6.2.2 总结记忆
class ConversationSummaryMemory(BaseMemory):
"""Memory that summarizes previous messages and uses that as the context."""
memory_key: str = "history"
llm: LLM
input_key: Optional[str] = None
output_key: Optional[str] = None
human_prefix: str = "Human"
ai_prefix: str = "AI"
    chat_memory: BaseChatMessageHistory = Field(default_factory=ChatMessageHistory)
summary: str = ""
@property
def memory_variables(self) -> List[str]:
"""Will always return list of memory variables.
:meta private:
"""
return [self.memory_key]
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Return history buffer."""
return {self.memory_key: self.summary}
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save context from this conversation to buffer."""
if self.input_key is None:
input_str = list(inputs.values())[0]
else:
input_str = inputs[self.input_key]
if self.output_key is None:
output_str = list(outputs.values())[0]
else:
output_str = outputs[self.output_key]
# 添加新的对话到聊天记忆
self.chat_memory.add_user_message(input_str)
self.chat_memory.add_ai_message(output_str)
# 更新总结
self._update_summary()
def clear(self) -> None:
"""Clear memory contents."""
self.chat_memory.clear()
self.summary = ""
def _update_summary(self) -> None:
"""Update the summary based on the current conversation."""
messages = self.chat_memory.messages
if not messages:
return
# 创建一个提示来总结对话
prompt_template = """
Summarize the following conversation:
{conversation}
Summary:
"""
prompt = PromptTemplate(
input_variables=["conversation"],
template=prompt_template
)
# 格式化对话
conversation_str = self._get_buffer_string(messages)
# 生成总结
chain = LLMChain(llm=self.llm, prompt=prompt)
self.summary = chain.run(conversation=conversation_str)
def _get_buffer_string(self, messages: List[BaseMessage]) -> str:
"""Get buffer string of messages."""
string_messages = []
for m in messages:
if isinstance(m, HumanMessage):
role = self.human_prefix
elif isinstance(m, AIMessage):
role = self.ai_prefix
else:
continue
string_messages.append(f"{role}: {m.content}")
return "\n".join(string_messages)
七、代理(Agent)系统实现
7.1 代理的基本概念
代理是LangChain中最复杂的组件之一,它允许模型自主决策并调用外部工具。代理系统能够根据输入动态选择合适的工具,并使用这些工具来解决问题。
代理的基本接口定义如下:
class Agent(ABC):
"""Base class for all agents."""
@property
@abstractmethod
def input_keys(self) -> List[str]:
"""Return the input keys this agent expects."""
pass
@property
@abstractmethod
def output_keys(self) -> List[str]:
"""Return the output keys this agent produces."""
pass
@abstractmethod
def _get_next_action(self, inputs: Dict[str, Any]) -> AgentAction:
"""Get the next action the agent should take."""
pass
@abstractmethod
def _construct_scratchpad(
self, intermediate_steps: List[Tuple[AgentAction, str]]
) -> str:
"""Construct the scratchpad that lets the agent continue its thought process."""
pass
@classmethod
@abstractmethod
def _get_default_output_parser(cls, **kwargs: Any) -> BaseOutputParser:
"""Get default output parser for this agent."""
pass
@classmethod
@abstractmethod
def create_prompt(cls, tools: List[BaseTool], **kwargs: Any) -> BasePromptTemplate:
"""Create a prompt for this agent."""
pass
@classmethod
@abstractmethod
def _validate_tools(cls, tools: List[BaseTool]) -> None:
"""Validate that appropriate tools are passed in."""
pass
def plan(
self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
) -> Union[AgentAction, AgentFinish]:
"""Given input, decided what to do.
Args:
intermediate_steps: Steps the agent has taken to date,
along with observations
**kwargs: User inputs.
Returns:
Action specifying what tool to use.
"""
# 构建输入字典
full_inputs = kwargs.copy()
full_inputs["intermediate_steps"] = intermediate_steps
# 获取下一步行动
action = self._get_next_action(full_inputs)
# 如果行动是结束,则返回AgentFinish
if isinstance(action, AgentFinish):
return action
# 否则返回AgentAction
return action
7.2 工具接口
代理系统使用工具来执行特定的任务。工具的基本接口定义如下:
class BaseTool(ABC):
"""Base class for all tools."""
name: str
description: str
verbose: bool = False
@abstractmethod
def _run(self, *args: Any, **kwargs: Any) -> str:
"""Use the tool."""
pass
async def _arun(self, *args: Any, **kwargs: Any) -> str:
"""Use the tool asynchronously."""
raise NotImplementedError("Tool does not support async")
def run(self, *args: Any, **kwargs: Any) -> str:
"""Run the tool."""
if self.verbose:
print(f"Tool {self.name} called with args: {args}, kwargs: {kwargs}")
# 执行工具
output = self._run(*args, **kwargs)
if self.verbose:
print(f"Tool {self.name} returned: {output}")
return output
async def arun(self, *args: Any, **kwargs: Any) -> str:
"""Run the tool asynchronously."""
if self.verbose:
print(f"Tool {self.name} called with args: {args}, kwargs: {kwargs}")
# 执行工具
output = await self._arun(*args, **kwargs)
if self.verbose:
print(f"Tool {self.name} returned: {output}")
return output
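基于上面的BaseTool,定义一个具体工具只需实现_run方法。下面是一个简单的计算器工具示意(仅作演示,生产环境不建议直接eval用户输入):
class CalculatorTool(BaseTool):
    """对算术表达式求值的简单工具。"""
    name = "calculator"
    description = "用于计算数学表达式,输入应为合法的Python算术表达式"

    def _run(self, expression: str) -> str:
        """同步执行计算。"""
        try:
            # 屏蔽内置函数,降低eval的风险(演示用途)
            return str(eval(expression, {"__builtins__": {}}, {}))
        except Exception as e:
            return f"计算出错: {e}"

# 使用方式
calc = CalculatorTool()
print(calc.run("(3 + 5) * 2"))  # 输出: 16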
7.3 代理实现示例
下面是一个简单的代理实现示例,它使用ReAct框架来决定何时调用工具:
class ReActAgent(Agent):
"""Agent that uses ReAct framework to decide when to use tools."""
llm_chain: LLMChain
allowed_tools: List[str]
output_parser: BaseOutputParser
@property
def input_keys(self) -> List[str]:
"""Return the input keys this agent expects."""
return ["input"]
@property
def output_keys(self) -> List[str]:
"""Return the output keys this agent produces."""
return ["output"]
def _get_next_action(self, inputs: Dict[str, Any]) -> AgentAction:
"""Get the next action the agent should take."""
# 构建思考过程
thoughts = self._construct_scratchpad(inputs["intermediate_steps"])
# 添加思考过程到输入
full_inputs = {**inputs, "agent_scratchpad": thoughts}
# 调用LLM获取下一步行动
output = self.llm_chain.predict(**full_inputs)
# 解析输出
return self.output_parser.parse(output)
def _construct_scratchpad(
self, intermediate_steps: List[Tuple[AgentAction, str]]
) -> str:
"""Construct the scratchpad that lets the agent continue its thought process."""
thoughts = ""
for action, observation in intermediate_steps:
thoughts += f"Action: {action.tool}\nAction Input: {action.tool_input}\nObservation: {observation}\n"
return thoughts
@classmethod
def _get_default_output_parser(cls, **kwargs: Any) -> BaseOutputParser:
"""Get default output parser for this agent."""
return ReActOutputParser()
@classmethod
def create_prompt(cls, tools: List[BaseTool], **kwargs: Any) -> BasePromptTemplate:
"""Create a prompt for this agent."""
tool_descriptions = "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
tool_names = ", ".join([tool.name for tool in tools])
template = f"""
Answer the following questions as best you can. You have access to the following tools:
{tool_descriptions}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the name of the tool to use, should be one of [{tool_names}]
Action Input: the input to the tool
Observation: the result of the tool
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original question
Begin!
Question: {{input}}
Thought: {{agent_scratchpad}}
"""
return PromptTemplate(
input_variables=["input", "agent_scratchpad"],
template=template
)
@classmethod
def _validate_tools(cls, tools: List[BaseTool]) -> None:
"""Validate that appropriate tools are passed in."""
if not tools:
raise ValueError("At least one tool must be provided.")
八、异步API实现
8.1 异步编程模型
LangChain支持异步API,这对于需要处理大量并发请求的应用程序非常有用。异步API使用Python的async/await语法,允许非阻塞地执行长时间运行的操作。
LangChain的异步API主要基于以下几个原则:
- 一致的接口:异步API与同步API保持一致的接口设计,方便开发者在不同场景下切换使用。
- 非阻塞操作:所有可能阻塞的操作都提供异步版本,确保在处理多个请求时不会阻塞整个应用程序。
- 并行处理:支持并行执行多个请求,提高整体吞吐量。
8.2 异步LLM接口
LLM基类提供了异步方法的定义,具体实现由各个子类完成:
class LLM(ABC):
"""Base class for all language models."""
# 同步方法
@abstractmethod
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
pass
# 异步方法
@abstractmethod
async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Run the LLM on the given prompt asynchronously."""
pass
# 同步批量生成
def generate(self, prompts: List[str], stop: Optional[List[str]] = None) -> LLMResult:
        generations = []
        for prompt in prompts:
            text = self._call(prompt, stop=stop)
            generations.append([Generation(text=text)])
        return LLMResult(generations=generations)
# 异步批量生成
async def agenerate(self, prompts: List[str], stop: Optional[List[str]] = None) -> LLMResult:
"""Generate text from a list of prompts asynchronously."""
        generations = []
        for prompt in prompts:
            text = await self._acall(prompt, stop=stop)
            generations.append([Generation(text=text)])
        return LLMResult(generations=generations)
# 其他同步和异步方法...
以OpenAI LLM实现为例,看一下异步方法的具体实现:
class OpenAI(LLM):
"""Wrapper around OpenAI large language models."""
# 其他属性和方法...
async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call the OpenAI API asynchronously."""
try:
# 使用aiohttp进行异步请求
import aiohttp
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.openai_api_key}"
}
payload = {
"model": self.model_name,
"prompt": prompt,
"temperature": self.temperature,
"max_tokens": self.max_tokens,
"top_p": self.top_p,
"frequency_penalty": self.frequency_penalty,
"presence_penalty": self.presence_penalty,
"n": self.n,
"stop": stop,
}
async with aiohttp.ClientSession() as session:
async with session.post(
"https://api.openai.com/v1/completions",
headers=headers,
json=payload,
timeout=self.request_timeout
) as response:
response_data = await response.json()
if response.status != 200:
raise ValueError(f"Error calling OpenAI API: {response_data}")
return response_data["choices"][0]["text"].strip()
except Exception as e:
raise ValueError(f"Error calling OpenAI API: {e}") from e
8.3 异步链实现
链也提供了异步版本的实现,允许异步执行整个工作流程:
class Chain(ABC, Serializable):
"""Base class for all chains."""
# 同步调用
def call(self, inputs: Dict[str, Any], return_only_outputs: bool = False) -> Dict[str, Any]:
inputs = self.prep_inputs(inputs)
outputs = self._call(inputs)
return self.prep_outputs(inputs, outputs, return_only_outputs)
# 异步调用
async def acall(self, inputs: Dict[str, Any], return_only_outputs: bool = False) -> Dict[str, Any]:
"""Run the chain asynchronously."""
inputs = self.prep_inputs(inputs)
outputs = await self._acall(inputs)
return self.prep_outputs(inputs, outputs, return_only_outputs)
@abstractmethod
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
"""Run the core logic of this chain synchronously."""
pass
@abstractmethod
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
"""Run the core logic of this chain asynchronously."""
pass
# 其他同步和异步方法...
以LLMChain为例,看一下异步方法的具体实现:
class LLMChain(Chain):
"""Chain to run queries against LLMs."""
# 其他属性和方法...
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Format prompt with kwargs and pass to LLM asynchronously."""
prompt_value = self.prompt.format_prompt(**inputs)
# 异步调用LLM
response = await self.llm.agenerate_prompt([prompt_value])
# 处理响应
text = self._process_response(response)
return {self.output_key: text}
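异步链的用法与同步版本基本一致,只需在协程中await acall。下面是一个简单示意(llm与prompt沿用前文LLMChain示例中的对象):
import asyncio

async def main() -> None:
    chain = LLMChain(llm=llm, prompt=prompt)
    # acall是call的异步版本,内部最终调用LLM的异步接口
    result = await chain.acall({"product": "彩色袜子"})
    print(result["text"])

asyncio.run(main())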
8.4 并行处理多个请求
LangChain提供了方便的工具来并行处理多个请求,进一步提高性能:
async def aload_qa_with_sources_chain(
llm: LLM,
chain_type: str = "stuff",
verbose: Optional[bool] = None,
**kwargs: Any
) -> BaseChain:
"""Load question answering chain with sources."""
# 根据链类型选择合适的链
if chain_type == "stuff":
return StuffDocumentsChain(
llm_chain=LLMChain(llm=llm, prompt=load_qa_with_sources_prompt()),
document_variable_name="summaries",
verbose=verbose,
)
elif chain_type == "map_reduce":
return MapReduceDocumentsChain(
llm_chain=LLMChain(llm=llm, prompt=load_qa_with_sources_prompt()),
combine_document_chain=LLMChain(
llm=llm, prompt=load_qa_combine_prompt()
),
document_variable_name="summaries",
verbose=verbose,
)
elif chain_type == "refine":
return RefineDocumentsChain(
llm_chain=LLMChain(llm=llm, prompt=load_qa_prompt()),
refine_llm_chain=LLMChain(
llm=llm, prompt=load_qa_refine_prompt()
),
document_variable_name="document",
initial_response_name="existing_answer",
verbose=verbose,
)
else:
raise ValueError(f"Invalid chain type: {chain_type}")
async def aget_qa_with_sources_answers(
chain: BaseChain,
questions: List[str],
documents: List[Document],
batch_size: int = 2,
**kwargs: Any
) -> List[str]:
"""Get answers to a list of questions with sources asynchronously."""
# 并行处理多个问题
async def process_question(question: str) -> str:
inputs = {"question": question, "documents": documents}
result = await chain.acall(inputs, **kwargs)
return result["output_text"]
# 使用asyncio.gather并行处理多个问题
return await asyncio.gather(*[process_question(q) for q in questions])
九、流式API实现
9.1 流式响应的基本概念
流式响应允许模型在生成内容的同时逐步返回结果,而不是等待整个响应完成后才返回。这对于需要实时显示结果的应用程序(如聊天界面)非常有用。
LangChain提供了对流式响应的支持,主要通过以下几个组件实现:
- BaseCallbackHandler:回调处理器基类,定义了处理流式响应的接口。
- StreamingStdOutCallbackHandler:标准输出流式回调处理器,将流式响应输出到控制台。
- LLM:LLM基类提供了流式响应的接口定义。
9.2 流式回调处理器
回调处理器基类定义了处理流式响应的接口:
class BaseCallbackHandler(ABC):
"""Base callback handler that can be used to handle callbacks from langchain."""
@abstractmethod
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
"""Run when LLM starts running."""
pass
@abstractmethod
def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
"""Run on new LLM token. Only available when streaming is enabled."""
pass
@abstractmethod
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:
"""Run when LLM ends running."""
pass
@abstractmethod
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when LLM errors."""
pass
# 其他回调方法...
标准输出流式回调处理器实现了这些接口,将流式响应输出到控制台:
class StreamingStdOutCallbackHandler(BaseCallbackHandler):
"""Callback handler for streaming LLM responses to stdout."""
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Print the token."""
print(token, end="", flush=True)
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Do nothing."""
pass
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
9.3 流式LLM接口
LLM基类提供了流式响应的接口定义:
class LLM(ABC):
"""Base class for all language models."""
# 同步流式方法
def stream(self, prompt: str, stop: Optional[List[str]] = None) -> Iterator[str]:
"""Stream the response one token at a time."""
# 默认实现,具体模型可能需要重写
response = self._call(prompt, stop=stop)
for token in response.split():
yield token
# 异步流式方法
async def astream(self, prompt: str, stop: Optional[List[str]] = None) -> AsyncIterator[str]:
"""Stream the response one token at a time asynchronously."""
# 默认实现,具体模型可能需要重写
response = await self._acall(prompt, stop=stop)
for token in response.split():
yield token
# 其他方法...
以OpenAI LLM实现为例,看一下流式方法的具体实现:
class OpenAI(LLM):
"""Wrapper around OpenAI large language models."""
# 其他属性和方法...
def stream(self, prompt: str, stop: Optional[List[str]] = None) -> Iterator[str]:
"""Call the OpenAI API with streaming and yield tokens."""
try:
import openai
response = openai.Completion.create(
model=self.model_name,
prompt=prompt,
temperature=self.temperature,
max_tokens=self.max_tokens,
top_p=self.top_p,
frequency_penalty=self.frequency_penalty,
presence_penalty=self.presence_penalty,
n=self.n,
stop=stop,
stream=True,
)
for chunk in response:
if "choices" in chunk and len(chunk["choices"]) > 0:
if "text" in chunk["choices"][0]:
yield chunk["choices"][0]["text"]
except Exception as e:
raise ValueError(f"Error calling OpenAI API: {e}") from e
async def astream(self, prompt: str, stop: Optional[List[str]] = None) -> AsyncIterator[str]:
"""Call the OpenAI API with streaming asynchronously and yield tokens."""
try:
import aiohttp
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.openai_api_key}"
}
payload = {
"model": self.model_name,
"prompt": prompt,
"temperature": self.temperature,
"max_tokens": self.max_tokens,
"top_p": self.top_p,
"frequency_penalty": self.frequency_penalty,
"presence_penalty": self.presence_penalty,
"n": self.n,
"stop": stop,
"stream": True,
}
async with aiohttp.ClientSession() as session:
async with session.post(
"https://api.openai.com/v1/completions",
headers=headers,
json=payload,
timeout=self.request_timeout
) as response:
async for line in response.content:
line = line.decode("utf-8").strip()
if line.startswith("data: ") and not line.startswith("data: [DONE]"):
data = json.loads(line[6:])
if "choices" in data and len(data["choices"]) > 0:
if "text" in data["choices"][0]:
yield data["choices"][0]["text"]
except Exception as e:
raise ValueError(f"Error calling OpenAI API: {e}") from e
9.4 使用流式API
下面是一个使用流式API的简单示例:
# 创建一个OpenAI实例
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
# 使用流式API生成响应
prompt = "Once upon a time"
for token in llm.stream(prompt):
# 处理每个生成的token
pass
# 异步使用流式API
async def generate_async():
async for token in llm.astream(prompt):
# 处理每个生成的token
pass
# 运行异步生成
asyncio.run(generate_async())
十、API集成与扩展
10.1 与其他API的集成
LangChain可以轻松地与其他API集成,扩展其功能。例如,与向量数据库、搜索引擎等集成:
# 与向量数据库集成
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
# 创建向量数据库
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["Hello world", "Goodbye world"], embeddings)
# 创建检索链
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
llm = OpenAI()
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever()
)
# 使用检索链回答问题
result = qa_chain.run("What did you say?")
print(result)
# 与搜索引擎集成
from langchain.utilities import SerpAPIWrapper
from langchain.agents import initialize_agent, Tool
# 创建搜索引擎工具
search = SerpAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="Useful for finding information on the internet"
)
]
# 创建代理
agent = initialize_agent(
tools,
llm,
agent="zero-shot-react-description",
verbose=True
)
# 使用代理回答问题
result = agent.run("What is the capital of France?")
print(result)
10.2 自定义组件开发
LangChain的设计非常灵活,允许开发者自定义各种组件。例如,开发自定义LLM、提示模板、输出解析器等:
# 开发自定义LLM
class CustomLLM(LLM):
"""Custom LLM wrapper."""
api_endpoint: str
api_key: str
def __init__(self, api_endpoint: str, api_key: str, **kwargs: Any):
"""Initialize the custom LLM."""
super().__init__(**kwargs)
self.api_endpoint = api_endpoint
self.api_key = api_key
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "custom"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call the custom LLM API."""
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}"
}
payload = {
"prompt": prompt,
"max_tokens": 256,
"temperature": 0.7,
"stop": stop
}
response = requests.post(self.api_endpoint, headers=headers, json=payload)
if response.status_code != 200:
raise ValueError(f"Error calling custom LLM API: {response.text}")
return response.json()["text"]
# 开发自定义提示模板
class CustomPromptTemplate(BasePromptTemplate):
"""Custom prompt template."""
template: str
input_variables: List[str]
def format(self, **kwargs: Any) -> str:
"""Format the prompt with the given parameters."""
# 自定义格式化逻辑
return self.template.format(**kwargs)
def format_prompt(self, **kwargs: Any) -> PromptValue:
"""Format the prompt with the given parameters and return a PromptValue."""
text = self.format(**kwargs)
return StringPromptValue(text=text)
@property
def _get_prompt_dict(self) -> Dict:
"""Return a dictionary of the prompt."""
return {
"input_variables": self.input_variables,
"template": self.template
}
# 开发自定义输出解析器
class CustomOutputParser(BaseOutputParser):
"""Custom output parser."""
def parse(self, text: str) -> Any:
"""Parse the output text and return a structured result."""
# 自定义解析逻辑
return text.split("\n")
def get_format_instructions(self) -> str:
"""Get format instructions for the output."""
return "Your answer should be a list of lines."
10.3 插件系统
LangChain的插件系统允许开发者扩展其功能,而不需要修改核心代码。插件可以是自定义工具、回调处理器、内存实现等:
# 开发自定义工具插件
class CustomTool(BaseTool):
"""Custom tool for performing a specific task."""
name = "custom_tool"
description = "Useful for performing a specific task"
def _run(self, input: str) -> str:
"""Run the tool."""
# 自定义工具逻辑
return f"Processed: {input}"
async def _arun(self, input: str) -> str:
"""Run the tool asynchronously."""
# 自定义异步工具逻辑
return f"Processed: {input}"
# 开发自定义回调处理器插件
class CustomCallbackHandler(BaseCallbackHandler):
"""Custom callback handler for tracking LLM usage."""
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Log the start of LLM execution."""
print(f"LLM started with prompts: {prompts}")
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Log each new token generated by the LLM."""
print(f"New token: {token}")
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Log the end of LLM execution."""
print(f"LLM finished with response: {response}")
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Log any errors that occur during LLM execution."""
print(f"LLM error: {error}")
# 使用插件
llm = OpenAI(callbacks=[CustomCallbackHandler()])
tools = [CustomTool()]
agent = initialize_agent(
tools,
llm,
agent="zero-shot-react-description",
verbose=True
)
result = agent.run("Perform a custom task")
print(result)
十一、错误处理与调试
11.1 常见错误类型
在使用LangChain的过程中,可能会遇到各种错误。以下是一些常见的错误类型:
- API连接错误:与外部API(如OpenAI、Hugging Face等)的连接失败。
- 验证错误:输入参数不符合要求,如缺少必要的API密钥。
- 解析错误:输出解析器无法正确解析模型的输出。
- 记忆错误:记忆(Memory)组件无法正确保存或加载上下文。
- 代理错误:代理系统无法正确选择或使用工具。
11.2 错误处理机制
LangChain提供了完善的错误处理机制,主要通过以下方式实现:
- 异常类:LangChain定义了一系列异常类,用于表示不同类型的错误。
- 错误捕获:在关键位置捕获可能的异常,并提供有意义的错误信息。
- 重试机制:对于临时性的错误(如网络波动),提供重试机制。
以下是LangChain中异常类的定义:
class LangChainException(Exception):
"""Base exception class for LangChain errors."""
def __init__(self, message: str, *args: Any, **kwargs: Any):
"""Initialize the exception."""
super().__init__(message, *args, **kwargs)
self.message = message
class APIConnectionError(LangChainException):
"""Exception for errors related to API connections."""
def __init__(self, message: str, status_code: Optional[int] = None, *args: Any, **kwargs: Any):
"""Initialize the API connection error."""
super().__init__(message, *args, **kwargs)
self.status_code = status_code
class ValidationError(LangChainException):
"""Exception for validation errors."""
def __init__(self, message: str, errors: Optional[List[Any]] = None, *args: Any, **kwargs: Any):
"""Initialize the validation error."""
super().__init__(message, *args, **kwargs)
self.errors = errors
class ParserError(LangChainException):
"""Exception for errors related to output parsing."""
def __init__(self, message: str, output: Optional[str] = None, *args: Any, **kwargs: Any):
"""Initialize the parser error."""
super().__init__(message, *args, **kwargs)
self.output = output
class MemoryError(LangChainException):
"""Exception for errors related to memory."""
def __init__(self, message: str, *args: Any, **kwargs: Any):
"""Initialize the memory error."""
super().__init__(message, *args, **kwargs)
class AgentError(LangChainException):
"""Exception for errors related to agents."""
def __init__(self, message: str, action: Optional[AgentAction] = None, *args: Any, **kwargs: Any):
"""Initialize the agent error."""
super().__init__(message, *args, **kwargs)
self.action = action
11.3 调试工具与技巧
在开发和调试使用LangChain的应用程序时,可以使用以下工具和技巧:
- 日志记录:启用详细的日志记录,查看API调用、中间结果等信息。
- 回调处理器:使用自定义回调处理器跟踪链的执行过程。
- 调试模式:在开发环境中启用调试模式,获取更详细的错误信息。
- 单元测试:编写单元测试,确保各个组件正常工作。
以下是一个使用自定义回调处理器进行调试的示例:
class DebugCallbackHandler(BaseCallbackHandler):
"""Callback handler for debugging LangChain applications."""
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Log the start of LLM execution."""
print(f"\n\n=== LLM Start ===")
print(f"Model: {serialized.get('name')}")
print(f"Prompts: {prompts}")
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Log each new token generated by the LLM."""
print(f"Token: {token}")
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Log the end of LLM execution."""
print(f"=== LLM End ===")
print(f"Response: {response}")
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Log the start of chain execution."""
print(f"\n\n=== Chain Start ===")
print(f"Chain: {serialized.get('name')}")
print(f"Inputs: {inputs}")
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Log the end of chain execution."""
print(f"=== Chain End ===")
print(f"Outputs: {outputs}")
def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> None:
"""Log the start of tool execution."""
print(f"\n\n=== Tool Start ===")
print(f"Tool: {serialized.get('name')}")
print(f"Input: {input_str}")
def on_tool_end(self, output: str, **kwargs: Any) -> None:
"""Log the end of tool execution."""
print(f"=== Tool End ===")
print(f"Output: {output}")
# 使用调试回调处理器
llm = OpenAI(callbacks=[DebugCallbackHandler()])
tools = [
Tool(
name="Search",
func=SerpAPIWrapper().run,
description="Useful for finding information on the internet"
)
]
agent = initialize_agent(
tools,
llm,
agent="zero-shot-react-description",
verbose=True
)
result = agent.run("What is the capital of France?")
print(result)
十二、性能优化与最佳实践
12.1 性能优化策略
在使用LangChain开发应用程序时,可以采用以下性能优化策略:
- 批量处理:尽量批量处理请求,减少API调用次数。
- 缓存机制:使用缓存存储经常使用的结果,避免重复计算。
- 异步处理:对于需要处理大量并发请求的场景,使用异步API。
- 模型选择:根据具体任务选择合适的模型,避免使用过大的模型。
- 流式响应:对于长文本生成,使用流式响应提高用户体验。
12.2 缓存机制实现
LangChain提供了缓存机制,可以缓存LLM的响应,避免重复计算:
# 启用简单的内存缓存
from langchain.cache import InMemoryCache
import langchain
langchain.llm_cache = InMemoryCache()
# 使用缓存的LLM
llm = OpenAI()
# 第一次调用会执行实际的API请求
response1 = llm("What is the capital of France?")
# 第二次调用会从缓存中获取结果,不会执行实际的API请求
response2 = llm("What is the capital of France?")
# 使用Redis缓存
from langchain.cache import RedisCache
import redis
# 连接到Redis服务器
r = redis.Redis(host="localhost", port=6379, db=0)
# 配置Redis缓存
langchain.llm_cache = RedisCache(redis_client=r)
# 使用Redis缓存的LLM
llm = OpenAI()
# 第一次调用会执行实际的API请求
response1 = llm("What is the capital of France?")
# 第二次调用会从Redis缓存中获取结果
response2 = llm("What is the capital of France?")
12.3 最佳实践
在使用LangChain开发应用程序时,建议遵循以下最佳实践:
- 模块化设计:将应用程序分解为多个小的、可复用的组件。
- 配置管理:使用配置文件或环境变量管理API密钥、模型参数等配置信息。
- 错误处理:实现完善的错误处理机制,处理各种可能的异常情况。
- 测试覆盖:编写单元测试和集成测试,确保代码质量。
- 文档编写:为自定义组件和应用程序编写清晰的文档,便于维护和协作。
以下是一个遵循最佳实践的LangChain应用程序示例:
# config.py - 配置管理
import os
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
SERPAPI_API_KEY = os.environ.get("SERPAPI_API_KEY")
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", 6379))
# utils.py - 工具函数
from langchain.cache import RedisCache
import redis

from config import REDIS_HOST, REDIS_PORT
def get_llm_cache():
"""获取LLM缓存实例"""
r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, db=0)
return RedisCache(redis_client=r)
# agents.py - 代理定义
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.utilities import SerpAPIWrapper
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

from config import OPENAI_API_KEY, SERPAPI_API_KEY
from utils import get_llm_cache
def create_search_agent():
"""创建搜索代理"""
# 配置LLM,启用缓存
llm = OpenAI(
openai_api_key=OPENAI_API_KEY,
cache=get_llm_cache()
)
# 配置工具
search = SerpAPIWrapper(serpapi_api_key=SERPAPI_API_KEY)
tools = [
Tool(
name="Search",
func=search.run,
description="Useful for finding information on the internet"
)
]
# 初始化代理
agent = initialize_agent(
tools,
llm,
agent="zero-shot-react-description",
verbose=True
)
return agent
def create_qa_agent(documents):
"""创建问答代理"""
# 配置LLM,启用缓存
llm = OpenAI(
openai_api_key=OPENAI_API_KEY,
cache=get_llm_cache()
)
# 创建向量数据库
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
vectorstore = FAISS.from_texts([doc.page_content for doc in documents], embeddings)
# 创建检索链
retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever
)
# 配置工具
tools = [
Tool(
name="QA",
func=qa_chain.run,
description="Useful for answering questions based on documents"
)
]
# 初始化代理
agent = initialize_agent(
tools,
llm,
agent="zero-shot-react-description",
verbose=True
)
return agent
# app.py - 应用程序入口
from agents import create_search_agent, create_qa_agent
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
def main():
try:
# 加载文档
loader = TextLoader("documents/knowledge.txt")
documents = loader.load()
# 分割文档
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# 创建代理
search_agent = create_search_agent()
qa_agent = create_qa_agent(texts)
# 使用代理回答问题
question = "What is the capital of France?"
print(f"Question: {question}")
print("\nSearch Agent Answer:")
search_answer = search_agent.run(question)
print(search_answer)
print("\nQA Agent Answer:")
qa_answer = qa_agent.run(question)
print(qa_answer)
except Exception as e:
print(f"Error: {e}")
if __name__ == "__main__":
main()
十三、安全与隐私考虑
13.1 数据安全
在使用LangChain开发应用程序时,需要特别注意数据安全:
- API密钥管理:妥善管理API密钥,避免将其硬编码在代码中或上传到公共仓库(示例见本列表之后)。
- 数据传输安全:确保与外部API的通信使用加密连接(如HTTPS)。
- 数据存储安全:对于敏感数据,确保在存储时进行加密。
- 访问控制:限制对API密钥和敏感数据的访问权限,仅授权必要的人员访问。
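以API密钥管理为例,密钥应通过环境变量等方式注入,而不是硬编码在源码中(下面是一个简单示意):
import os
from langchain.llms import OpenAI

# 从环境变量读取密钥;未设置时显式报错,避免把密钥写进代码仓库
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("请先设置环境变量OPENAI_API_KEY")
llm = OpenAI(openai_api_key=api_key)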
13.2 隐私保护
在处理用户数据时,需要遵循隐私保护原则:
- 最小数据收集:仅收集必要的用户数据,避免收集过多的个人信息。
- 数据匿名化:在可能的情况下,对用户数据进行匿名化处理。
- 用户同意:在收集和处理用户数据之前,获得用户的明确同意。
- 合规性:确保应用程序符合相关的隐私法规(如GDPR、CCPA等)。
13.3 安全最佳实践
以下是一些安全最佳实践,可以帮助保护LangChain应用程序:
- 环境变量:使用环境变量存储API密钥等敏感配置,避免将其写入源码。