Introduction
Amazon SageMaker is a fully managed service for building, training, and deploying machine learning models quickly. SageMaker Experiments is a capability within SageMaker that helps you organize, track, compare, and evaluate machine learning experiments and model versions. In this article, we explore how to use a LangChain callback to log prompts and other large language model (LLM) hyperparameters to SageMaker Experiments.
Main Content
Installation and Setup
First, install the required Python packages:
%pip install --upgrade --quiet sagemaker
%pip install --upgrade --quiet langchain-openai
%pip install --upgrade --quiet google-search-results
Next, set the required API keys:

import os

# Add your API keys
os.environ["OPENAI_API_KEY"] = "<ADD-KEY-HERE>"  # OpenAI API key
os.environ["SERPAPI_API_KEY"] = "<ADD-KEY-HERE>"  # Google SERP API key
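Hard-coding keys like this is fine for a notebook, but it is easy to forget to replace a placeholder and get a confusing authentication error much later. A minimal sketch of a sanity check (the helper function below is illustrative, not part of SageMaker or LangChain):

```python
import os


def check_api_keys(names):
    """Return the environment variables that are unset or still hold placeholders."""
    missing = []
    for name in names:
        value = os.environ.get(name, "")
        if not value or value.startswith("<ADD-KEY"):
            missing.append(name)
    return missing


# With the placeholder still in place, the key is flagged as missing.
os.environ["OPENAI_API_KEY"] = "<ADD-KEY-HERE>"
missing = check_api_keys(["OPENAI_API_KEY", "SERPAPI_API_KEY"])
```

Calling check_api_keys before any chain runs lets you fail fast with a clear list of what still needs to be configured.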
Tracking Experiment Scenarios
We will walk through three scenarios that demonstrate how to track LLM runs with SageMaker Experiments.
Scenario 1: A Single LLM
In this scenario, a single LLM is used to generate output.
from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from sagemaker.experiments.run import Run
from sagemaker.session import Session

# LLM hyperparameters and experiment setup
HPARAMS = {"temperature": 0.1, "model_name": "gpt-3.5-turbo-instruct"}
EXPERIMENT_NAME = "langchain-sagemaker-tracker"
session = Session()

with Run(experiment_name=EXPERIMENT_NAME, run_name="run-scenario-1", sagemaker_session=session) as run:
    sagemaker_callback = SageMakerCallbackHandler(run)
    llm = OpenAI(callbacks=[sagemaker_callback], **HPARAMS)
    prompt = PromptTemplate.from_template(template="tell me a joke about {topic}")
    chain = LLMChain(llm=llm, prompt=prompt, callbacks=[sagemaker_callback])
    chain.run(topic="fish")
    sagemaker_callback.flush_tracker()
Scenario 2: A Sequential Chain
In this scenario, we build a sequential chain that connects two LLMChain steps.
from langchain.chains import SimpleSequentialChain

with Run(experiment_name=EXPERIMENT_NAME, run_name="run-scenario-2", sagemaker_session=session) as run:
    sagemaker_callback = SageMakerCallbackHandler(run)
    llm = OpenAI(callbacks=[sagemaker_callback], **HPARAMS)
    prompt_template1 = PromptTemplate.from_template(template="You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\nTitle: {title}\nPlaywright: This is a synopsis for the above play:")
    prompt_template2 = PromptTemplate.from_template(template="You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.\nPlay Synopsis: {synopsis}\nReview from a New York Times play critic of the above play:")
    chain1 = LLMChain(llm=llm, prompt=prompt_template1, callbacks=[sagemaker_callback])
    chain2 = LLMChain(llm=llm, prompt=prompt_template2, callbacks=[sagemaker_callback])
    overall_chain = SimpleSequentialChain(chains=[chain1, chain2], callbacks=[sagemaker_callback])
    overall_chain.run(input="documentary about good video games that push the boundary of game design")
    sagemaker_callback.flush_tracker()
Scenario 3: An Agent with Tools
In this scenario, we use multiple tools (search and math) together with an LLM.
from langchain.agents import initialize_agent, load_tools

with Run(experiment_name=EXPERIMENT_NAME, run_name="run-scenario-3", sagemaker_session=session) as run:
    sagemaker_callback = SageMakerCallbackHandler(run)
    llm = OpenAI(callbacks=[sagemaker_callback], **HPARAMS)
    tools = load_tools(["serpapi", "llm-math"], llm=llm, callbacks=[sagemaker_callback])
    agent = initialize_agent(tools, llm, agent="zero-shot-react-description", callbacks=[sagemaker_callback])
    agent.run(input="Who is the oldest person alive? And what is their current age raised to the power of 1.51?")
    sagemaker_callback.flush_tracker()
Loading the Logged Data
After the experiments finish, the logged data can be loaded into a Pandas DataFrame:
from sagemaker.analytics import ExperimentAnalytics
logs = ExperimentAnalytics(experiment_name=EXPERIMENT_NAME)
df = logs.dataframe(force_refresh=True)
print(df.shape)
df.head()
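The exact columns in the returned DataFrame depend on what the callback logged, but once loaded it is ordinary pandas. As a sketch, filtering the rows for one run and reading back its hyperparameters might look like this (the toy DataFrame and its column names below are illustrative stand-ins for the real ExperimentAnalytics output, not a guaranteed schema):

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by logs.dataframe(force_refresh=True).
df = pd.DataFrame(
    {
        "DisplayName": ["run-scenario-1", "run-scenario-2", "run-scenario-3"],
        "temperature": [0.1, 0.1, 0.1],
        "model_name": ["gpt-3.5-turbo-instruct"] * 3,
    }
)

# Select the rows logged by a single run and read back its hyperparameters.
run1 = df[df["DisplayName"] == "run-scenario-1"]
params = run1[["temperature", "model_name"]].iloc[0].to_dict()
```

The same slicing pattern works on the real DataFrame once you have inspected which columns the callback actually produced with df.columns.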
Common Issues and Solutions
Network restrictions and API access
- In some regions, network restrictions can make the OpenAI and SerpAPI endpoints unreliable, so plan for retries or an alternative access path.
Managing multiple LLM components
- In complex chain structures, make sure the state and callbacks of every model are managed correctly; in particular, pass the same handler instance to each component so all events land in a single run.
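The pattern used throughout the scenarios above, one handler instance passed to every LLM, chain, and tool, is what keeps all events in a single SageMaker run. The mechanics can be sketched without LangChain or SageMaker at all (the class and method names below are illustrative, not the real LangChain callback interfaces):

```python
class RunRecorder:
    """Toy stand-in for a callback handler: collects events from every component."""

    def __init__(self):
        self.events = []

    def on_event(self, component, payload):
        self.events.append((component, payload))


class FakeChain:
    """Toy chain step that reports its input to every registered callback."""

    def __init__(self, name, callbacks):
        self.name = name
        self.callbacks = callbacks

    def run(self, text):
        for cb in self.callbacks:
            cb.on_event(self.name, text)
        return text.upper()  # stand-in for a model call


# One shared recorder, mirroring the single SageMakerCallbackHandler above.
recorder = RunRecorder()
chain1 = FakeChain("synopsis", callbacks=[recorder])
chain2 = FakeChain("review", callbacks=[recorder])
out = chain2.run(chain1.run("hello"))
```

Because both steps hold a reference to the same recorder, the event list preserves the full execution order; two separate recorders would instead split the trace across two logs.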
Summary and Further Resources
This article showed how to track large language model experiments in Amazon SageMaker Experiments using a LangChain callback. For further reading, see the resources below:
References
- Amazon SageMaker Experiments: docs.aws.amazon.com/sagemaker/l…
- LangChain Documentation: python.langchain.com/docs/get_st…
If this article helped you, feel free to like it and follow my blog. Your support keeps me writing!