Introduction

In today's fast-moving machine learning landscape, agile decision-making and experiment management have become essential. Comet is a machine learning platform that integrates experiments, evaluation metrics, and LLM (large language model) sessions with your existing infrastructure and tooling. In this article, I demonstrate how to use Comet to track LangChain experiments, and note where an API proxy service can improve access stability.
Main Content
Installing Comet and Dependencies

First, install the libraries required for the LangChain and Comet integration:

```
%pip install --upgrade --quiet comet_ml langchain langchain_openai google-search-results spacy textstat pandas
```

Next, in a notebook cell, download the English language model that spaCy uses for text analysis:

```
import sys

!{sys.executable} -m spacy download en_core_web_sm
```
Initializing Comet and Setting Credentials

Obtain your Comet API key, then initialize the Comet environment:

```python
import comet_ml

comet_ml.init(project_name="comet-example-langchain")
```
We also need to set the OpenAI and SerpAPI API keys:

```python
import os

os.environ["OPENAI_API_KEY"] = "your_openai_api_key"  # an API proxy service can improve access stability
os.environ["SERPAPI_API_KEY"] = "your_serpapi_api_key"  # an API proxy service can improve access stability
```
Scenario 1: Using Just an LLM

In this section, we show how to use the CometCallbackHandler to track a simple LLM task.

```python
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import OpenAI

comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=True,
    stream_logs=True,
    tags=["llm"],
    visualizations=["dep"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks, verbose=True)

llm_result = llm.generate(["Tell me a joke", "Tell me a poem", "Tell me a fact"] * 3)
print("LLM result", llm_result)
comet_callback.flush_tracker(llm, finish=True)
```
Scenario 2: Using an LLM in a Chain

```python
from langchain.chains import LLMChain
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

comet_callback = CometCallbackHandler(
    complexity_metrics=True,
    project_name="comet-example-langchain",
    stream_logs=True,
    tags=["synopsis-chain"],
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9, callbacks=callbacks)

template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)

test_prompts = [{"title": "Documentary about Bigfoot in Paris"}]
print(synopsis_chain.apply(test_prompts))
comet_callback.flush_tracker(synopsis_chain, finish=True)
```
Scenario 3: Using an Agent with Tools

This section shows how an agent can handle composite tasks, such as retrieving information and performing calculations.
Scenario 4: Using Custom Evaluation Metrics

As an example, we evaluate the quality of generated summaries with the ROUGE metric.

```
%pip install --upgrade --quiet rouge-score
```

```python
from langchain.chains import LLMChain
from langchain_community.callbacks import CometCallbackHandler
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from rouge_score import rouge_scorer


class Rouge:
    def __init__(self, reference):
        self.reference = reference
        self.scorer = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True)

    def compute_metric(self, generation, prompt_idx, gen_idx):
        prediction = generation.text
        results = self.scorer.score(target=self.reference, prediction=prediction)
        return {
            "rougeLsum_score": results["rougeLsum"].fmeasure,
            "reference": self.reference,
        }


reference = """
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building.
It was the first structure to reach a height of 300 metres.
It is now taller than the Chrysler Building in New York City by 5.2 metres (17 ft)
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France.
"""
rouge_score = Rouge(reference=reference)

template = """Given the following article, it is your job to write a summary.
Article:
{article}
Summary: This is the summary for the above article:"""
prompt_template = PromptTemplate(input_variables=["article"], template=template)

comet_callback = CometCallbackHandler(
    project_name="comet-example-langchain",
    complexity_metrics=False,
    stream_logs=True,
    tags=["custom_metrics"],
    custom_metrics=rouge_score.compute_metric,
)
callbacks = [StdOutCallbackHandler(), comet_callback]
llm = OpenAI(temperature=0.9)

synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)
test_prompts = [
    {
        "article": """
The tower is 324 metres (1,063 ft) tall, about the same height as
an 81-storey building, and the tallest structure in Paris. Its base is square,
measuring 125 metres (410 ft) on each side.
During its construction, the Eiffel Tower surpassed the
Washington Monument to become the tallest man-made structure in the world,
a title it held for 41 years until the Chrysler Building
in New York City was finished in 1930.
It was the first structure to reach a height of 300 metres.
Due to the addition of a broadcasting aerial at the top of the tower in 1957,
it is now taller than the Chrysler Building by 5.2 metres (17 ft).
Excluding transmitters, the Eiffel Tower is the second tallest
free-standing structure in France after the Millau Viaduct.
"""
    }
]
print(synopsis_chain.apply(test_prompts, callbacks=callbacks))
comet_callback.flush_tracker(synopsis_chain, finish=True)
```
Common Issues and Solutions

- API access issues: regional network restrictions may require an API proxy service to guarantee stable access.
- Model performance issues: adjust model parameters and hyperparameters as appropriate while debugging.
- Integration challenges: make sure dependency versions are compatible and the environment is configured correctly.
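As a sketch of the first point, one common approach is to route OpenAI traffic through a gateway by overriding the base URL via an environment variable. The endpoint below is a placeholder, not a real service:

```python
import os

# Placeholder gateway URL -- substitute the proxy endpoint you actually use.
os.environ["OPENAI_API_BASE"] = "https://your-proxy.example.com/v1"

# langchain_openai reads OPENAI_API_BASE from the environment, so any
# OpenAI(...) instance created afterwards sends its requests via the proxy.
```

Setting the variable once at the top of the notebook keeps the rest of the examples unchanged.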
Summary and Further Learning Resources

By integrating Comet with LangChain, you can manage and optimize machine learning experiments effectively. In this article we covered how to use Comet to track LLM and chain executions, and how to define and apply custom evaluation metrics.

References

- The official Comet API reference
- The official LangChain documentation
If you found this article helpful, please give it a like and follow my blog. Your support keeps me writing!