
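# Crossing Boundaries: Running a LangChain Custom LLM on Modal

## Introduction

Large language models (LLMs) sit at the core of much current AI work, and developers need a simple, reliable way to deploy and call them. This article shows how to host a custom LLM on Modal, expose it through a web endpoint, and call that endpoint from LangChain.

## Main content

### Installing Modal and deploying the endpoint

1. **Install and set up**
   - Install Modal with `pip install modal`.
   - Run `modal token new` to generate a new access token.

2. **Define the Modal function and webhook**
   - The request and response share a fixed structure built around a `prompt` field: the request body carries the input text under `prompt`, and the endpoint returns the generated text under the same key.
   - Create a main function that handles the LLM call.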

   ```python
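   # Note: this follows the Modal API in use when the example was written.
   # Newer Modal releases rename `modal.Stub` to `modal.App` and `.call()` to
   # `.remote()`, so adjust these names to match your installed version.
   import modal
   from pydantic import BaseModel

   CACHE_PATH = "/root/model_cache"  # where the model is stored inside the image

   class Item(BaseModel):
       """Request body for the web endpoint."""
       prompt: str

   stub = modal.Stub(name="example-get-started-with-langchain")

   def download_model():
       """Runs at image build time so the GPT-2 weights are baked into the image."""
       from transformers import GPT2Tokenizer, GPT2LMHeadModel
       tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
       model = GPT2LMHeadModel.from_pretrained('gpt2')
       tokenizer.save_pretrained(CACHE_PATH)
       model.save_pretrained(CACHE_PATH)

   # Container image with the dependencies installed and the model pre-downloaded.
   image = modal.Image.debian_slim().pip_install(
       "tokenizers", "transformers", "torch", "accelerate"
   ).run_function(download_model)

   @stub.function(
       gpu="any",
       image=image,
       retries=3,
   )
   def run_gpt2(text: str):
       """Generate a continuation of `text` with the cached GPT-2 model."""
       from transformers import GPT2Tokenizer, GPT2LMHeadModel
       tokenizer = GPT2Tokenizer.from_pretrained(CACHE_PATH)
       model = GPT2LMHeadModel.from_pretrained(CACHE_PATH)
       encoded_input = tokenizer(text, return_tensors='pt').input_ids
       output = model.generate(encoded_input, max_length=50, do_sample=True)
       return tokenizer.decode(output[0], skip_special_tokens=True)

   @stub.function()
   @modal.web_endpoint(method="POST")
   def get_text(item: Item):
       # The LangChain wrapper reads the generated text from the "prompt" key.
       return {"prompt": run_gpt2.call(item.prompt)}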
   ```

3. **Deploy the web endpoint**
   - Run `modal deploy` to deploy the web endpoint to the Modal cloud. The deployed endpoint gets a persistent URL.
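
Once deployed, the endpoint can be checked directly before wiring it into LangChain. The following is a minimal sketch using only the Python standard library. It assumes the endpoint accepts and returns JSON with a `prompt` key, as `get_text` does; the helper names `build_request` and `query_endpoint` are hypothetical and not part of Modal or LangChain.

```python
import json
import urllib.request


def build_request(endpoint_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the Item model."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_endpoint(endpoint_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the text stored under the "prompt" key."""
    request = build_request(endpoint_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["prompt"]
```

Calling `query_endpoint` with your own deployed URL and a short prompt should return the generated text. If it raises instead, the problem lies with the deployment or the network rather than with LangChain.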

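### Using the deployed web endpoint with the LLM wrapper class

The `Modal` LLM wrapper class gives LangChain access to the deployed endpoint:

```python
from langchain.chains import LLMChain
from langchain_community.llms import Modal
from langchain_core.prompts import PromptTemplate

# Example template (an assumption; any template with a {question} variable works).
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

endpoint_url = "https://ecorp--custom-llm-endpoint.modal.run"  # replace with your deployed URL

llm = Modal(endpoint_url=endpoint_url)
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
llm_chain.run(question)
```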

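## Common problems and solutions

- **Network access problems**: Network restrictions in some regions can make the endpoint hard to reach reliably; an API proxy service may improve access stability.
- **Slow model loading**: Cache the model so it is not downloaded again on every call, for example by saving it into the image at build time as `download_model` does.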
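
For slow loading inside a running container, one option is to keep the loaded model in memory so that repeated calls in the same process skip the read from disk. A minimal sketch with `functools.lru_cache` follows. `cached_loader` is a hypothetical helper, and how long a container stays alive between calls depends on Modal, so treat the benefit as an assumption rather than a guarantee.

```python
from functools import lru_cache
from typing import Any, Callable


def cached_loader(load_fn: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a slow loader so each path is loaded at most once per process."""
    # lru_cache keys on the argument, so a repeated path returns the same object.
    return lru_cache(maxsize=None)(load_fn)
```

The wrapped loader has to be created once, outside the function that handles each call; otherwise every call starts with an empty cache and nothing is saved.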

## Summary

After working through this article, you should be able to deploy a custom LLM on Modal and interact with it from LangChain.
