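# Introduction
When building question-answering applications on top of large language models (LLMs), a significant challenge is making sure that data access respects both the caller's identity and semantic restrictions on content. PebbloRetrievalQA addresses this with identity and semantic enforcement, so that document retrieval stays both secure and accurate. This article walks through how to use PebbloRetrievalQA to implement identity-aware retrieval-augmented generation (RAG).
# Main content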
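## 1. Loading documents
We load the documents into a Qdrant vector database together with authorization and semantic metadata. This metadata identifies which user groups a document belongs to and which semantic topics and entities it contains.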
```python
from langchain_community.vectorstores.qdrant import Qdrant
from langchain_core.documents import Document
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_openai.llms import OpenAI
llm = OpenAI()
embeddings = OpenAIEmbeddings()
collection_name = "pebblo-identity-and-semantic-rag"
page_content = """
**ACME Corp Financial Report**
**Overview:**
ACME Corp, a leading player in the merger and acquisition industry, presents its financial report for the fiscal year ending December 31, 2020.
Despite a challenging economic landscape, ACME Corp demonstrated robust performance and strategic growth.
**Financial Highlights:**
Revenue soared to $50 million, marking a 15% increase from the previous year, driven by successful deal closures and expansion into new markets.
Net profit reached $12 million, showcasing a healthy margin of 24%.
**Key Metrics:**
Total assets surged to $80 million, reflecting a 20% growth, highlighting ACME Corp's strong financial position and asset base.
Additionally, the company maintained a conservative debt-to-equity ratio of 0.5, ensuring sustainable financial stability.
**Future Outlook:**
ACME Corp remains optimistic about the future, with plans to capitalize on emerging opportunities in the global M&A landscape.
The company is committed to delivering value to shareholders while maintaining ethical business practices.
**Bank Account Details:**
For inquiries or transactions, please refer to ACME Corp's US bank account:
Account Number: 123456789012
Bank Name: Fictitious Bank of America
"""
documents = [
    Document(
        **{
            "page_content": page_content,
            "metadata": {
                "pebblo_semantic_topics": ["financial-report"],
                "pebblo_semantic_entities": ["us-bank-account-number"],
                "authorized_identities": ["finance-team", "exec-leadership"],
                "page": 0,
                "source": "https://drive.google.com/file/d/xxxxxxxxxxxxx/view",
                "title": "ACME Corp Financial Report.pdf",
            },
        }
    )
]
vectordb = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",  # in-memory Qdrant instance; data is lost when the process exits
    collection_name=collection_name,
)
```
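## 2. Testing identity and semantic enforcement
Using identity and semantic context, we test how question answering behaves under different identities and semantic labels, and check that an answer is only produced when the caller is permitted to see the underlying information.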
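The snippets in this section call an `ask` helper. A minimal sketch of such a helper might look like the following. The shape of the input dict (`query`, `auth_context`, `semantic_context` with `deny` lists) is an assumption about what the chain expects, and `EchoChain`, `build_chain_input`, and `make_ask` are hypothetical names used so the sketch runs offline. In a real application you would pass a chain built with PebbloRetrievalQA over `vectordb.as_retriever()` instead of `EchoChain`.

```python
from typing import Any, Callable, Dict, List, Optional


def build_chain_input(
    question: str,
    auth_context: Optional[Dict[str, Any]] = None,
    topics_to_deny: Optional[List[str]] = None,
    entities_to_deny: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble the dict handed to the chain (assumed input shape)."""
    semantic_context: Dict[str, Any] = {}
    if topics_to_deny:
        semantic_context["pebblo_semantic_topics"] = {"deny": topics_to_deny}
    if entities_to_deny:
        semantic_context["pebblo_semantic_entities"] = {"deny": entities_to_deny}
    return {
        "query": question,
        "auth_context": auth_context,
        # None means "no semantic restrictions were requested"
        "semantic_context": semantic_context or None,
    }


def make_ask(chain: Any) -> Callable[..., Dict[str, Any]]:
    """Bind ask() to any object that exposes invoke(dict)."""

    def ask(question, auth_context=None, topics_to_deny=None, entities_to_deny=None):
        chain_input = build_chain_input(
            question, auth_context, topics_to_deny, entities_to_deny
        )
        return chain.invoke(chain_input)

    return ask


class EchoChain:
    """Hypothetical stand-in for the real chain; it only echoes the query."""

    def invoke(self, chain_input: Dict[str, Any]) -> Dict[str, Any]:
        return {"result": f"received query: {chain_input['query']}"}


ask = make_ask(EchoChain())
```

Keeping the input assembly in its own function makes it easy to inspect exactly which identity and semantic context is sent with each question.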
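### Identity enforcement
When the user holds an identity that is authorized for the document, the question is answered normally.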
```python
# `ask` is a helper that wraps the PebbloRetrievalQA chain; define it before running this snippet.
auth = {
    "user_id": "finance-user@acme.org",
    "user_auth": [
        "finance-team",
    ],
}
question = "Share the financial performance of ACME Corp for the year 2020"
resp = ask(question, auth)
print(f"Question: {question}\n\nAnswer: {resp['result']}")
# Expected: an answer summarizing ACME Corp's financial highlights for 2020
```
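To make the identity rule concrete, here is a minimal pure-Python sketch of the check it implies: a document is visible only if the user's `user_auth` list shares at least one entry with the document's `authorized_identities`. This illustrates the idea only and is not Pebblo's actual implementation. `is_authorized` is a hypothetical helper and `eng-support` is a made-up identity.

```python
from typing import Any, Dict, List


def is_authorized(metadata: Dict[str, Any], user_auth: List[str]) -> bool:
    """True when the user holds at least one identity listed on the document."""
    allowed = set(metadata.get("authorized_identities", []))
    return bool(allowed & set(user_auth))


doc_metadata = {"authorized_identities": ["finance-team", "exec-leadership"]}

print(is_authorized(doc_metadata, ["finance-team"]))  # True
print(is_authorized(doc_metadata, ["eng-support"]))   # False
```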
### Semantic enforcement
When a query touches a denied semantic topic or entity, the answer is restricted.
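```python
# `ask` is a helper that wraps the PebbloRetrievalQA chain; define it before running this snippet.
topic_to_deny = ["financial-report"]
entities_to_deny = []
question = "Share the financial performance of ACME Corp for the year 2020"
resp = ask(question, topics_to_deny=topic_to_deny, entities_to_deny=entities_to_deny)
print(f"Question: {question}\n\nAnswer: {resp['result']}")
# Expected: access is restricted because the semantic topic is denied
```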
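The deny rule can be sketched in the same spirit: a document is excluded as soon as any of its semantic topics or entities appears in the corresponding deny list. Again, this is an illustration under stated assumptions rather than Pebblo's real filtering code, and `is_semantically_allowed` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def is_semantically_allowed(
    metadata: Dict[str, Any],
    topics_to_deny: Optional[List[str]] = None,
    entities_to_deny: Optional[List[str]] = None,
) -> bool:
    """False when the document carries any denied topic or entity."""
    topics = set(metadata.get("pebblo_semantic_topics", []))
    entities = set(metadata.get("pebblo_semantic_entities", []))
    hits_topic = bool(topics & set(topics_to_deny or []))
    hits_entity = bool(entities & set(entities_to_deny or []))
    return not (hits_topic or hits_entity)


report_metadata = {
    "pebblo_semantic_topics": ["financial-report"],
    "pebblo_semantic_entities": ["us-bank-account-number"],
}

print(is_semantically_allowed(report_metadata, topics_to_deny=["financial-report"]))  # False
print(is_semantically_allowed(report_metadata))                                       # True
```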
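# Common problems and solutions
- Problem: data access errors
  - Solution: make sure the correct identity and semantic context is passed both when loading the data and when asking questions.
- Problem: semantic labels have no effect
  - Solution: check that the semantic context is set correctly, and use the latest version of Pebblo and its components.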
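One way to catch both problems early is to validate each document's metadata before loading it. The sketch below assumes the three metadata keys used in this article and checks that each is present and is a list of strings. `REQUIRED_LIST_KEYS` and `find_metadata_problems` are hypothetical names, not part of Pebblo or LangChain.

```python
from typing import Any, Dict, List

# Metadata keys used in this article's example document
REQUIRED_LIST_KEYS = (
    "authorized_identities",
    "pebblo_semantic_topics",
    "pebblo_semantic_entities",
)


def find_metadata_problems(metadata: Dict[str, Any]) -> List[str]:
    """Return a human-readable list of problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_LIST_KEYS:
        value = metadata.get(key)
        if value is None:
            problems.append(f"missing key: {key}")
        elif not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    return problems


print(find_metadata_problems({"authorized_identities": "finance-team"}))
```

Here a bare string is reported as a problem because the check expects a list, and the two semantic keys are reported as missing.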
# Summary
PebbloRetrievalQA provides a secure, controllable way to retrieve documents. By combining identity enforcement with semantic controls, it helps keep data access accurate and compliant.