Reading Docx and Doc Documents on Windows and Linux

Windows

Docx

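Install the python-docx library:

pip install python-docx

Then iterate over the document's paragraphs and print each one:

import docx

word = "a.docx"
document = docx.Document(word)
for paragraph in document.paragraphs:
    # Print inside the loop; otherwise only the last paragraph is shown
    print(paragraph.text)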
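
If python-docx cannot be installed, the paragraph text of a simple document can usually be recovered with the standard library alone, because a .docx file is a ZIP archive whose main body is stored in word/document.xml. The following is a minimal sketch under that assumption. The function name read_docx_paragraphs is hypothetical, it does not read headers, footers, or footnotes, and paragraphs inside table cells come back as ordinary paragraphs.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(path):
    """Return the text of each paragraph (w:p) found in the document body.

    Hypothetical helper for illustration; it is not part of python-docx.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t elements
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

Calling read_docx_paragraphs("a.docx") returns a list of strings, one per paragraph, which can be printed in a loop just like the python-docx version.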

Doc

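Install the win32com library (provided by pypiwin32). It works only on Windows and drives an installed copy of Microsoft Word:

python -m pip install pypiwin32

import os

from win32com import client
import pythoncom

# Word resolves relative paths against its own working directory, so pass an absolute path
path = os.path.abspath("a.doc")
pythoncom.CoInitialize()
word = client.Dispatch('Word.Application')
word.Visible = 0  # run in the background without showing a window
word.DisplayAlerts = 0  # suppress alert dialogs
doc = word.Documents.Open(path)
for para in doc.Paragraphs:
    print(para.Range.Text)
# 2 is wdFormatText: also save a plain-text copy
doc.SaveAs('D:PythonFiles/4paradigm/gdt_flask/file/test.txt', 2)
doc.Close()
word.Quit()
pythoncom.CoUninitialize()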
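
If an exception occurs between Dispatch and Quit, the script above leaves a hidden Word process running. A hedged sketch of the same steps wrapped in try/finally follows. The function name read_doc_via_word is hypothetical, the positional arguments assume Word's parameter order for Documents.Open (FileName, ConfirmConversions, ReadOnly) and Close (SaveChanges), and it can only run on Windows with Word and pypiwin32 installed.

```python
import os


def read_doc_via_word(path):
    """Return the text of a Word document by automating Microsoft Word.

    Hypothetical helper for illustration; Windows only.
    """
    # Imported lazily so this module can still be imported on other platforms
    import pythoncom
    from win32com import client

    pythoncom.CoInitialize()
    word = None
    doc = None
    try:
        word = client.Dispatch("Word.Application")
        word.Visible = 0
        word.DisplayAlerts = 0
        # Arguments: FileName, ConfirmConversions=False, ReadOnly=True
        doc = word.Documents.Open(os.path.abspath(path), False, True)
        return doc.Content.Text
    finally:
        # Clean up in reverse order even if opening or reading failed
        if doc is not None:
            doc.Close(0)  # 0 = wdDoNotSaveChanges
        if word is not None:
            word.Quit()
        pythoncom.CoUninitialize()
```

Opening the file read-only and closing without saving keeps the source document untouched, which seems the safer default when the goal is only to extract text.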

Linux

Docx

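Install the python-docx library; it is used the same way on Linux:

pip install python-docx

import docx

word = "a.docx"
document = docx.Document(word)
for paragraph in document.paragraphs:
    # Print inside the loop; otherwise only the last paragraph is shown
    print(paragraph.text)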

Doc

Install antiword. Download address: www.winfield.demon.nl/linux/antiw…

Extract the archive, enter the directory, then build and install:
tar -zxvf antiword-0.37.tar.gz

cd  antiword-0.37

make && make install

The installation automatically places the files under /root/, where only root can run the command. Copy them under /usr so antiword is easy to call:

cp /root/bin/*antiword /usr/local/bin/
mkdir /usr/share/antiword
cp -R /root/.antiword/* /usr/share/antiword/
chmod 777 /usr/local/bin/*antiword
chmod 755 /usr/share/antiword/*
"""
    Usage in code
"""
import subprocess

word = "a.doc"
output = subprocess.check_output(["antiword", word])
# Decode the bytes returned by antiword
output = output.decode('utf8')
print(output)
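
The call above fails with a fairly opaque error when antiword is missing from PATH or rejects the file. A minimal sketch of a wrapper that checks for the binary first is given below. The function name read_doc_with_antiword and its antiword parameter are assumptions for illustration, not part of antiword itself.

```python
import shutil
import subprocess


def read_doc_with_antiword(path, antiword="antiword"):
    """Return the text that antiword extracts from a .doc file.

    Hypothetical helper; `antiword` names the executable to look up on PATH.
    """
    if shutil.which(antiword) is None:
        raise FileNotFoundError(f"{antiword} not found on PATH")
    # check=True raises CalledProcessError when antiword exits with an error
    result = subprocess.run(
        [antiword, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    # errors="replace" avoids a crash if the output is not valid UTF-8
    return result.stdout.decode("utf8", errors="replace")
```

With antiword installed as described, print(read_doc_with_antiword("a.doc")) should print the same text as the shorter snippet.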

Fighter_ma: Weakness and ignorance are not barriers to survival; arrogance is.