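First thing this morning my wife sent me two PDFs whose pages had to be converted to images and uploaded to a website. Every PDF tool I had on hand asked me to log in first, and registering an account for such a small feature didn't seem worth it, so I decided to build something myself.
Preparation
Python is the tool I reach for first, since there are so many open-source packages available. After a quick search online I settled on the PyMuPDF package for a simple implementation.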
pip install PyMuPDF
One nice thing about PyMuPDF is that it has few dependencies, so installation went smoothly. After installing, run a quick test:
import fitz
print(fitz.__doc__)
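fitz is the import name of PyMuPDF. The story behind the name is covered elsewhere online, so I won't go into it here. The version I am currently using is shown below.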
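If you want to check the installed version from a script, the standard library can look it up by distribution name. This is only a minimal sketch: the helper name `installed_version` is made up for illustration, and it assumes the package was installed with pip under the name PyMuPDF, as in the command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if PyMuPDF is installed, otherwise a hint
print("PyMuPDF:", installed_version("PyMuPDF") or "not installed")
```

Returning None instead of raising keeps the check usable as a quick pre-flight test before importing fitz.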
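A quick try
Zhihu has a detailed introduction to this package ("Python处理PDF神器:PyMuPDF的安装与使用", zhihu.com). Since all I wanted was to turn a PDF into images, I went straight to a quick experiment: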
import os

import fitz  # PyMuPDF


def pdf2png(pdf_path, image_path):
    # Create the output folder once, before rendering any page
    os.makedirs(image_path, exist_ok=True)
    pdf_doc = fitz.open(pdf_path)
    # zoom_x / zoom_y scale the rendered page; rotate is in degrees
    rotate = 0
    zoom_x = 2
    zoom_y = 2
    mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
    for page_index in range(pdf_doc.page_count):
        page = pdf_doc[page_index]
        # alpha=False renders on an opaque background
        pix = page.get_pixmap(matrix=mat, alpha=False)
        pix.save(os.path.join(image_path, "image_%s.png" % page_index))
    pdf_doc.close()


pdf_path = r"C:\test.pdf"
image_path = r"C:\image"
pdf2png(pdf_path, image_path)
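A quick test showed that it is fast and the output looks good. zoom_x and zoom_y control the pixel dimensions of the rendered image. For my test PDF, a factor of 2 gave good results.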
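To make the effect of the zoom factors concrete: PDF page sizes are measured in points (1/72 inch), and PyMuPDF renders at 72 dpi when no scaling is applied, so a zoom of 2 corresponds to roughly 144 dpi. The sketch below does only this arithmetic. The helper names `zoom_for_dpi` and `approx_pixel_size` are hypothetical, and the pixel size is an approximation, since the library does its own rounding.

```python
PDF_POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def zoom_for_dpi(dpi):
    """Zoom factor that renders a page at approximately the given dpi."""
    return dpi / PDF_POINTS_PER_INCH


def approx_pixel_size(width_pt, height_pt, zoom_x, zoom_y=None):
    """Approximate image size in pixels for a page given in points."""
    if zoom_y is None:
        zoom_y = zoom_x
    return round(width_pt * zoom_x), round(height_pt * zoom_y)


print(zoom_for_dpi(144))               # 2.0
print(approx_pixel_size(595, 842, 2))  # (1190, 1684)
```

An A4 page is roughly 595 x 842 points, so the second call estimates the image size at the zoom factor of 2 used above. If the upload site needs larger or smaller images, pass the result of `zoom_for_dpi` as zoom_x and zoom_y.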
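Wrapping it in a small application
Next, let's put a simple interface around this code. I drew a simple UI with Qt, like the one below.
Convert the .ui file to Python with pyuic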
pyuic6.exe -o ui_pdf2png.py pdf2png.ui
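The convert button's slot calls the pdf2png method. The progress bar is little more than harmless decoration. I'm not good at styling, so I just use the built-in Fusion style, which is better than nothing. The complete code is as follows: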
import os
import sys

import fitz  # PyMuPDF
from PyQt6.QtCore import pyqtSlot
from PyQt6.QtWidgets import QDialog, QApplication, QFileDialog

from ui_pdf2png import Ui_pdf2img  # generated by pyuic6 from pdf2png.ui


class Pdf2Png(QDialog, Ui_pdf2img):
    def __init__(self):
        super().__init__()
        self.cwd = os.getcwd()
        self.setupUi(self)

    @pyqtSlot()
    def on_pushButton_pdf_scan_clicked(self):
        # Let the user pick the source PDF
        filename, _ = QFileDialog.getOpenFileName(self, "Select PDF file", self.cwd, "PDF files (*.pdf)")
        if filename != '':
            self.lineEdit_pdf.setText(filename)

    @pyqtSlot()
    def on_pushButton_image_scan_clicked(self):
        # Let the user pick the output folder
        directory = QFileDialog.getExistingDirectory(self, "Select folder", self.cwd)
        if directory != '':
            self.lineEdit_image.setText(directory)

    @pyqtSlot()
    def on_pushButton_translate_clicked(self):
        pdf_path = self.lineEdit_pdf.text().strip()
        image_path = self.lineEdit_image.text().strip()
        # Do nothing until both paths have been filled in
        if not pdf_path or not image_path:
            return
        self.pdf2png(pdf_path, image_path)
        self.progressBar.setValue(100)
        self.label_result.setText(f"{os.path.basename(pdf_path)} converted to images successfully")

    def pdf2png(self, pdf_path, image_path):
        # Create the output folder once, before rendering any page
        os.makedirs(image_path, exist_ok=True)
        pdf_doc = fitz.open(pdf_path)
        # splitext keeps dots inside the name, e.g. "report.v2.pdf" -> "report.v2"
        pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
        pdf_pages = pdf_doc.page_count
        rotate = 0
        zoom_x = 2
        zoom_y = 2
        mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
        for page_index in range(pdf_pages):
            page = pdf_doc[page_index]
            pix = page.get_pixmap(matrix=mat, alpha=False)
            pix.save(os.path.join(image_path, f"{pdf_name}_{page_index}.png"))
            # Count the page just finished so the last page reaches 100
            self.progressBar.setValue(int((page_index + 1) * 100 / pdf_pages))
        pdf_doc.close()


def main():
    app = QApplication(sys.argv)
    app.setStyle("Fusion")
    dialog = Pdf2Png()
    dialog.show()
    sys.exit(app.exec())


if __name__ == '__main__':
    main()
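The output file name and the progress percentage are plain calculations, so they can be pulled out of the dialog class and checked without starting a GUI. The following is only a sketch of that idea. The helper names `page_image_path` and `progress_percent` are hypothetical and not part of the application above.

```python
from pathlib import Path


def page_image_path(pdf_path, image_dir, page_index):
    """Build the PNG path for one page, named after the PDF file."""
    # Path.stem drops only the last suffix, so dots in the name survive
    stem = Path(pdf_path).stem
    return Path(image_dir) / f"{stem}_{page_index}.png"


def progress_percent(pages_done, total_pages):
    """Integer percentage of finished pages, clamped to 0..100."""
    if total_pages <= 0:
        return 0
    return min(100, pages_done * 100 // total_pages)


print(page_image_path("docs/report.v2.pdf", "out", 0))
print(progress_percent(1, 4))  # 25
```

Passing the number of finished pages rather than the zero-based page index means the bar reaches 100 exactly when the last page has been saved.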
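One thing to note: slots that are connected by name must be explicitly marked with the pyqtSlot decorator, otherwise a single action is handled twice. For example, without the decorator the file dialog pops up twice. This is presumably because setupUi connects slots by name and the handler ends up connected once for each overload of the clicked signal, while the decorator pins it to a single signature. The overall result is shown in the figure below.
I'll upload the source code when I find the time.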