Packaging and Deploying a Scrapy Project on Windows 10


Install Scrapy and its related libraries:

# Suggested install order:
zope.interface
pyopenssl
twisted
lxml
scrapy


Check that Scrapy is installed:

(venv) PS G:\Python_pj\Scrapy_vevn_04> scrapy -h
Scrapy 2.8.0 - no active project

Usage:                                                                  
  scrapy <command> [options] [args]                                     
                                                                        
Available commands:                                                     
  bench         Run quick benchmark test                                
  fetch         Fetch a URL using the Scrapy downloader                 
  genspider     Generate new spider using pre-defined templates         
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values                                     
  shell         Interactive scraping console                            
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

Create a Scrapy project:

(venv) PS G:\Python_pj\Scrapy_vevn_04> scrapy startproject ADtest
New Scrapy project 'ADtest', using template directory 'G:\Python_pj\Scrapy_vevn_04\venv\Lib\site-packages\scrapy\templates\project', created in:    
    G:\Python_pj\Scrapy_vevn_04\ADtest
You can start your first spider with:
    cd ADtest
    scrapy genspider example example.com
(venv) PS G:\Python_pj\Scrapy_vevn_04>


Create Scrapy spiders:

  • Quick creation: scrapy genspider baidu baidu.com
  • Creation from a named template: scrapy genspider -t basic tencent tencent.com
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> scrapy genspider baidu baidu.com
Created spider 'baidu' using template 'basic' in module:
  ADtest.spiders.baidu
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> scrapy genspider -t basic tencent tencent.com             
Created spider 'tencent' using template 'basic' in module:
  ADtest.spiders.tencent
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> scrapy list
baidu
tencent
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> 

Test a Scrapy spider:

  • Run the spider: scrapy crawl baidu
2023-03-27 16:18:23 [scrapy.core.engine] INFO: Spider opened
2023-03-27 16:18:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-03-27 16:18:23 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-03-27 16:18:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://baidu.com/robots.txt> (referer: None)
2023-03-27 16:18:23 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET http://baidu.com/>
2023-03-27 16:18:23 [scrapy.core.engine] INFO: Closing spider (finished)
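The crawl above fetched robots.txt and was then blocked because generated projects obey robots.txt by default. For a local test the behaviour can be switched off in the project's settings.py (a sketch; whether to honour robots.txt on real targets is a policy decision):

```python
# ADtest/settings.py (generated by startproject)
# Scrapy honours robots.txt by default; disable it for this local test
# so http://baidu.com/ is no longer "Forbidden by robots.txt".
ROBOTSTXT_OBEY = False
```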


Install the scrapy-redis library:

  • Install via pip: pip install scrapy-redis
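scrapy-redis hooks into a project entirely through settings; a minimal sketch of the usual wiring (the setting names come from scrapy-redis itself; the Redis address is an assumed local default):

```python
# ADtest/settings.py — hypothetical scrapy-redis wiring (sketch)
SCHEDULER = "scrapy_redis.scheduler.Scheduler"              # queue requests in Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # dedupe via a Redis set
SCHEDULER_PERSIST = True                                    # keep the queue between runs
REDIS_URL = "redis://127.0.0.1:6379"                        # assumed local Redis instance
```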

Scrapyd deployment:

  • Server-side runtime environment: pip install scrapyd
  • Client-side deploy tool: pip install scrapyd-client

Start the service (run it on the server and keep it running):

  • A dbs folder appears when the server starts; it stores the databases
(venv) PS G:\Python_pj\Scrapy_vevn_04> scrapyd
2023-03-27T16:36:06+0800 [-] Loading G:\Python_pj\Scrapy_vevn_04\venv\lib\site-packages\scrapyd\txapp.py...
2023-03-27T16:36:06+0800 [-] Basic authentication disabled as either `username` or `password` is unset
2023-03-27T16:36:06+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2023-03-27T16:36:06+0800 [-] Loaded.
2023-03-27T16:36:06+0800 [twisted.application.app.AppLogger#info] twistd 22.10.0 (G:\Python_pj\Scrapy_vevn_04\venv\Scripts\python.exe 3.10.5) starting up.
2023-03-27T16:36:06+0800 [twisted.application.app.AppLogger#info] reactor class: twisted.internet.selectreactor.SelectReactor.
2023-03-27T16:36:06+0800 [-] Site starting on 6800
2023-03-27T16:36:06+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x000001B5F3BE3970>
2023-03-27T16:36:06+0800 [Launcher] Scrapyd 1.4.1 started: max_proc=80, runner='scrapyd.runner'


Deploy from the client

  • Check that the client tool works: scrapyd-deploy -h
(venv) PS G:\Python_pj\Scrapy_vevn_04> scrapyd-deploy -h
usage: scrapyd-deploy [-h] [-p PROJECT] [-v VERSION] [-l] [-a] [-d] [-L TARGET] [--egg FILE] [--build-egg FILE] [--include-dependencies] [TARGET]
                                                                                                                                                 
Deploy Scrapy project to Scrapyd server                                                                                                          
                                                                                                                                                 
positional arguments:                                                                                                                            
  TARGET                                                                                                                                         
                                                                                                                                                 
options:                                                                                                                                         
  -h, --help            show this help message and exit                                                                                          
  -p PROJECT, --project PROJECT                                                                                                                  
                        the project name in the TARGET                                                                                           
  -v VERSION, --version VERSION                                                                                                                  
                        the version to deploy. Defaults to current timestamp                                                                     
  -l, --list-targets    list available targets                                                                                                   
  -a, --deploy-all-targets                                                                                                                       
                        deploy all targets                                                                                                       
  -d, --debug           debug mode (do not remove build dir)                                                                                     
  -L TARGET, --list-projects TARGET                                                                                                              
                        list available projects in the TARGET                                                                                    
  --egg FILE            use the given egg, instead of building it
  --build-egg FILE      only build the egg, don't deploy it
  --include-dependencies
                        include dependencies from requirements.txt in the egg
(venv) PS G:\Python_pj\Scrapy_vevn_04> 
  • Deploy command: scrapyd-deploy
# The url line in the config file has not been enabled yet
(venv) PS G:\Python_pj\Scrapy_vevn_04> cd ADtest
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> scrapyd-deploy
Unknown target: default
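The "Unknown target: default" error means the url line in the project's scrapy.cfg is still commented out. A sketch of the file as startproject generates it, with the url uncommented (project name taken from the transcript):

```ini
# ADtest/scrapy.cfg
[settings]
default = ADtest.settings

[deploy]
url = http://localhost:6800/
project = ADtest
```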


  • Packaging completed
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> scrapyd-deploy
Packing version 1679907159
Deploying to project "ADtest" in http://localhost:6800/addversion.json
Server response (200):
{"node_name": "PS2022ZYSKWXTZ", "status": "ok", "project": "ADtest", "version": "1679907159", "spiders": 2}

(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> 
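The version in the response is simply the Unix timestamp at packing time, and addversion.json is one endpoint of Scrapyd's JSON API, which can also be driven directly. A minimal sketch using only the standard library (the server URL matches the transcript; the helper names are made up for illustration):

```python
"""Sketch: talk to Scrapyd's JSON API with the standard library only.

Assumes the default server from the transcript (http://localhost:6800).
"""
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request

SCRAPYD = "http://localhost:6800"


def version_to_time(version):
    # scrapyd-deploy's default version is the Unix timestamp at packing
    # time: 1679907159 -> 2023-03-27 08:52:39 UTC (16:52:39 at +0800).
    return datetime.fromtimestamp(int(version), tz=timezone.utc)


def list_projects_url(base=SCRAPYD):
    # GET /listprojects.json returns {"status": "ok", "projects": [...]}
    return f"{base}/listprojects.json"


def schedule_request(project, spider, base=SCRAPYD):
    # POST /schedule.json starts a crawl and returns a job id
    data = urlencode({"project": project, "spider": spider}).encode()
    return Request(f"{base}/schedule.json", data=data)
```

Passing schedule_request("ADtest", "baidu") to urllib.request.urlopen starts the baidu spider on the server; listjobs.json and cancel.json are used the same way.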


Deploy again as a test

  • Rename the project in the config, redeploy, and inspect the result
(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> scrapyd-deploy
Packing version 1679907735
Deploying to project "ADtest_001" in http://localhost:6800/addversion.json
Server response (200):
{"node_name": "PS2022ZYSKWXTZ", "status": "ok", "project": "ADtest_001", "version": "1679907735", "spiders": 2}

(venv) PS G:\Python_pj\Scrapy_vevn_04\ADtest> 
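The rename is done in the same [deploy] section of scrapy.cfg; with a new project name, scrapyd-deploy registers a separate project on the server instead of adding a version to the old one (sketch):

```ini
[deploy]
url = http://localhost:6800/
project = ADtest_001
```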


Notes

  • Client-side packaging requires an extra library: pip install pywin32
  • Packaging creates new build artifacts on the client; the source folder is untouched
  • After each deploy, the server updates the contents of its dbs and eggs folders


Running multiple Scrapyd instances on one server

  • Create one folder per instance and put a copy of the config file in each (copy it; do not modify the original)
  • Change the port number in each copy (default port: 6800)
  • Start each instance from its folder, e.g. scrapyd -d G:/Python_pj/Scrapy_vevn_03/mingsen
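In each copied config only the port needs to change; a sketch of the relevant lines (option names as in Scrapyd's default_scrapyd.conf; port 6801 is an assumed choice for the second instance):

```ini
[scrapyd]
bind_address = 127.0.0.1
http_port    = 6801
```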


The Gerapy visual management tool

1. Chinese/English interface toggle
2. Configurable through a config file
3. Dynamic configuration: hosts, services, and spiders can be configured after the service starts
4. Clean, friendly interface
5. Built on Django
6. Online editing of code files
7. Wraps all the command-line tools
8. Non-standard startup sequence:
    a. gerapy init
    b. gerapy migrate
    c. gerapy runserver
  • Install Gerapy: pip install gerapy
  • Prerequisite: Scrapyd must already be working
(venv) PS G:\Python_pj\Scrapy_vevn_03> scrapyd
2023-03-28T11:51:55+0800 [-] Loading G:\Python_pj\Scrapy_vevn_03\venv\lib\site-packages\scrapyd\txapp.py...
2023-03-28T11:51:56+0800 [-] Basic authentication disabled as either `username` or `password` is unset
2023-03-28T11:51:56+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2023-03-28T11:51:56+0800 [-] Loaded.
2023-03-28T11:51:56+0800 [twisted.application.app.AppLogger#info] twistd 22.10.0 (G:\Python_pj\Scrapy_vevn_03\venv\Scripts\python.exe 3.10.5) starting up.
2023-03-28T11:51:56+0800 [twisted.application.app.AppLogger#info] reactor class: twisted.internet.selectreactor.SelectReactor.
2023-03-28T11:51:56+0800 [-] Site starting on 6800
2023-03-28T11:51:56+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x000002084DD4F100>
2023-03-28T11:51:56+0800 [Launcher] Scrapyd 1.4.1 started: max_proc=80, runner='scrapyd.runner'

Initialize: gerapy init

(venv) PS G:\Python_pj\Scrapy_vevn_03> gerapy init
Initialized workspace gerapy
(venv) PS G:\Python_pj\Scrapy_vevn_03> 


Migrate: gerapy migrate

# cd into the gerapy workspace and migrate: turns all the models into database tables
(venv) PS G:\Python_pj\Scrapy_vevn_03> cd .\gerapy\
(venv) PS G:\Python_pj\Scrapy_vevn_03\gerapy> gerapy migrate


Start the service: gerapy runserver

(venv) PS G:\Python_pj\Scrapy_vevn_03\gerapy> gerapy runserver
Watching for file changes with StatReloader
Performing system checks...

INFO - 2023-03-28 12:01:46,329 - process: 14800 - scheduler.py - gerapy.server.core.scheduler - 105 - scheduler - successfully synced task with jobs with force
System check identified no issues (0 silenced).
March 28, 2023 - 12:01:46
Django version 2.2.28, using settings 'gerapy.server.server.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.

# Stop the service: press Ctrl+C in the terminal


Create a username and password: gerapy createsuperuser

(venv) PS G:\Python_pj\Scrapy_vevn_03\gerapy> gerapy createsuperuser
Username (leave blank to use 'admin'): admin
Email address: 
Password: 
Password (again):
This password is too short. It must contain at least 8 characters.
This password is too common.
This password is entirely numeric.
Bypass password validation and create user anyway? [y/N]: y
Superuser created successfully.

The web UI

Gerapy's web interface is then available at http://127.0.0.1:8000/; log in with the superuser account created above.