Scrapy基础（一）：安装和使用

2018-12-12 194 阅读1分钟

安装

pip install -i http://pypi.douban.com/simple scrapy    
// -i http://pypi.douban.com/simple 为加速安装

新建scrapy项目

scrapy startproject ArticleSpider  //会在当前路径创建项目 ArticleSpider为项目名
cd ArticleSpider && genspider example example.com //创建爬虫模板 example为spider名称 example.com为网站域名

目录

scrapy.cfg //项目配置
ArticleSpider/settings.py  //工程配置
ArticleSpider/pipelines.py //数据存储
ArticleSpider/middlewares.py 存放自定制的middlewares
ArticleSpider/items  //保存格式
spilers  //具体的爬虫

scrapy模板

import scrapy

class XXX(scrapy.Spider):
    name = 'xxx'  //名字
    allowed_domains = ['example.com']  //域名
    start_urls = ['http://example.com']  //起始url

    def parse(self, response):  //具体的爬虫逻辑
        pass

使用pycharm调试scrapy执行流程

--- main.py ---
from scrapy.cmdline import execute
import sys
import os

# os.path.abspath(__file__))  获取当前文件的绝对路径
# os.path.dirname()  获取当前文件的父目录
sys.path.append(os.path.dirname(os.path.abspath(__file__)))  
# execute 执行终端命令
execute(["scrapy","crawl","xxx"])

scrapy 终端调试

scrapy shell url
//然后回进入终端，使用response参数获取爬取的内容如：
response.xpath()

xpath 使用

节点关系

语法1

语法2-谓语

语法3

css选择器

css选择器1

css选择器2

css选择器3