A Scrapy Middleware for Different Request Types


When working with Scrapy, you inevitably run into requests that can't be handled the ordinary way. For example, I recently used the third-party package cloudscraper to get past Cloudflare's free-tier five-second challenge. On top of that, cloudscraper itself needs a proxy IP, otherwise you're only treating the symptom rather than the cause. (Its proxy usage should be the same as with requests.)

Using a proxy IP with cloudscraper

import cloudscraper

# entry is the proxy address, e.g. "http://user:pass@host:port"
scraper = cloudscraper.create_scraper()
proxies = {"http": entry, "https": entry}
content = scraper.get(request.url, proxies=proxies).text

Rewriting the request in a Scrapy middleware

import cloudscraper
from scrapy.http import HtmlResponse


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        ...

        # Attach the proxy for requests that go through Scrapy's own downloader.
        request.meta['proxy'] = entry

        # For requests tagged middleware='cloudscraper', fetch the page with
        # cloudscraper (through the same proxy) and short-circuit the download
        # by returning a response directly. Use .get() so untagged requests
        # don't raise a KeyError.
        if request.meta.get('middleware') == 'cloudscraper':
            scraper = cloudscraper.create_scraper()
            proxies = {"http": entry, "https": entry}
            content = scraper.get(request.url, proxies=proxies).text
            return HtmlResponse(request.url, encoding="utf-8", body=content, request=request)

HtmlResponse is a response class that ships with Scrapy (scrapy.http.HtmlResponse). When process_request returns a response object like this, Scrapy skips the downloader entirely and hands the response straight to the spider.
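For completeness, here is one way the spider side might tag its requests so the middleware above picks them up. This is a minimal sketch: the spider name, URL, middleware path, and priority number are illustrative assumptions, not from the original post.

import scrapy

# In settings.py, register the middleware (path and priority are assumptions):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.ProxyMiddleware": 543,
# }

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # Tag the request so ProxyMiddleware routes it through
        # cloudscraper instead of Scrapy's normal downloader.
        yield scrapy.Request(
            "https://example.com",
            meta={"middleware": "cloudscraper"},
            callback=self.parse,
        )

    def parse(self, response):
        self.logger.info("got %d bytes", len(response.body))

Untagged requests still flow through the normal downloader with request.meta['proxy'] set, so the two paths coexist in one middleware.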