Simulating Login with Scrapy

Why simulate login?

To obtain cookies, which makes it possible to crawl pages that are only available after logging in.

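How does requests simulate login?

  1. Request the page with the cookies attached directly
  2. Find the login endpoint and send a POST request to it, then keep the cookies it returns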
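The two approaches can be sketched with the requests library as follows. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, and the cookie name `sessionid` and the form fields `username` and `password` are assumed to match the Scrapy examples later in this post.

```python
import requests

LOGIN_URL = "https://login2.scrape.center/login"
HOME_URL = "https://login2.scrape.center/"


def build_request_with_cookie(session_id: str) -> requests.PreparedRequest:
    """Approach 1: attach an already known cookie to the page request."""
    # Assumption: the site identifies the session with a cookie named "sessionid".
    request = requests.Request("GET", HOME_URL, cookies={"sessionid": session_id})
    return request.prepare()


def login_with_post(username: str, password: str) -> requests.Session:
    """Approach 2: POST the credentials; the Session keeps the returned cookies."""
    session = requests.Session()
    # Assumption: the login form uses the field names "username" and "password".
    session.post(LOGIN_URL, data={"username": username, "password": password})
    return session
```

With the first approach the prepared request can be sent through `requests.Session().send(...)`. With the second, later calls such as `session.get(HOME_URL)` reuse the cookies stored on the session.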

How does selenium simulate login?

  • Locate the relevant input elements, type in the text, and click the login button

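Scrapy likewise offers two ways to simulate login:

  • Carry the cookies directly
  • Find the URL that the login POST request is sent to, attach the login data, and send the request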

Simulating login with cookies

Login URL: login2.scrape.center/login

import scrapy


class Login2Spider(scrapy.Spider):
    name = "login2"
    allowed_domains = ["scrape.center"]
    start_urls = ["https://login2.scrape.center/"]

    def start_requests(self):
        # Session cookie taken from a logged-in session (replace with your own)
        cookies = "sessionid=bolt8e0i1f1xe4lcatvhhecjy9bq8l9l"
        key, value = cookies.split('=', 1)
        cookie = {key: value}

        yield scrapy.Request(
            url=self.start_urls[0],
            callback=self.parse,
            cookies=cookie
        )

    def parse(self, response):
        # Save the page so it can be checked for logged-in content
        with open("login2.html", 'w', encoding='utf-8') as f:
            f.write(response.body.decode())

Cookies carry the login information, so we can attach them when directly requesting the page data we need.
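The spider above splits a single `name=value` pair. A Cookie header copied from a browser often contains several pairs separated by `;`. The following is a minimal sketch of a helper that turns such a string into the dict accepted by the `cookies` argument of `scrapy.Request`. The name `cookie_header_to_dict` is hypothetical, and the sample values in the usage note are made up.

```python
def cookie_header_to_dict(header: str) -> dict:
    """Convert 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        # Skip empty fragments and fragments without a name=value shape
        if not pair or "=" not in pair:
            continue
        # Split only on the first '=' because values may contain '=' themselves
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies
```

For example, `cookie_header_to_dict("sessionid=abc; theme=dark")` yields a dict with the keys `sessionid` and `theme`, which can be passed as `cookies=` when building the request.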

Logging in by sending a POST request

import scrapy
from cuiqingcai import settings


class Ssr2PostSpider(scrapy.Spider):
    name = "ssr2_post"
    allowed_domains = ["scrape.center"]
    start_urls = ["https://login2.scrape.center/login"]

    def parse(self, response):
        # Credentials are read from the project's settings module
        post_data = {
            'username': settings.SSR2_USERNAME,
            'password': settings.SSR2_PASSWORD
        }

        # FormRequest sends the form data as a POST request
        yield scrapy.FormRequest(
            url='https://login2.scrape.center/login',
            formdata=post_data,
            callback=self.after_login
        )

    def after_login(self, response):
        # Save the response to the login request for inspection
        print(response)
        with open('ssr2_post.html', 'w', encoding='UTF-8') as f:
            f.write(response.text)
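The spider above reads `settings.SSR2_USERNAME` and `settings.SSR2_PASSWORD`, but the settings module itself is not shown. As a minimal sketch, assuming the credentials are supplied through environment variables with the same names (an assumption, not something confirmed by the original project), they could be loaded as follows. `load_credentials` is a hypothetical helper.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from a mapping of environment variables."""
    # Assumption: the variable names mirror the settings attributes used by the spider.
    env = os.environ if env is None else env
    username = env.get("SSR2_USERNAME", "")
    password = env.get("SSR2_PASSWORD", "")
    if not username or not password:
        raise ValueError("SSR2_USERNAME and SSR2_PASSWORD must both be set")
    return username, password
```

In `settings.py` this could be used as `SSR2_USERNAME, SSR2_PASSWORD = load_credentials()`, which keeps the account details out of the source files.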

Simulating automatic login with Scrapy

import scrapy
from cuiqingcai import settings


class Ssr2Post2Spider(scrapy.Spider):
    name = "ssr2_post2"
    allowed_domains = ["scrape.center"]
    start_urls = ["https://login2.scrape.center/login"]

    def parse(self, response):
        # from_response locates the form in the login page and pre-fills
        # its fields; formdata supplies or overrides the values given here
        yield scrapy.FormRequest.from_response(
            response=response,
            formdata={
                'username': settings.SSR2_USERNAME,
                'password': settings.SSR2_PASSWORD
            },
            callback=self.after_login
        )

    def after_login(self, response):
        with open('./ssr2_post2.html', 'w', encoding='UTF-8') as f:
            f.write(response.body.decode())
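To make clearer what `FormRequest.from_response` saves us from doing by hand, here is a rough standard-library sketch of the idea: read the form's action, method and existing input values from the login page, then overlay the credentials. It is a simplified illustration, not Scrapy's actual implementation. `FormFieldParser`, `build_login_form` and the sample HTML (including the `csrf_token` field) are hypothetical and do not reflect the real markup of the target site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldParser(HTMLParser):
    """Collect the action, method and named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep pre-filled values such as hidden tokens
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_form(page_url, html, overrides):
    """Return (submit_url, method, data) for the first form in the page."""
    parser = FormFieldParser()
    parser.feed(html)
    # Values passed by the caller win over values found in the page
    data = {**parser.fields, **overrides}
    return urljoin(page_url, parser.action), parser.method, data


# Hypothetical login page, used only to exercise the sketch
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="hypothetical-token">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Login">
</form>
"""

url, method, data = build_login_form(
    "https://login2.scrape.center/login",
    SAMPLE_HTML,
    {"username": "admin", "password": "admin"},
)
```

The practical benefit is that hidden fields present in the page are submitted along with the credentials without being listed in the spider, which is why only `username` and `password` appear in `formdata` above.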