Why do we need to simulate login?
To obtain cookies, so that we can crawl pages that are only accessible after logging in.
How does the requests library simulate login?
- Request the page directly with the cookies attached
- Find the login endpoint, send a POST request, and keep the cookies it sets (see the sketch below)
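As a quick illustration, here is a minimal requests sketch of both approaches. The session id, username, and password are placeholders, and the form field names are assumptions that depend on the target site's login form:

import requests

LOGIN_URL = "https://login2.scrape.center/login"

# Approach 1: attach cookies copied from a logged-in browser session
cookies = {"sessionid": "<your-session-id>"}  # placeholder value
resp = requests.get("https://login2.scrape.center/", cookies=cookies)

# Approach 2: POST the credentials to the login endpoint; a Session
# stores the cookies the server sets and reuses them on later requests
session = requests.Session()
session.post(LOGIN_URL, data={"username": "<user>", "password": "<pass>"})
resp = session.get("https://login2.scrape.center/")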
How does Selenium simulate login?
- Locate the corresponding input tags, type in the credentials, and click the login button (a sketch follows)
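A minimal Selenium sketch of that idea; the element selectors are assumptions and would need adjusting to the real page:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://login2.scrape.center/login")

# Fill in the form fields, then click the submit button
driver.find_element(By.NAME, "username").send_keys("<user>")  # assumed selector
driver.find_element(By.NAME, "password").send_keys("<pass>")  # assumed selector
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

# Once logged in, the session cookies can be exported for reuse elsewhere
cookies = driver.get_cookies()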
For Scrapy, there are likewise two ways to simulate login:
- Carry the cookies directly
- Find the URL that receives the POST request, attach the login data, and send the request
Both are demonstrated below.
Simulating login by carrying cookies
Login URL: login2.scrape.center/login
import scrapy


class Login2Spider(scrapy.Spider):
    name = "login2"
    allowed_domains = ["scrape.center"]
    start_urls = ["https://login2.scrape.center/"]

    def start_requests(self):
        # Cookie string copied from a logged-in browser session
        cookies = "sessionid=bolt8e0i1f1xe4lcatvhhecjy9bq8l9l"
        key, value = cookies.split('=', 1)
        cookie = {key: value}
        yield scrapy.Request(
            url=self.start_urls[0],
            callback=self.parse,
            cookies=cookie,
        )

    def parse(self, response):
        # Save the page locally to verify that the request was logged in
        with open("login2.html", 'w', encoding='utf-8') as f:
            f.write(response.body.decode())
Cookies carry the login state, so we can attach them directly when requesting the pages whose data we need.
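In practice, the cookie string copied from a browser usually contains several key=value pairs separated by "; ". A small sketch of turning such a string into the dict that scrapy.Request expects (the cookie names and values here are made up):

# A browser cookie string looks like "k1=v1; k2=v2"; turn it into a dict
cookies_str = "sessionid=abc123; csrftoken=xyz789"  # hypothetical values
cookies = dict(pair.split('=', 1) for pair in cookies_str.split('; '))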
Logging in by sending a POST request
import scrapy

from cuiqingcai import settings


class Ssr2PostSpider(scrapy.Spider):
    name = "ssr2_post"
    allowed_domains = ["scrape.center"]
    start_urls = ["https://login2.scrape.center/login"]

    def parse(self, response):
        # The credentials live in the project settings, not in the spider
        post_data = {
            'username': settings.SSR2_USERNAME,
            'password': settings.SSR2_PASSWORD,
        }
        # FormRequest sends post_data as a form-encoded POST body
        yield scrapy.FormRequest(
            url='https://login2.scrape.center/login',
            formdata=post_data,
            callback=self.after_login,
        )

    def after_login(self, response):
        print(response)
        # Save the post-login page to confirm the login succeeded
        with open('ssr2_post.html', 'w', encoding='utf-8') as f:
            f.write(response.text)
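The spider reads its credentials from the project settings rather than hard-coding them. A sketch of the corresponding entries in the project's settings.py; the values are placeholders:

# cuiqingcai/settings.py (excerpt)
SSR2_USERNAME = '<your-username>'  # placeholder
SSR2_PASSWORD = '<your-password>'  # placeholder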
Automatic simulated login in Scrapy
import scrapy

from cuiqingcai import settings


class Ssr2Post2Spider(scrapy.Spider):
    name = "ssr2_post2"
    allowed_domains = ["scrape.center"]
    start_urls = ["https://login2.scrape.center/login"]

    def parse(self, response):
        # from_response() extracts the <form> from the page and keeps its
        # action URL, method, and hidden fields (e.g. CSRF tokens); only
        # the fields given in formdata are overridden
        yield scrapy.FormRequest.from_response(
            response=response,
            formdata={
                'username': settings.SSR2_USERNAME,
                'password': settings.SSR2_PASSWORD,
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        with open('./ssr2_post2.html', 'w', encoding='utf-8') as f:
            f.write(response.body.decode())
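By default, from_response uses the first form it finds in the response. When a page contains several forms, Scrapy's formname, formid, formnumber, formxpath, and formcss arguments select a specific one, for example (the CSS selector here is an assumption):

yield scrapy.FormRequest.from_response(
    response,
    formcss='form.login',  # assumed selector for the login form
    formdata={'username': '...', 'password': '...'},
    callback=self.after_login,
)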