How to Get Cookies in Python


This article shows how to obtain cookies for a website that does not require login. The method is taken from GitHub.

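Take the site requested in the code below as an example: it does not require login, but it only returns the page data when the request carries the right cookies.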

A common first attempt is to make a request through a session and read session.cookies, as shown below:

import requests
from fake_useragent import UserAgent


def get_cookie():
    ua = UserAgent().random  # pick a random User-Agent string
    headers = {'User-Agent': ua}
    url = 'http://www.ccgp-gansu.gov.cn/web/article/doSearchyxgkAll.action?limit=20&start=0'
    session = requests.Session()
    # The session keeps whatever cookies the server sends back in Set-Cookie headers
    session.post(url, headers=headers)
    cookie = session.cookies
    a = cookie.get_dict()  # convert the cookie jar to a plain dict
    print(a)

    # Sample output (only one cookie comes back):
    # {'4hP44ZykCTt5S': '5J4Cb6gbg4WsmSHChuBQDRCOVFH2zkEkGhrAefxiBMr0S08miQLF8qrpGrZUPbxq_WGySfpfPzOOZzU8fKrgDxA'}

get_cookie()

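This approach looks convenient, but it has a drawback: it returns only part of the cookies rather than the complete set, so a request made with this value still fails. A likely reason is that requests only stores cookies the server sends in Set-Cookie response headers, while a browser also keeps cookies written by the page's JavaScript. We therefore need a different method: use browser_cookie3 to read the cookies. It loads them from the local Chrome cookie store, so the site must already have been opened in Chrome on the same machine. The code is shown below: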

from urllib.parse import urljoin

import browser_cookie3
import requests
from lxml import etree


def get_cookie():
    url = 'http://www.ccgp-gansu.gov.cn/web/article/doSearchyxgkAll.action?limit=20&start=0'
    # Load the cookies Chrome has stored locally for this domain.
    # Printing cookies gives the CookieJar shown in the comment after this function.
    cookies = browser_cookie3.chrome(domain_name='ccgp-gansu.gov.cn')
    resp = requests.get(url=url, cookies=cookies)
    text = resp.content.decode('utf-8')
    tree = etree.HTML(text)
    li_list = tree.xpath('//tbody/tr')  # one <tr> per search result

    for li in li_list:
        item = {}
        item['name'] = '采购网'  # site label, "procurement network"
        item['title'] = li.xpath('./td[1]/a/text()')[0]
        item['date'] = li.xpath('./td[4]/text()')[0]
        # hrefs on the page are relative, so resolve them against the request URL
        item['url'] = urljoin(url, li.xpath('./td[1]/a/@href')[0])
        print(item)

#<CookieJar[<Cookie 4hP44ZykCTt5S=5RkyPGd0CzmY28nFDUGVUQGpQT_dvwG7BI.QXg2uclQgJoPLBeyUtXfU1vBZWgc_EEcIE5K3ak2Ht9QjWZ65TCq for www.ccgp-gansu.gov.cn/>, <Cookie 4hP44ZykCTt5T=iKSrLMJ_051i7jqEUqA1aSs7H1OFOHLNxCTTnkH8U5_ysZ6MmEb2T1G_Z9ol91UcS3laQSGf_sJr7pkc.UEntmJT5GYMGRpu0ndUw__grEYnPiHnPD4Te69qCtPtsZw4md9yEVsRlNN4jAUAoW84wvxDKQKXKenl4l.VH18anP3HXY18U2omlI9txsWDDh2GTAAt9n.2gvUqjFrcFvACzEXYrP_CMRManLbaQmEcwRrgbGYiVO.rQ0D88iC1JHtX5kKEyuCXlTpqiJe2IC00OOIaLKJOYkf52bnShwyZ_JO8XVk3NbPkzqeRAMuUyddcSnbjYnqiVyix0D8.szAW.JTrOftDecg9Em9cbyZKIG3 for www.ccgp-gansu.gov.cn/>, <Cookie JSESSIONID=CE4AC0EAB4049552BB64EFE9597AA94D.tomcat1 for www.ccgp-gansu.gov.cn/>]>


get_cookie()
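
Comparing the two sample outputs shows exactly which cookies the session approach missed. The sketch below automates that comparison using only the standard library. The helper names `make_cookie` and `missing_cookie_names` are made up for illustration, and the jar is filled with stand-in cookies (real names from the outputs above, placeholder values) so the code runs without a browser or network access. In practice the jar would come from browser_cookie3 and the dict from session.cookies.get_dict().

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a minimal cookie for demonstration purposes (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def missing_cookie_names(session_cookies, browser_jar):
    """Return the cookie names found in browser_jar but absent from session_cookies."""
    browser_names = {cookie.name for cookie in browser_jar}
    return sorted(browser_names - set(session_cookies))


# Stand-in data: names copied from the sample outputs, values are placeholders.
session_cookies = {'4hP44ZykCTt5S': 'placeholder'}
browser_jar = CookieJar()
for name in ('4hP44ZykCTt5S', '4hP44ZykCTt5T', 'JSESSIONID'):
    browser_jar.set_cookie(make_cookie(name, 'placeholder', 'www.ccgp-gansu.gov.cn'))

print(missing_cookie_names(session_cookies, browser_jar))
```

With the stand-in data, the names reported as missing are 4hP44ZykCTt5T and JSESSIONID, which matches the difference between the two outputs in this article.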

Cookies obtained this way are complete, and there is no need to specify a URL when loading them: passing domain_name is enough.
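
If the cookies need to be handed to a tool that expects a raw Cookie header instead of a jar, they can be flattened into a single string. The following is a minimal sketch under some assumptions: `cookie_header_for_domain` and `make_cookie` are hypothetical helpers, the domain check (exact host or subdomain) is this sketch's own rule and not necessarily how browser_cookie3 filters, and the jar is filled with placeholder cookies so the example runs without Chrome installed.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a minimal cookie for demonstration purposes (hypothetical helper)."""
    return Cookie(0, name, value, None, False, domain, True, False,
                  '/', True, False, None, True, None, None, {})


def cookie_header_for_domain(jar, domain_name):
    """Join the cookies belonging to domain_name (or a subdomain) into a Cookie header value."""
    pairs = []
    for cookie in jar:
        host = cookie.domain.lstrip('.')
        if host == domain_name or host.endswith('.' + domain_name):
            pairs.append((cookie.name, cookie.value))
    # Sort by name so the result does not depend on jar iteration order.
    return '; '.join(f'{name}={value}' for name, value in sorted(pairs))


# Placeholder cookies: two for the target site, one unrelated.
jar = CookieJar()
jar.set_cookie(make_cookie('JSESSIONID', 'abc', 'www.ccgp-gansu.gov.cn'))
jar.set_cookie(make_cookie('4hP44ZykCTt5S', 'xyz', 'www.ccgp-gansu.gov.cn'))
jar.set_cookie(make_cookie('other', '1', 'example.com'))

header = cookie_header_for_domain(jar, 'ccgp-gansu.gov.cn')
print(header)
```

The resulting string can be sent with requests as headers={'Cookie': header}, although passing the jar through the cookies argument, as in the code above, is usually simpler.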