Scraping the iQiyi Popular Movies Ranking


"This is day 31 of my participation in the 2022 First Writing Challenge (2022首次更文挑战); see the challenge page for event details."

Many people have already scraped the Douban movie rankings. Today I'll scrape the iQiyi movie ranking instead.

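1. Install unicodecsv

The scraped data is written to a CSV file that will be opened in Excel, and this library writes the Chinese text without encoding errors.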

2. Inspect the page to find the JSON data endpoint

3. Scrape the data

4. Write the data to an Excel-readable (CSV) file

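Notes

  • The file must be opened with 'wb' (binary write mode), because unicodecsv writes already-encoded bytes; otherwise writing fails (I got an error on the character '\u2022').
  • Watch the encoding too: UTF-8 shows up as garbled text when the CSV is opened in Excel, and GBK cannot represent every character, so gb18030 is required.
  • f_csv = csv.writer(f, encoding='gb18030')
  • In the end, 900 popular movies were scraped.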
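On Python 3, the standard-library csv module can also produce a gb18030 file without unicodecsv: open the file in text mode with the encoding set, and pass newline='' as the csv documentation recommends. The sketch below is a minimal illustration under that assumption, not the method used in this post; the file path and the second "bullet test" row are illustrative, while the header and the first row come from the script and its output further down. Another option often used for Excel is encoding='utf-8-sig', which adds a byte-order mark that Excel uses to detect UTF-8.

```python
import csv
import os
import tempfile

# Column headers from the scraper: title, year, tagline, description, play URL, poster, score
title = ['电影名', '年份', '梗概', '简介', '观影地址', '海报', '评分']
rows = [
    # First record from the scraper's printed output (description shortened)
    ['龙虎山张天师', '2020-06-05', '樊少皇破巫教谜团', '张道陵得好友文大人一封求助信',
     'http://www.iqiyi.com/v_19ry2hvngk.html',
     'http://pic3.iqiyipic.com/image/20200605/ce/c4/v_149238124_m_601_m5.jpg', 8.1],
    # Illustrative row containing the bullet character from the notes
    ['\u2022 bullet test', '', '', '', '', '', ''],
]

# Illustrative output path in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'iqiyi_movies_demo.csv')

# Text mode with an explicit encoding: csv writes str, the file object encodes it.
# newline='' keeps the csv module's own \r\n line endings from being translated again.
with open(path, 'w', newline='', encoding='gb18030') as f:
    writer = csv.writer(f)
    writer.writerow(title)
    writer.writerows(rows)

# Read the file back with the same encoding to check the round trip
with open(path, newline='', encoding='gb18030') as f:
    read_back = list(csv.reader(f))

print(read_back[1][0], read_back[1][-1])
```

Note that csv.reader returns every field as a string, so the score comes back as '8.1'.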
# Install dependencies
!pip install unicodecsv
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Collecting unicodecsv
  Using cached https://mirrors.tuna.tsinghua.edu.cn/pypi/web/packages/6f/a4/691ab63b17505a26096608cc309960b5a6bdf39e4ba1a793d5f9b1a53270/unicodecsv-0.14.1.tar.gz
Building wheels for collected packages: unicodecsv
  Building wheel for unicodecsv (setup.py) ... done
  Created wheel for unicodecsv: filename=unicodecsv-0.14.1-cp37-none-any.whl size=10768 sha256=6c63836f3a8574542de9f78b06326947fb504bf7a81dafcfa456afe211c808ab
  Stored in directory: /home/aistudio/.cache/pip/wheels/ba/92/fd/b41d4de2b71079ecb19c1819b0f0db179b9f3e2d24e052ba17
Successfully built unicodecsv
Installing collected packages: unicodecsv
Successfully installed unicodecsv-0.14.1

Get the URL of the iQiyi movie ranking page:

list.iqiyi.com/www/1/-----…

Inspect the page in the browser's developer tools to find the URL of the JSON data:

url = "pcw-api.iqiyi.com/search/reco…"
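The endpoint appears to be paginated through the page_id query parameter; the other parameters (channel_id=1, data_type=1, mode=24) are copied from the request URL used in the script below. As a minimal sketch, assuming those parameters stay fixed, the page URLs can be built with the standard library instead of string concatenation. build_page_url is a hypothetical helper name, and the meaning of channel_id=1 is an assumption inferred from the list page URL.

```python
from urllib.parse import urlencode

# Base URL of the JSON endpoint, as used in the scraper below
BASE_URL = "http://pcw-api.iqiyi.com/search/recommend/list"


def build_page_url(page_id: int) -> str:
    """Hypothetical helper: return the URL for one page of the movie list."""
    params = {
        "channel_id": 1,    # assumption: 1 selects the movie channel (matches /www/1/ in the list page URL)
        "data_type": 1,     # copied from the original request
        "mode": 24,         # sort mode, copied from the original request
        "page_id": page_id,
    }
    # dicts keep insertion order, so the query string matches the original order
    return BASE_URL + "?" + urlencode(params)


# Pages are numbered from 1, as in the original loop
urls = [build_page_url(i) for i in range(1, 51)]
print(urls[0])
```

Using urlencode keeps the parameters readable and escapes values correctly if any of them later contain special characters.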

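import requests
import json
import time
import unicodecsv as csv


# Request the JSON endpoint and return the raw response text (None on failure)
def getMoveinfo(url):
    session = requests.Session()
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
        "Accept": "application/json",
        "Referer": "http://list.iqiyi.com/www/1/-------------24-1-1-iqiyi--.html",
        "Origin": "http://list.iqiyi.com",
        "Host": "pcw-api.iqiyi.com",
        "Connection": "keep-alive",
        "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6",
        "Accept-Encoding": "gzip, deflate"
    }
    try:
        # A timeout keeps one stalled request from hanging the whole run
        response = session.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        return None
    if response.status_code == 200:
        return response.text
    return None
# Fetch one page of the ranking and append each movie as a row to the global list arr
def saveMovieInfoToFile(pageId):
    url = 'http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id='
    url += str(pageId)
    print(url)
    responseTxt = getMoveinfo(url)
    if responseTxt is None:
        # Skip pages that failed to download instead of crashing in json.loads(None)
        return
    responseJson = json.loads(responseTxt)
    # Renamed from `list` so the built-in list type is not shadowed
    movieList = responseJson['data']['list']
    for val in movieList:
        # .get() with a default keeps one missing field from aborting the run
        item = []
        item.append(val.get('title', ''))
        item.append(val.get('period', ''))
        item.append(val.get('focus', ''))
        item.append(val.get('description', ''))
        item.append(val.get('playUrl', ''))
        item.append(val.get('imageUrl', ''))
        item.append(val.get('score', ''))
        arr.append(item)
# Scrape all pages, then save the results to a CSV file

if __name__ == '__main__':
    num = 50
    arr = []
    for i in range(num):
        saveMovieInfoToFile(i + 1)
        time.sleep(0.5)  # pause between requests to go easy on the server
    # Column headers: title, year, tagline, description, play URL, poster, score
    title = ['电影名', '年份', '梗概', '简介', '观影地址', '海报', '评分']
    print(len(arr))
    print(arr[0])
    # unicodecsv writes encoded bytes, so the file must be opened in binary mode
    with open('爱奇艺电影排行榜.csv', 'wb') as f:
        f_csv = csv.writer(f, encoding='gb18030')
        f_csv.writerow(title)
        f_csv.writerows(arr)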
http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id=1
http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id=2
...
http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id=50
900
['龙虎山张天师', '2020-06-05', '樊少皇破巫教谜团', '张道陵得好友文大人一封求助信,携弟子王长前往巴蜀,却发现好友身死,而自己被卷入了一个惊天阴谋中。张道陵坚持本心,不为外物所惑,最终阻止了当地邪恶组织残害百姓的邪恶祭祀,并教化了当地百姓。', 'http://www.iqiyi.com/v_19ry2hvngk.html', 'http://pic3.iqiyipic.com/image/20200605/ce/c4/v_149238124_m_601_m5.jpg', 8.1]
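The script above always requests a fixed 50 pages and parses each response inline. As a minimal sketch of a more defensive structure, parsing can be separated from fetching so that it can be checked offline, and the loop can stop early on a failed request or an empty page. parse_movies and collect_movies are hypothetical names. The assumption that the endpoint returns an empty data.list past the last page is not confirmed by the source.

```python
import json
from typing import Callable, Optional

# Fields extracted per movie, in the same order as the CSV header
FIELDS = ['title', 'period', 'focus', 'description', 'playUrl', 'imageUrl', 'score']


def parse_movies(payload: dict) -> list:
    """Turn one decoded JSON response into CSV rows; missing keys become ''."""
    movies = (payload.get('data') or {}).get('list') or []
    return [[movie.get(field, '') for field in FIELDS] for movie in movies]


def collect_movies(fetch_page: Callable[[int], Optional[str]], max_pages: int = 50) -> list:
    """Fetch pages 1..max_pages via fetch_page(page_id), which returns JSON text or None.

    Stops early on a failed request or on an empty page (assumed end of the list).
    """
    rows = []
    for page_id in range(1, max_pages + 1):
        text = fetch_page(page_id)
        if text is None:
            break
        page_rows = parse_movies(json.loads(text))
        if not page_rows:
            break
        rows.extend(page_rows)
    return rows


# Offline check using the first record printed above (description omitted)
sample = {'data': {'list': [{
    'title': '龙虎山张天师', 'period': '2020-06-05', 'focus': '樊少皇破巫教谜团',
    'playUrl': 'http://www.iqiyi.com/v_19ry2hvngk.html',
    'imageUrl': 'http://pic3.iqiyipic.com/image/20200605/ce/c4/v_149238124_m_601_m5.jpg',
    'score': 8.1}]}}
fake_pages = {1: json.dumps(sample), 2: json.dumps({'data': {'list': []}})}
rows = collect_movies(fake_pages.get, max_pages=50)
print(len(rows), rows[0][0])
```

In the real script, fetch_page could be a small wrapper that builds the page URL, calls getMoveinfo, and sleeps 0.5 s as the original loop does.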
