"This is day 31 of my participation in the 2022 First Post Challenge. For event details, see: 2022 First Post Challenge."
Others have already scraped the Douban movie rankings, so today I will scrape the iQiyi movie rankings instead.
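1. Install unicodecsv
The scraped data needs to be written to a file for Excel (a CSV file here), and unicodecsv handles that without encoding errors.
2. Inspect the page and find the JSON data endpoint
3. Fetch the data
4. Write the Excel (CSV) file
Notes
- The file must be opened in 'wb' (binary write) mode; otherwise the write fails with an error about character '\u2022'.
- Pay attention to the encoding as well: utf-8 shows up garbled, and gbk does not cover every character, so gb18030 is required.
- f_csv = csv.writer(f, encoding='gb18030')
- In the end, 900 popular movie records are scraped.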
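The encoding notes above can be checked directly with the standard library. This is a minimal sketch, not part of the original script; `can_encode` is a hypothetical helper name introduced only for illustration.

```python
def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The bullet character named in the error message above.
sample = "\u2022"

# Compare the two Chinese encodings discussed in the notes.
for name in ("gbk", "gb18030"):
    print(name, can_encode(sample, name))
```

gb18030 is designed to cover the full Unicode range, which is why it is the safer choice when the scraped text may contain arbitrary symbols.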
# Install dependencies
!pip install unicodecsv
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Collecting unicodecsv
Using cached https://mirrors.tuna.tsinghua.edu.cn/pypi/web/packages/6f/a4/691ab63b17505a26096608cc309960b5a6bdf39e4ba1a793d5f9b1a53270/unicodecsv-0.14.1.tar.gz
Building wheels for collected packages: unicodecsv
Building wheel for unicodecsv (setup.py) ... done
Created wheel for unicodecsv: filename=unicodecsv-0.14.1-cp37-none-any.whl size=10768 sha256=6c63836f3a8574542de9f78b06326947fb504bf7a81dafcfa456afe211c808ab
Stored in directory: /home/aistudio/.cache/pip/wheels/ba/92/fd/b41d4de2b71079ecb19c1819b0f0db179b9f3e2d24e052ba17
Successfully built unicodecsv
Installing collected packages: unicodecsv
Successfully installed unicodecsv-0.14.1
Get the iQiyi movie ranking URL
Inspect the page elements to find the address of the JSON data
url = "pcw-api.iqiyi.com/search/reco…"
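The script in this post builds each page URL by appending `page_id` to a fixed query string. As a hedged alternative sketch, the same URL can be assembled from a parameter dict with `urllib.parse.urlencode`. The endpoint and parameter values are copied from the script in this post; `build_page_url` is a hypothetical helper name, not something the original code defines.

```python
from urllib.parse import urlencode

# Endpoint and query parameters as they appear in the script in this post.
BASE_URL = "http://pcw-api.iqiyi.com/search/recommend/list"


def build_page_url(page_id):
    """Build the list URL for one page (hypothetical helper for illustration)."""
    params = {
        "channel_id": 1,
        "data_type": 1,
        "mode": 24,
        "page_id": page_id,
    }
    # urlencode keeps the dict's insertion order and escapes values if needed.
    return BASE_URL + "?" + urlencode(params)


print(build_page_url(1))
```

Keeping the parameters in a dict makes it easier to see which value changes per request (only `page_id` here).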
import requests
import json
import time
import unicodecsv as csv
# Call the API and return the raw response text (None on failure)
def getMoveinfo(url):
    session = requests.Session()
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
        "Accept": "application/json",
        "Referer": "http://list.iqiyi.com/www/1/-------------24-1-1-iqiyi--.html",
        "Origin": "http://list.iqiyi.com/",
        "Host": "pcw-api.iqiyi.com",
        "Connection": "keep-alive",
        "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6",
        "Accept-Encoding": "gzip, deflate"
    }
    # A timeout keeps one stalled request from hanging the whole crawl
    response = session.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        return response.text
    return None
# Fetch one page, parse it, and append each movie to the module-level `arr`
def saveMovieInfoToFile(pageId):
    url = 'http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id='
    url += str(pageId)
    print(url)
    responseTxt = getMoveinfo(url)
    # getMoveinfo returns None on a non-200 response; skip that page
    if responseTxt is None:
        return
    responseJson = json.loads(responseTxt)
    # Each entry in data.list describes one movie
    movies = responseJson['data']['list']
    for val in movies:
        item = []
        item.append(val['title'])
        item.append(val['period'])
        item.append(val['focus'])
        item.append(val['description'])
        item.append(val['playUrl'])
        item.append(val['imageUrl'])
        item.append(val['score'])
        arr.append(item)
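# Crawl the data and save it to a CSV file
if __name__ == '__main__':
    num = 50  # number of pages to fetch
    arr = []  # module-level list filled by saveMovieInfoToFile
    for i in range(num):
        saveMovieInfoToFile(i + 1)
        time.sleep(0.5)  # pause between requests
    # Header row: title, year, tagline, description, watch URL, poster, score
    title = ['电影名', '年份', '梗概', '简介', '观影地址', '海报', '评分']
    print(len(arr))
    print(arr[0])
    # Binary mode is required because unicodecsv writes encoded bytes
    with open('爱奇艺电影排行榜.csv', 'wb') as f:
        f_csv = csv.writer(f, encoding='gb18030')
        f_csv.writerow(title)
        f_csv.writerows(arr)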
http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id=1
http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id=2
... (page_id=3 through page_id=49, same URL pattern)
http://pcw-api.iqiyi.com/search/recommend/list?channel_id=1&data_type=1&mode=24&page_id=50
900
['龙虎山张天师', '2020-06-05', '樊少皇破巫教谜团', '张道陵得好友文大人一封求助信,携弟子王长前往巴蜀,却发现好友身死,而自己被卷入了一个惊天阴谋中。张道陵坚持本心,不为外物所惑,最终阻止了当地邪恶组织残害百姓的邪恶祭祀,并教化了当地百姓。', 'http://www.iqiyi.com/v_19ry2hvngk.html', 'http://pic3.iqiyipic.com/image/20200605/ce/c4/v_149238124_m_601_m5.jpg', 8.1]
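On Python 3, the same gb18030 output can likely be produced without unicodecsv by opening the file in text mode with an explicit encoding and using the standard `csv` module. The sketch below is an alternative under that assumption, not the method used in this post. `to_row` and `write_rows` are hypothetical helper names, the sample record is a shortened stand-in for one API entry with a '\u2022' added to exercise the encoding, and `dict.get` is used so that a missing field becomes an empty cell instead of raising `KeyError`.

```python
import csv
import os
import tempfile

# Field names taken from the script in this post.
FIELDS = ["title", "period", "focus", "description", "playUrl", "imageUrl", "score"]


def to_row(movie):
    """Turn one movie dict into a CSV row; missing keys become empty strings."""
    return [movie.get(key, "") for key in FIELDS]


def write_rows(path, header, rows):
    """Write rows as gb18030-encoded CSV using the standard csv module."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="gb18030", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical sample record (not real API output).
sample = {"title": "龙虎山张天师 \u2022", "period": "2020-06-05", "score": 8.1}

path = os.path.join(tempfile.mkdtemp(), "sample.csv")
write_rows(path, FIELDS, [to_row(sample)])

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="gb18030", newline="") as f:
    print(list(csv.reader(f)))
```

Text mode with `encoding='gb18030'` moves the encoding step into `open`, so the binary-mode requirement from the notes applies only to the unicodecsv approach.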