Python - Scraping Douban Movies with AJAX GET Requests

Douban Movies AJAX GET Request Scraper


import urllib.request


# Target URL: the JSON API that the chart page loads via AJAX
url = 'https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=0&limit=20'

# Request headers: send a browser User-Agent so the request looks like normal browser traffic
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
}

# 1. Build the customized request object
request = urllib.request.Request(url=url, headers=headers)

# 2. Send the request and read the response
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')  # decode the raw bytes into a str
print(content)  # print the raw JSON text

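# 3. Save the data locally
# open() defaults to the platform's locale encoding (e.g. GBK on Chinese-locale Windows),
# so specify utf-8 explicitly to keep Chinese characters intact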
with open('douban.json', 'w', encoding='utf-8') as fp:
    fp.write(content)
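The query string in the URL above can also be built from a dict instead of being hand-encoded, which makes it easy to request further pages. This is a minimal sketch, not the original author's code: it assumes `start` is a 0-based offset and `limit` the page size (inferred from the URL, not confirmed by Douban documentation), and `build_top_list_url` is a hypothetical helper name. It only builds URLs and sends no requests.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/j/chart/top_list'


def build_top_list_url(page, limit=20, type_id=5, interval_id='100:90'):
    """Hypothetical helper: return the top_list URL for a 1-based page number.

    Assumes 'start' is a 0-based offset and 'limit' is the page size.
    """
    params = {
        'type': type_id,
        'interval_id': interval_id,  # urlencode escapes ':' as %3A
        'action': '',
        'start': (page - 1) * limit,
        'limit': limit,
    }
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    # Print the URLs for the first three pages
    for page in range(1, 4):
        print(build_top_list_url(page))
```

For page 1 this reproduces the original URL exactly; each generated URL can then be passed to `urllib.request.Request` with the same `headers` as in the script above.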

Notes

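Target URL: the URL is the JSON API endpoint that the Douban movie chart page calls via AJAX, so a plain GET request returns the movie list directly as JSON rather than as an HTML page. The start and limit query parameters select which slice of the list is returned (here, the first 20 entries).
Request headers: a browser User-Agent header is sent so the request looks like ordinary browser traffic; requests carrying urllib's default User-Agent (Python-urllib/3.x) are easily identified as a crawler and may be refused.
Building and sending the request: urllib.request.Request bundles the URL and headers into a request object, and urllib.request.urlopen sends it and returns the response.
Decoding and output: read() returns raw bytes, and decode('utf-8') converts them into a str so the result can be printed and inspected directly.
Local storage: a with open(...) block writes the content to a JSON file and closes the file automatically. encoding='utf-8' is passed explicitly because open() otherwise uses the platform's locale encoding (GBK on Chinese-locale Windows), which can raise UnicodeEncodeError for some characters or produce a file that other tools read as garbled text.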
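Since the response body is JSON, it can also be parsed with the standard `json` module instead of being stored as a raw string. This is a minimal sketch under stated assumptions: it assumes the API returns a JSON array of objects with `title` and `score` keys (check this against the saved douban.json), and `extract_titles` and `save_pretty` are hypothetical helper names. `ensure_ascii=False` keeps Chinese characters readable instead of writing `\uXXXX` escapes.

```python
import json


def extract_titles(content):
    """Parse the raw JSON text and return (title, score) pairs.

    Assumes a JSON array of objects with 'title' and 'score' keys;
    missing keys come back as None.
    """
    movies = json.loads(content)
    return [(m.get('title'), m.get('score')) for m in movies]


def save_pretty(content, path='douban.json'):
    """Re-save the parsed data as indented UTF-8 JSON."""
    data = json.loads(content)
    with open(path, 'w', encoding='utf-8') as fp:
        # ensure_ascii=False writes Chinese characters as-is
        json.dump(data, fp, ensure_ascii=False, indent=2)


if __name__ == '__main__':
    # Hypothetical sample in the assumed shape, not real Douban data
    sample = '[{"title": "示例电影", "score": "9.0"}]'
    print(extract_titles(sample))
```

Using `.get()` rather than indexing keeps the sketch from crashing if the real response uses different field names; it simply returns None for those fields.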