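Preface
While writing crawlers in Python, Node.js, and other languages, a request to a website came back with status code 418, so I am recording the cause and the fix here.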
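Cause
The target site has an anti-crawler mechanism: if a request does not carry proper request headers, the site responds with status code 418.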
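The mechanism can be reproduced locally with a toy server. This is only a minimal sketch using the standard library: the rule "reject any User-Agent that does not contain Mozilla" is an assumption made for illustration, and the real site's detection logic is not known. The names TeapotHandler, get_status, and run_demo are hypothetical.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class TeapotHandler(BaseHTTPRequestHandler):
    """Toy anti-crawler rule (assumed): only browser-like User-Agents pass."""

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        if "Mozilla" in user_agent:
            status, body = 200, b"hello"
        else:
            status, body = 418, b"blocked"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def get_status(url, headers=None):
    """Return the HTTP status code of a GET request to url."""
    # An empty ProxyHandler keeps the loopback request away from any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return error.code


def run_demo():
    """Start the toy server, request it with and without a browser User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), TeapotHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        # Default urllib User-Agent is "Python-urllib/<version>".
        without_headers = get_status(url)
        with_headers = get_status(
            url, {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
        )
    finally:
        server.shutdown()
        server.server_close()
    return without_headers, with_headers


if __name__ == "__main__":
    print(run_demo())  # (418, 200)
```

The first request identifies itself as a script and is rejected; the second differs only in its User-Agent header and succeeds. The requests library behaves similarly by default, sending a User-Agent of the form python-requests/<version>.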
The 418 status code
In this case, it is the status code the target site returns when its anti-crawler program detects the request.
English description: 418 I’m a teapot. The HTTP 418 I’m a teapot client error response code indicates that the server refuses to brew coffee because it is a teapot. This error is a reference to Hyper Text Coffee Pot Control Protocol which was an April Fools’ joke in 1998.
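For reference, Python's standard library knows this code as well. The short sketch below looks it up through http.HTTPStatus; it assumes Python 3.9 or later, where 418 was added to the enum, and describe_status is a hypothetical helper name.

```python
from http import HTTPStatus


def describe_status(code):
    """Return 'code NAME: phrase' for a status code known to http.HTTPStatus."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return "%d %s: %s" % (status.value, status.name, status.phrase)


if __name__ == "__main__":
    print(describe_status(418))
```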
Solution
Add request headers to the request:
import requests


def request_douban(url):
    # Send a browser-like User-Agent; without it the site answers with 418.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
    }
    try:
        response = requests.get(url, headers=headers)
        print(response)  # e.g. <Response [200]>
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        pass
    # Non-200 status or a network error
    return None