The following are the general steps for scraping Taobao price information and for integrating a Taobao price-comparison API:
1. Traditional crawler approach to obtaining price information (not recommended for heavy use, since it may violate Taobao's rules):
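- Analyze the target page URL: after entering a keyword in the Taobao search box, look at the structure of the resulting URL. For example, when searching for "手机" (mobile phone), the URL of the first page looks like
  https://s.taobao.com/search?q=手机&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20XXXXX&ie=utf8
  and some parameters change on later pages. The second page may be
  https://s.taobao.com/search?q=手机&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20XXXXX&ie=utf8&bcoffset=X&ntoffset=X&p4ppushleft=1%2C48&s=44
  (where X is a specific offset value, and s is generally a multiple of the number of items per page, here a multiple of 44).
- Set the request headers: Taobao has anti-crawler mechanisms, so the request headers must be set to imitate a browser. The headers should include at least User-Agent and Cookie. The User-Agent value can be read from the browser's developer tools, and the Cookie value can be copied from the browser after logging in to Taobao.
- Send the request and fetch the page content: use Python's requests library to send an HTTP request and get the HTML of the Taobao search page. Example code: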
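import requests


def get_html_text(url, headers):
    """Fetch a page and return its HTML text, or "" if the request fails."""
    try:
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()  # raise on 4xx/5xx responses
        r.encoding = r.apparent_encoding  # use the encoding detected from the body
        return r.text
    except requests.RequestException:
        return ""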
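The URL and header steps above can be sketched as follows. This is a minimal sketch, assuming the offset parameter `s` grows by 44 per page as described; `build_search_url` and `build_headers` are hypothetical helper names, keeping only the `q`, `ie`, and `s` parameters is an untested simplification, and the header values are placeholders to be copied from your own browser.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://s.taobao.com/search"
ITEMS_PER_PAGE = 44  # page size observed in the URL analysis above


def build_search_url(keyword, page_index):
    """Return the search URL for a zero-based page index (assumed URL layout)."""
    params = {"q": keyword, "ie": "utf8", "s": page_index * ITEMS_PER_PAGE}
    return SEARCH_BASE + "?" + urlencode(params)


def build_headers(user_agent, cookie):
    """Bundle the two headers the text lists as the minimum."""
    return {"User-Agent": user_agent, "Cookie": cookie}


if __name__ == "__main__":
    for page in range(3):
        print(build_search_url("手机", page))
```

The resulting URL and headers can be passed straight to `get_html_text(url, headers)`.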
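- Parse the page and extract the price information: use regular expressions or a parsing library (such as BeautifulSoup) to extract product prices from the HTML. The price usually sits in a specific HTML tag or attribute; for example, the view_price attribute may contain the price value. Example code (using a regular expression):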
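import re


def parse_page(html):
    try:
        # Extract the product prices on the page; the capture group holds
        # the whole number, so findall returns the full price strings
        plt = re.findall(r'"view_price":"(\d+(?:\.\d+)?)"', html)
        # The price data can be processed further here as needed
        return plt
    except TypeError:
        print("Error while parsing page price information")
        return []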
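The extracted price strings usually need converting to numbers before they can be compared. A minimal sketch, assuming the page embeds prices as `"view_price":"<number>"` as described above; the `extract_prices` name and the sample string are made up for illustration.

```python
import re
from decimal import Decimal

PRICE_PATTERN = re.compile(r'"view_price":"(\d+(?:\.\d+)?)"')


def extract_prices(html):
    """Return every view_price found in the page as a Decimal."""
    return [Decimal(p) for p in PRICE_PATTERN.findall(html)]


if __name__ == "__main__":
    # Hypothetical fragment standing in for a real search page.
    sample = '{"view_price":"1999.00"},{"view_price":"2499"}'
    prices = extract_prices(sample)
    print("lowest:", min(prices), "highest:", max(prices))
```

`Decimal` is used instead of `float` so that prices such as 59.85 keep their exact decimal value.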
2. Using the Taobao price-comparison API (the recommended, legitimate approach):
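- Register as a Taobao Open Platform developer: go to the Taobao API documentation, then register and log in as the platform requires to become a developer.
- Create an application and obtain an Api Key and Api Secret: create an application on the open platform and apply for an Api Key and Api Secret. These two values are the credentials for calling the API and must be kept safe.
- Understand the API's limits and quotas: the Taobao Open Platform imposes limits and quotas on API usage, such as request frequency and the number of requests per day. Learn these limits before using the API so that access is not restricted for violating them.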
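- Build and send the request: construct the HTTP request as the Taobao price-comparison API documentation requires. In general, you send a GET or POST request to the specified API address and include the Api Key, the Api Secret, and any other required parameters (such as the product keyword and page number). Below is a simple example that sends the request with the requests library: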
# coding:utf-8
"""
Compatible with python2.x and python3.x
requirement: pip install requests
"""
from __future__ import print_function
import requests

# Example request URL; the default request parameters are already URL-encoded
url = "https://taobao/item_search/?key=<您自己的apiKey>&secret=<您自己的apiSecret>&q=女装&start_price=0&end_price=0&page=1&cat=0&discount_only=&sort=&page_size=&seller_info=&nick=&ppath=&imgid=&filter="
headers = {
    "Accept-Encoding": "gzip",
    "Connection": "close"
}

if __name__ == "__main__":
    r = requests.get(url, headers=headers)
    json_obj = r.json()
    print(json_obj)
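Because the platform limits request frequency, it can help to pace calls on the client side. A rough sketch, assuming a fixed minimum interval between requests is acceptable; the `Throttle` class and the one-second interval are illustrative choices, not values taken from the platform's documentation.

```python
import time


class Throttle:
    """Block so that consecutive calls are at least min_interval seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self):
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


if __name__ == "__main__":
    throttle = Throttle(min_interval=1.0)  # assumed interval; check your quota
    for page in range(1, 4):
        throttle.wait()
        # Issue the requests.get(...) call for this page here.
        print("requesting page", page)
```

The first call to `wait()` returns immediately; only later calls are delayed.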
- Parse the response data: the API usually returns JSON, which you parse with Python's json module to extract the product prices and other information. Example response:
{
"items": {
"page": "1",
"real_total_results": "360000",
"total_results": "360000",
"page_size": 10,
"pagecount": "200",
"_ddf": "szx",
"item": [
{
"title": "高腰百搭羊羔绒阔腿裤冬季保暖",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01TpegE82EAUGWQD2JI_!!0-item_pic.jpg",
"promotion_price": "50.00",
"orginal_price": "50.00",
"price": "50.00",
"sales": 0,
"num_iid": "756775095301",
"seller_id": "1704328704",
"detail_url": "https://item.taobao.com/item.htm?id=756775095301"
},
{
"title": "德绒圆领打底衫短款修身内搭长袖T恤女秋冬纯色弧形小众百搭上衣",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01xuzAuZ2LY21GBhlfE_!!3937219703-0-C2M.jpg",
"promotion_price": "44.40",
"orginal_price": "44.40",
"price": "44.40",
"sales": 0,
"num_iid": "825927041956",
"seller_id": "3937219703",
"detail_url": "https://item.taobao.com/item.htm?id=825927041956"
},
{
"title": "美式辣妹螺纹修身女打底衫长袖",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN010d2Jxj1ZOJwsXSEHg_!!862293184.jpg",
"promotion_price": "58.00",
"orginal_price": "58.00",
"price": "58.00",
"sales": 0,
"num_iid": "735775262176",
"seller_id": "862293184",
"detail_url": "https://item.taobao.com/item.htm?id=735775262176"
},
{
"title": "2024年新款高腰复古牛仔裤女秋季",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01YfwUGX1h1tbd5rczJ_!!2206584264218-0-C2M.jpg",
"promotion_price": "59.85",
"orginal_price": "59.85",
"price": "59.85",
"sales": 0,
"num_iid": "837243885999",
"seller_id": "2206584264218",
"detail_url": "https://item.taobao.com/item.htm?id=837243885999"
},
{
"title": "【秋冬绝美可单穿慵懒小上衣】",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01m6HRZH1zxkhIMd3OG_!!0-item_pic.jpg",
"promotion_price": "57.00",
"orginal_price": "57.00",
"price": "57.00",
"sales": 0,
"num_iid": "818387607863",
"seller_id": "2209434566781",
"detail_url": "https://item.taobao.com/item.htm?id=818387607863"
},
{
"title": "秋季新款复古高腰阔腿牛仔裤女宽松显瘦垂感拖地直筒长裤水洗做旧",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01aJeIvR1mQEi3d7tnq_!!2206588314948-2-C2M.jpg",
"promotion_price": "74.85",
"orginal_price": "74.85",
"price": "74.85",
"sales": 0,
"num_iid": "819491929229",
"seller_id": "2206588314948",
"detail_url": "https://item.taobao.com/item.htm?id=819491929229"
},
{
"title": "美式复古豹纹牛仔裤女男夏季潮牌小众阔腿裤宽松垂感休闲长裤腰带",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01fQC8Vc2LY21fSyrpo_!!3937219703-0-C2M.jpg",
"promotion_price": "55.80",
"orginal_price": "55.80",
"price": "55.80",
"sales": 0,
"num_iid": "797999211515",
"seller_id": "3937219703",
"detail_url": "https://item.taobao.com/item.htm?id=797999211515"
},
{
"title": "u领冬季白色内搭长袖秋季上衣女秋冬打底衫短款秋衣春秋修身t恤",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01XfN4ys1I8Qr51Xkzq_!!0-item_pic.jpg",
"promotion_price": "47.75",
"orginal_price": "47.75",
"price": "47.75",
"sales": 0,
"num_iid": "835469595962",
"seller_id": "2216045650848",
"detail_url": "https://item.taobao.com/item.htm?id=835469595962"
},
{
"title": "【李佳锜推荐】高领内搭打底衫女",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01X1jpKO1Xzl0WvcLqA_!!0-item_pic.jpg",
"promotion_price": "59.00",
"orginal_price": "59.00",
"price": "59.00",
"sales": 0,
"num_iid": "690366689843",
"seller_id": "3855402995",
"detail_url": "https://item.taobao.com/item.htm?id=690366689843"
},
{
"title": "高腰休闲宽松条纹运动裤",
"pic_url": "https://img.alicdn.com/img/bao/uploaded/i4/O1CN01yL3DXE1ToWxdKP8UP_!!0-item_pic.jpg",
"promotion_price": "42.78",
"orginal_price": "42.78",
"price": "42.78",
"sales": 0,
"num_iid": "841863957500",
"seller_id": "2132122429",
"detail_url": "https://item.taobao.com/item.htm?id=841863957500"
}
],
"item_weight_update": 0
},
"error_code": "0000"
}
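Using the Taobao price-comparison API is the legitimate and safe approach, but you must strictly follow the Taobao Open Platform's rules and terms of use while doing so. The traditional crawler approach should be used with caution, so that it neither puts excessive load on the Taobao site nor violates the relevant rules.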
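Extracting prices from a response shaped like the sample shown earlier might look like the following. This is a minimal sketch, assuming the `items.item` list layout and that an `error_code` of `"0000"` means success, as the sample suggests; `extract_items` is a hypothetical helper, and the inline `sample` is a trimmed copy of that response.

```python
import json


def extract_items(response_text):
    """Return (title, price, detail_url) tuples from an item_search response."""
    data = json.loads(response_text)
    # Assumption: "0000" signals success, as in the sample response.
    if data.get("error_code") != "0000":
        raise ValueError("API returned error_code %r" % data.get("error_code"))
    items = data.get("items", {}).get("item", [])
    return [(it["title"], float(it["price"]), it["detail_url"]) for it in items]


if __name__ == "__main__":
    sample = """
    {"items": {"item": [
        {"title": "高腰休闲宽松条纹运动裤", "price": "42.78",
         "detail_url": "https://item.taobao.com/item.htm?id=841863957500"}
    ]}, "error_code": "0000"}
    """
    for title, price, url in extract_items(sample):
        print(title, price, url)
```

If the response has already been decoded with `r.json()`, the same field lookups apply to the resulting dictionary directly.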