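1. Taobao API Integration Basics
1.1 Why Use the Taobao API
Taobao is one of China's largest e-commerce platforms and holds a huge volume of product data. With the Taobao API, developers can:
- Retrieve real-time product prices, stock levels, and details
- Implement price monitoring and comparison
- Build personalized recommendation systems
- Develop e-commerce data analysis tools
- Automate purchasing decisions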
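1.2 Types of Taobao API
The Taobao Open Platform provides several categories of API:
- Product API: retrieve product details, search for products, and so on
- Trade API: handle orders, payments, and other transaction operations
- User API: retrieve user information, shipping addresses, and so on
- Marketing API: handle coupons, promotions, and so on
- Data API: retrieve industry data, sales data, and so on
This article focuses on developing against and integrating the Product API.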
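1.3 Preparation
Before integrating the Taobao API, complete the following steps:
- Register a developer account: sign up on the open platform and complete real-name verification
- Create an application: obtain the ApiKey and ApiSecret
- Apply for API permissions: request the permissions your use case needs
- Obtain an authorization token: get an access token through the OAuth2.0 flow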
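2. API Client Design
2.1 Design Principles
A Taobao API client should follow these principles:
- Security: handle the ApiKey and ApiSecret carefully and implement the signing algorithm correctly
- Reliability: retry failed requests and handle network errors
- Maintainability: modular design with clearly defined interfaces
- Performance: control the request rate sensibly and support concurrent processing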
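As one way to act on the security principle above, the credentials can be kept out of source files and out of logs. The following is a minimal sketch, not part of the original client: the environment variable names `TAOBAO_APP_KEY` and `TAOBAO_APP_SECRET` and the helpers `load_credentials` and `mask` are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; choose whatever your deployment uses.
REQUIRED_VARS = ("TAOBAO_APP_KEY", "TAOBAO_APP_SECRET")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the key and secret from the environment instead of hard-coding them."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "app_key": env["TAOBAO_APP_KEY"],
        "app_secret": env["TAOBAO_APP_SECRET"],
    }


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe form of a secret that keeps only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the real environment.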
2.2 Core Implementation
Below is the core of a Taobao API client. The refresh_token helper it calls is not shown in this article:
import hashlib
import logging
import threading
import time
from datetime import datetime
from typing import Any, Dict

import requests

logger = logging.getLogger(__name__)


class TaobaoAPIClient:
    # The annotation is a string because TaobaoAPIConfig is defined in section 2.3.
    def __init__(self, config: "TaobaoAPIConfig"):
        self.config = config
        self.app_key = config.get("app_key")
        self.app_secret = config.get("app_secret")
        self.base_url = config.get("base_url", "https://eco.taobao.com/router/rest")
        self.access_token = config.get("access_token")
        self.session = requests.Session()
        self.session.headers.update({
            "Content-Type": "application/x-www-form-urlencoded;charset=utf-8"
        })
        self.request_counter = 0
        self.error_counter = 0
        self.lock = threading.Lock()

    def generate_sign(self, params: Dict[str, Any]) -> str:
        # Sort the parameters by name
        sorted_params = sorted(params.items(), key=lambda x: x[0])
        # Concatenate secret + key/value pairs + secret
        sign_str = self.app_secret
        for k, v in sorted_params:
            sign_str += f"{k}{v}"
        sign_str += self.app_secret
        # Compute the MD5 digest as uppercase hex
        return hashlib.md5(sign_str.encode('utf-8')).hexdigest().upper()

    def call(self, method: str, params: Dict[str, Any], need_auth: bool = False,
             retry_times: int = 3, timeout: int = 10) -> Dict[str, Any]:
        for attempt in range(retry_times):
            try:
                # Build the request parameters (the timestamp uses the local clock)
                common_params = {
                    "app_key": self.app_key,
                    "format": "json",
                    "v": "2.0",
                    "sign_method": "md5",
                    "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                }
                if need_auth and self.access_token:
                    common_params["session"] = self.access_token
                all_params = {**common_params, **params, "method": method}
                all_params["sign"] = self.generate_sign(all_params)
                # Send the request
                response = self.session.post(self.base_url, data=all_params, timeout=timeout)
                response.raise_for_status()
                result = response.json()
                # Update the request counter
                with self.lock:
                    self.request_counter += 1
                # Check whether the API returned an error
                if "error_response" in result:
                    error_code = result["error_response"].get("code")
                    error_msg = result["error_response"].get("msg")
                    logger.error(f"API returned an error: {error_code} - {error_msg}")
                    # The article treats codes 40 and 41 as token expiry; verify them
                    # against the platform's error-code table. str() makes the check
                    # work whether the code arrives as a number or a string.
                    if str(error_code) in ("40", "41"):
                        self.refresh_token()  # not shown in this article
                        if attempt < retry_times - 1:
                            logger.info("Token refreshed, retrying the request...")
                            time.sleep(1)
                            continue
                    return {"error": f"API error: {error_code} - {error_msg}", "result": result}
                logger.info(f"API call succeeded: {method}")
                return result
            except requests.exceptions.RequestException as e:
                with self.lock:
                    self.error_counter += 1
                logger.error(f"API request failed (attempt {attempt+1}/{retry_times}): {str(e)}")
                if attempt < retry_times - 1:
                    wait_time = (2 ** attempt) * 0.5  # exponential backoff
                    logger.info(f"Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    return {"error": f"Request failed: {str(e)}"}
        # Only reached when retry_times is less than 1
        return {"error": "Request failed: no attempts were made"}
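To make the signing step in `generate_sign` easier to check by hand, here is a standalone sketch of the same scheme: sort the parameters by name, concatenate each name and value, wrap the result in the secret on both sides, and take the uppercase MD5 hex digest. The function name `md5_sign` and the `demo_key` / `demo_secret` values are made up for illustration.

```python
import hashlib
from typing import Any, Dict


def md5_sign(params: Dict[str, Any], app_secret: str) -> str:
    """Sign params the same way as generate_sign in the client above."""
    ordered = sorted(params.items())  # sort by parameter name
    payload = app_secret + "".join(f"{k}{v}" for k, v in ordered) + app_secret
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()


# Placeholder values, not real credentials.
params = {"method": "taobao.tbk.item.get", "app_key": "demo_key", "v": "2.0"}
signature = md5_sign(params, "demo_secret")

# The string that gets hashed for the parameters above:
expected_payload = (
    "demo_secret"
    "app_keydemo_key"
    "methodtaobao.tbk.item.get"
    "v2.0"
    "demo_secret"
)
```

Because the parameters are sorted before hashing, the order in which they were added to the dictionary does not affect the signature.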
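2.3 Configuration Management
Use the singleton pattern to manage the API configuration:
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


# SingletonMeta is a singleton metaclass that this article does not define.
class TaobaoAPIConfig(metaclass=SingletonMeta):
    def __init__(self):
        self.config = {}

    def load_config(self, config_file: str) -> None:
        try:
            with open(config_file, 'r', encoding='utf-8') as f:
                self.config = json.load(f)
            logger.info(f"Loaded configuration file: {config_file}")
        except Exception as e:
            # Fall back to an empty configuration if the file is missing or invalid
            logger.error(f"Failed to load configuration file: {str(e)}")
            self.config = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self.config.get(key, default)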
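The configuration class relies on `SingletonMeta`, which the article does not define. A common way to write such a metaclass is sketched below; treat it as one possible implementation rather than the author's. `DemoConfig` is a stand-in class used only to show the behaviour.

```python
import threading
from typing import Any, Dict


class SingletonMeta(type):
    """Metaclass that hands back the same instance on every instantiation."""

    _instances: Dict[type, Any] = {}
    _lock = threading.Lock()

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        # The lock prevents two threads from creating the instance at once.
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class DemoConfig(metaclass=SingletonMeta):
    def __init__(self) -> None:
        self.config: Dict[str, Any] = {}


first = DemoConfig()
second = DemoConfig()
first.config["app_key"] = "demo"
# first and second are the same object, so second.config sees the new key.
```

One consequence worth keeping in mind: `__init__` runs only on the first instantiation, so constructor arguments passed later are silently ignored.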
Example configuration file (config.json):
{
  "app_key": "your_app_key",
  "app_secret": "your_app_secret",
  "access_token": "your_access_token",
  "base_url": "https://eco.taobao.com/router/rest"
}
3. Data Collection and Processing
3.1 Product Data Collector
A dedicated product data collector talks to the API and retrieves product data:
import logging
import time
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class TaobaoItemCollector:
    # String annotations are used for classes that are defined in section 3.2.
    def __init__(self, api_client: "TaobaoAPIClient",
                 data_processor: "ItemDataProcessor",
                 data_storage: "DataStorage"):
        self.api_client = api_client
        self.data_processor = data_processor
        self.data_storage = data_storage
        self.max_workers = 5  # maximum number of threads
        self.throttle_delay = 1  # minimum interval between requests (seconds)
        self.last_request_time = 0.0

    def throttle(self) -> None:
        """Limit the request rate."""
        current_time = time.time()
        elapsed = current_time - self.last_request_time
        if elapsed < self.throttle_delay:
            wait_time = self.throttle_delay - elapsed
            logger.debug(f"Rate limit: waiting {wait_time:.2f} seconds")
            time.sleep(wait_time)
        self.last_request_time = time.time()

    def search_items(self, keyword: str, page_no: int = 1, page_size: int = 20) -> List[Dict[str, Any]]:
        """Search for products."""
        self.throttle()
        params = {
            "q": keyword,
            "page_no": page_no,
            "page_size": page_size,
            "fields": "num_iid,title,price,num,pic_url,seller_nick,volume"
        }
        result = self.api_client.call("taobao.tbk.item.get", params, need_auth=False)
        if "tbk_item_get_response" in result and "results" in result["tbk_item_get_response"]:
            return result["tbk_item_get_response"]["results"].get("n_tbk_item", [])
        else:
            logger.error(f"Product search failed: {result}")
            return []
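The usage example in section 5 calls `collect_items_by_keyword` with `max_pages` and `parallel` arguments, but the article does not show that method. The sketch below illustrates one way the page fan-out could work with a thread pool. `collect_pages` and `fake_search` are hypothetical names, and `fake_search` stands in for `search_items` so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

SearchFn = Callable[[str, int], List[Dict[str, Any]]]


def collect_pages(search: SearchFn, keyword: str, max_pages: int,
                  parallel: bool = False, max_workers: int = 5) -> List[Dict[str, Any]]:
    """Fetch pages 1..max_pages for a keyword and flatten them into one list."""
    pages = range(1, max_pages + 1)
    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() returns results in page order even though calls overlap in time.
            results = list(pool.map(lambda page: search(keyword, page), pages))
    else:
        results = [search(keyword, page) for page in pages]
    return [item for page_items in results for item in page_items]


def fake_search(keyword: str, page_no: int) -> List[Dict[str, Any]]:
    """Stand-in for search_items: returns two synthetic items per page."""
    return [{"num_iid": f"{keyword}-{page_no}-{i}"} for i in range(2)]


items = collect_pages(fake_search, "phone", max_pages=3, parallel=True)
```

Note that the `throttle` method shown above reads and writes `last_request_time` without a lock, so it would need synchronization before being called from several worker threads.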
3.2 Data Processing and Storage
A data processor and a storage interface handle data cleaning and persistence:
from typing import Any, Dict


class ItemDataProcessor:
    def process_item_data(self, item_data: Dict[str, Any]) -> Dict[str, Any]:
        processed_data = {}
        # Extract the basic fields ("or 0" also covers fields that are present but null)
        processed_data["id"] = item_data.get("num_iid")
        processed_data["title"] = item_data.get("title")
        processed_data["price"] = float(item_data.get("price") or 0)
        processed_data["original_price"] = float(item_data.get("original_price") or processed_data["price"])
        processed_data["stock"] = int(item_data.get("num") or 0)
        processed_data["sales_volume"] = int(item_data.get("volume") or 0)
        # Compute the discount as price / original price
        if processed_data["original_price"] > 0:
            processed_data["discount"] = round(processed_data["price"] / processed_data["original_price"], 2)
        else:
            processed_data["discount"] = 1.0
        # Extract image URLs
        if "item_imgs" in item_data and "item_img" in item_data["item_imgs"]:
            processed_data["images"] = [img.get("url") for img in item_data["item_imgs"]["item_img"]]
        else:
            processed_data["images"] = []
        # Extract properties (_parse_properties is not shown in this article)
        if "props_name" in item_data:
            processed_data["properties"] = self._parse_properties(item_data["props_name"])
        else:
            processed_data["properties"] = {}
        return processed_data
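The processor calls `_parse_properties`, which the article does not show. The sketch below is one possible implementation, written as a standalone function. It assumes that `props_name` is a semicolon-separated list of `pid:vid:name:value` entries; that format is an assumption here and should be checked against real API responses. The sample string is invented.

```python
from typing import Dict


def parse_properties(props_name: str) -> Dict[str, str]:
    """Turn 'pid:vid:name:value;...' into a {name: value} mapping."""
    properties: Dict[str, str] = {}
    for entry in props_name.split(";"):
        # maxsplit=3 keeps any colons inside the value intact.
        parts = entry.split(":", 3)
        if len(parts) != 4:
            continue  # skip empty or malformed entries
        _pid, _vid, name, value = parts
        if name in properties:
            # The same property can appear several times; join the values.
            properties[name] += "," + value
        else:
            properties[name] = value
    return properties


sample = "20000:3275069:Brand:Acme;1753146:3485013:Model:F908;1753146:99:Model:F909"
parsed = parse_properties(sample)
# parsed == {"Brand": "Acme", "Model": "F908,F909"}
```

Joining repeated names with a comma keeps the result a flat string mapping, which matches the `property_name` / `property_value` columns used by the database storage in this article.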
The storage interface and its implementations:
import json
import logging
import threading
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class DataStorage:
    def save_item(self, item_data: Dict[str, Any]) -> bool:
        raise NotImplementedError

    def batch_save_items(self, items_data: List[Dict[str, Any]]) -> int:
        raise NotImplementedError


class JSONFileStorage(DataStorage):
    def __init__(self, file_path: str):
        self.file_path = file_path
        self.lock = threading.Lock()

    def save_item(self, item_data: Dict[str, Any]) -> bool:
        try:
            with self.lock:
                # Load the existing list, or start fresh if the file is missing or invalid
                try:
                    with open(self.file_path, 'r', encoding='utf-8') as f:
                        existing_data = json.load(f)
                except (FileNotFoundError, json.JSONDecodeError):
                    existing_data = []
                existing_data.append(item_data)
                # Rewrite the whole file with the new item appended
                with open(self.file_path, 'w', encoding='utf-8') as f:
                    json.dump(existing_data, f, ensure_ascii=False, indent=2)
            return True
        except Exception as e:
            logger.error(f"Failed to save product data: {str(e)}")
            return False
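`JSONFileStorage.save_item` rewrites the whole file for every item, and the class leaves `batch_save_items` unimplemented. The following sketch shows one way a batch append could look. It is an illustration rather than the article's implementation: `append_items` is a hypothetical helper, and it swaps in a write-to-temporary-file-then-rename step so that a crash mid-write does not leave a truncated JSON file.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def append_items(file_path: str, items: List[Dict[str, Any]]) -> int:
    """Append items to a JSON list file in one read and one write."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            existing = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        existing = []
    existing.extend(items)

    # Write to a temporary file in the same directory, then rename it into place.
    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(existing, f, ensure_ascii=False, indent=2)
        os.replace(tmp_path, file_path)
    except Exception:
        os.unlink(tmp_path)  # clean up the partial temporary file
        raise
    return len(items)
```

In a multi-threaded collector, the call would still need to happen under the storage object's lock, as `save_item` does.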
import logging
import sqlite3
from typing import Any, Dict

logger = logging.getLogger(__name__)


class DatabaseStorage(DataStorage):
    def __init__(self, db_config: Dict[str, Any]):
        self.db_config = db_config
        self.connection = None
        self.connect()

    def connect(self) -> bool:
        try:
            db_path = self.db_config.get("db_path", "taobao_data.db")
            # By default, sqlite3 only allows a connection to be used
            # from the thread that created it.
            self.connection = sqlite3.connect(db_path)
            self._create_tables()
            logger.info(f"Connected to database: {db_path}")
            return True
        except Exception as e:
            logger.error(f"Failed to connect to database: {str(e)}")
            return False

    def _create_tables(self) -> None:
        if not self.connection:
            return
        cursor = self.connection.cursor()
        # Create the items table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS items (
                id TEXT PRIMARY KEY,
                title TEXT,
                price REAL,
                original_price REAL,
                discount REAL,
                stock INTEGER,
                sales_volume INTEGER,
                create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        ''')
        # Create the item images table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS item_images (
                item_id TEXT,
                image_url TEXT,
                FOREIGN KEY (item_id) REFERENCES items (id)
            )
        ''')
        # Create the item properties table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS item_properties (
                item_id TEXT,
                property_name TEXT,
                property_value TEXT,
                FOREIGN KEY (item_id) REFERENCES items (id)
            )
        ''')
        self.connection.commit()

    def save_item(self, item_data: Dict[str, Any]) -> bool:
        if not self.connection:
            if not self.connect():
                return False
        try:
            cursor = self.connection.cursor()
            item_id = item_data.get("id")
            if not item_id:
                logger.error("Item ID is missing, cannot save")
                return False
            cursor.execute(
                '''
                INSERT OR REPLACE INTO items (id, title, price, original_price, discount, stock, sales_volume)
                VALUES (?, ?, ?, ?, ?, ?, ?)
                ''',
                (
                    item_id,
                    item_data.get("title"),
                    item_data.get("price"),
                    item_data.get("original_price"),
                    item_data.get("discount"),
                    item_data.get("stock"),
                    item_data.get("sales_volume")
                )
            )
            # Remove rows left by an earlier save of the same item so that
            # saving it again does not duplicate its images and properties
            cursor.execute('DELETE FROM item_images WHERE item_id = ?', (item_id,))
            cursor.execute('DELETE FROM item_properties WHERE item_id = ?', (item_id,))
            # Insert the item images
            if "images" in item_data:
                for image_url in item_data["images"]:
                    cursor.execute(
                        '''
                        INSERT INTO item_images (item_id, image_url)
                        VALUES (?, ?)
                        ''',
                        (item_id, image_url)
                    )
            # Insert the item properties
            if "properties" in item_data:
                for prop_name, prop_value in item_data["properties"].items():
                    cursor.execute(
                        '''
                        INSERT INTO item_properties (item_id, property_name, property_value)
                        VALUES (?, ?, ?)
                        ''',
                        (item_id, prop_name, prop_value)
                    )
            self.connection.commit()
            return True
        except Exception as e:
            self.connection.rollback()
            logger.error(f"Failed to save product data to database: {str(e)}")
            return False
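`DatabaseStorage` does not implement `batch_save_items` from the `DataStorage` interface. The sketch below shows how a batch insert could be done with `executemany` inside a single transaction. It is a simplified illustration, not the article's method: the table has only three of the columns above, and it uses an in-memory database so it runs on its own.

```python
import sqlite3
from typing import Any, Dict, List


def batch_save_items(connection: sqlite3.Connection, items: List[Dict[str, Any]]) -> int:
    """Insert or replace many items in one transaction; returns the number of rows written."""
    rows = [
        (str(item["id"]), item.get("title"), item.get("price"))
        for item in items
        if item.get("id")  # skip items without an ID, as save_item does
    ]
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO items (id, title, price) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, price REAL)")
saved = batch_save_items(connection, [
    {"id": "1001", "title": "phone", "price": 1999.0},
    {"id": "1002", "title": "tablet", "price": 2999.0},
    {"title": "missing id"},
])
```

Committing once per batch instead of once per item usually reduces write overhead noticeably when many items are saved in a row.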
4. Performance Optimization and Best Practices
4.1 Request Rate Control
Control the request rate so that the API does not throttle the client:
    def throttle(self) -> None:
        """Limit the request rate."""
        current_time = time.time()
        elapsed = current_time - self.last_request_time
        if elapsed < self.throttle_delay:
            wait_time = self.throttle_delay - elapsed
            logger.debug(f"Rate limit: waiting {wait_time:.2f} seconds")
            time.sleep(wait_time)
        self.last_request_time = time.time()
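The `throttle` method above keeps its state in a plain attribute, so two threads calling it at the same moment could both decide that no wait is needed. Below is a sketch of a lock-protected variant that reserves a time slot for each caller. `RateLimiter` is a hypothetical class, and the injectable `clock` and `sleep` parameters exist only to make it testable.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Spaces out calls so that consecutive slots are at least min_interval apart."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> float:
        """Block until this caller's slot arrives; returns the time waited."""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_allowed)
            # Reserve the slot before releasing the lock so no other thread takes it.
            self._next_allowed = start + self.min_interval
        delay = start - now
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so others can queue up
        return delay
```

A collector could hold one shared `RateLimiter(1.0)` and call `wait()` before each request, in the same place where `throttle()` is called today.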
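4.3 Exception Handling and Logging
Thorough exception handling and logging are key to keeping the system stable. The excerpt below repeats the retry logic of the call method from section 2.2: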
    def call(self, method: str, params: Dict[str, Any], need_auth: bool = False,
             retry_times: int = 3, timeout: int = 10) -> Dict[str, Any]:
        for attempt in range(retry_times):
            try:
                # Build and send the request...
                response = self.session.post(self.base_url, data=all_params, timeout=timeout)
                response.raise_for_status()
                result = response.json()
                # Handle the response...
            except requests.exceptions.RequestException as e:
                with self.lock:
                    self.error_counter += 1
                logger.error(f"API request failed (attempt {attempt+1}/{retry_times}): {str(e)}")
                if attempt < retry_times - 1:
                    wait_time = (2 ** attempt) * 0.5  # exponential backoff
                    logger.info(f"Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    return {"error": f"Request failed: {str(e)}"}
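The wait time in the excerpt follows `(2 ** attempt) * 0.5`, so with the default `retry_times=3` the client waits 0.5 s after the first failure and 1.0 s after the second, then gives up. The sketch below isolates that schedule in a small generic helper. `backoff_delays` and `call_with_retry` are hypothetical names, and unlike the `call` method this version re-raises the last exception instead of returning an error dictionary.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retry_times: int, base: float = 0.5) -> List[float]:
    """Delays between attempts; there is no wait after the final attempt."""
    return [(2 ** attempt) * base for attempt in range(retry_times - 1)]


def call_with_retry(operation: Callable[[], T], retry_times: int = 3,
                    base: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on ConnectionError with exponential backoff."""
    for attempt in range(retry_times):
        try:
            return operation()
        except ConnectionError:
            if attempt == retry_times - 1:
                raise  # out of attempts: let the caller see the error
            sleep((2 ** attempt) * base)
    raise ValueError("retry_times must be at least 1")


schedule = backoff_delays(3)  # [0.5, 1.0]
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.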
5. Usage Example
The following example shows how to configure and use the API client end to end:
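if __name__ == "__main__":
    import logging
    logging.basicConfig(level=logging.INFO)  # make the logger output visible
    # Load the configuration
    config = TaobaoAPIConfig()
    config.load_config("config.json")
    # Initialize the API client
    api_client = TaobaoAPIClient(config)
    # Initialize the data processor
    data_processor = ItemDataProcessor()
    # Initialize the data storage
    data_storage = JSONFileStorage("taobao_items.json")
    # Initialize the product data collector
    collector = TaobaoItemCollector(api_client, data_processor, data_storage)
    # Set the collection parameters (keywords: phone, laptop, tablet)
    keywords = ["手机", "笔记本电脑", "平板电脑"]
    max_pages = 3
    use_parallel = True
    # Collect product data
    # (collect_items_by_keyword is not shown in this article)
    total_count = 0
    for keyword in keywords:
        count = collector.collect_items_by_keyword(
            keyword=keyword,
            max_pages=max_pages,
            parallel=use_parallel
        )
        total_count += count
        logger.info(f"Keyword '{keyword}': collected {count} items")
    logger.info(f"All keywords done, collected {total_count} items in total")
    # Print API call statistics (get_stats is not shown in this article)
    stats = api_client.get_stats()
    print(f"API call statistics: total requests={stats['request_count']}, errors={stats['error_count']}")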
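The example reads `request_count` and `error_count` from `api_client.get_stats()`, a method the client code in section 2.2 does not include. The sketch below shows one way it could expose the two counters the client already maintains. `CallStats` is a hypothetical standalone class; in the real client, `get_stats` would be a method on `TaobaoAPIClient` using its existing `lock`, `request_counter`, and `error_counter`.

```python
import threading
from typing import Dict


class CallStats:
    """Thread-safe request and error counters with a snapshot method."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.request_counter = 0
        self.error_counter = 0

    def record_request(self) -> None:
        with self._lock:
            self.request_counter += 1

    def record_error(self) -> None:
        with self._lock:
            self.error_counter += 1

    def get_stats(self) -> Dict[str, int]:
        # Take the lock so both values come from the same moment in time.
        with self._lock:
            return {
                "request_count": self.request_counter,
                "error_count": self.error_counter,
            }


stats_tracker = CallStats()
stats_tracker.record_request()
stats_tracker.record_request()
stats_tracker.record_error()
stats = stats_tracker.get_stats()
# stats == {"request_count": 2, "error_count": 1}
```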
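6. Summary
With the practices described in this article, you can integrate the Taobao API and collect product data efficiently:
- Design a secure and reliable API client
- Build a modular architecture for data collection, processing, and storage
- Use parallel processing to speed up data collection
- Control the request rate sensibly to avoid being throttled
- Handle exceptions and log thoroughly to keep the system stable
The next article will look at how to use the collected data to build a price monitoring system and data analysis tools.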