30秒左右下载完王者荣耀全部高清皮肤壁纸

476 阅读2分钟

今天我们一起来采集王者荣耀英雄的全部皮肤地址,目标网址:

pvp.qq.com/web201605/h…

image-20220412155028113

通过开发者工具发现 pvp.qq.com/web201605/j…

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}
herolist = requests.get(
    "https://pvp.qq.com/web201605/js/herolist.json", headers=headers).json()
herolist
[{'ename': 105,
  'cname': '廉颇',
  'title': '正义爆轰',
  'new_type': 0,
  'hero_type': 3,
  'skin_name': '正义爆轰|地狱岩魂'},
 {'ename': 106,
  'cname': '小乔',
  'title': '恋之微风',
  'new_type': 0,
  'hero_type': 2,
  'skin_name': '恋之微风|万圣前夜|天鹅之梦|纯白花嫁|缤纷独角兽'},
......

但是针对第一个英雄廉颇点开详情页后,可以看到网址为pvp.qq.com/web201605/h…

我们发现实际并不只有这两个皮肤,勇者和特殊活动的皮肤并不直接在json接口里显示:

image-20220412160220146

这意味着完整皮肤详情只能访问详情页查看,透过style属性可以看到图片地址,从而发现规律:

image-20220412160959732

而完整的皮肤名称列表可以通过上述ul的data-imgname属性获取,最终完整代码如下:

import requests
import os
import re
from concurrent.futures import ThreadPoolExecutor


def download_img(eid, name, i, skin_name):
    filename = f"王者荣耀壁纸/{eid:0>3}-{name}-{i:0>2}-{skin_name}.jpg"
    print(filename)
    if os.path.exists(filename):
        return
    img_url = f"http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/{eid}/{eid}-bigskin-{i}.jpg"
    res = requests.get(img_url)
    with open(filename, "wb") as f:
        f.write(res.content)


def download_hero_skin(hero):
    eid, name = hero["ename"], hero["cname"]
    res = requests.get(f"https://pvp.qq.com/web201605/herodetail/{eid}.shtml",
                       headers=headers)
    res.encoding = "gbk"
    skin_names = re.findall(
        '<ul[^>]+?data-imgname="([^"]+)"', res.text)[0].split("|")
    print(eid, name, skin_names)
    for i, skin_name in enumerate(skin_names, 1):
        end = skin_name.find("&")
        skin_name = skin_name[:len(skin_name) if end == -1 else end]
        download_img(eid, name, i, skin_name)


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}
herolist = requests.get(
    "https://pvp.qq.com/web201605/js/herolist.json", headers=headers).json()
os.makedirs("王者荣耀壁纸", exist_ok=True)
with ThreadPoolExecutor(max_workers=16) as executor:
    executor.map(download_hero_skin, herolist)

经过30秒左右已经下载完毕:

image-20220412161632334

不到40行代码就实现了在30秒内下载完200多MB的王者荣耀壁纸。