JS Reverse Engineering (Part 1)


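JS Anti-Scraping

JS anti-scraping refers to the anti-crawler measures, implemented in JavaScript, that a crawler runs into when fetching page data. Common techniques include dynamic rendering, asynchronous loading, CAPTCHAs, and IP restrictions. These measures make scraping considerably harder and help protect a site's data.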

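JS Reverse Engineering

Getting past JS anti-scraping measures requires JS reverse engineering. It calls for solid JavaScript programming and code-analysis skills, and is an advanced skill for crawler developers.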

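Reverse-Engineering Workflow

  1. Analyze the site: study the target site and determine whether any request parameters stand in the way of getting the data;
  2. Locate the encryption: use search, breakpoint debugging, hooks, and similar techniques to find by hand where the parameter is encrypted;
  3. Port the code: move the encryption algorithm or logic you found into your own code. This can involve decoding, decryption, and reconstructing algorithms, and you also have to deal with obstacles such as code obfuscation and modified algorithms;
  4. Optimize the code: tidy up the reverse-engineered code so the crawler runs reliably, and adapt it promptly when the anti-scraping measures are updated.

Note that JS reverse engineering is complex, advanced work. It takes solid JavaScript skills, the ability to analyze code and understand algorithms, and the patience and persistence to work through the obstacles that come up.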

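JS Encryption Algorithms

Signs that a request is protected by JS encryption:
When you turn the page or fetch new data, the request-body parameters change and no pattern can be found. JS reverse engineering is needed.
The request body and request headers are exactly the same, yet the request still fails. The parameters probably carry a timestamp.
For example, the same data is requested, but the parameters change.
How to confirm: request the same endpoint twice and compare the two requests. The parameters that differ are the ones being encrypted.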
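The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original scripts: the helper name `changed_params` and the two captured requests are hypothetical, with made-up values.

```python
def changed_params(first: dict, second: dict) -> dict:
    """Return the parameters whose values differ between two captured requests."""
    keys = first.keys() | second.keys()
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(keys)
        if first.get(key) != second.get(key)
    }


# Hypothetical captures of the same endpoint requested twice (values are made up).
request_a = {"keyword": "movie", "page": "1", "signKey": "aaa111", "timeStamp": "1700000000000"}
request_b = {"keyword": "movie", "page": "1", "signKey": "bbb222", "timeStamp": "1700000005000"}

for name, (old, new) in changed_params(request_a, request_b).items():
    print(f"{name}: {old} -> {new}")
```

Parameters that show up in this diff are the candidates to trace in the site's JavaScript; parameters that stay constant can usually be copied as-is.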

Case Study 1

Baidu Translate: fanyi.baidu.com/?aldtype=16… (encrypted request-body parameter)

百度翻译js逆向.py

# coding: utf-8
import requests
import execjs

url = 'https://fanyi.baidu.com/v2transapi'

cookies = { 
    'APPGUIDE_10_6_9': '1',
    'BAIDUID': '0AC910D623E84E2AF2EC0FEB2844D6C4:FG=1',
    'BAIDUID_BFESS': '0AC910D623E84E2AF2EC0FEB2844D6C4:FG=1',
    'BIDUPSID': '92068D3C5A4CAF9FD74E85044409657D',
    'FANYI_WORD_SWITCH': '1',
    'HISTORY_SWITCH': '1',
    'H_PS_PSSID': '40124_40161_40200_40210_40207_40215_40223_40266_40079_40294_40290_40288_40285',
    'H_WISE_SIDS': '40124_40161_40200_40210_40207_40215_40223_40266_40079',
    'H_WISE_SIDS_BFESS': '40039_39939_40124_40161_40200_40210_40207_40215',
    'Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574': '1709014343',
    'Hm_lvt_64ecd82404c51e03dc91cb9e8c025574': '1708245569,1708313786,1708586584,1709014343',
    'PSTM': '1699327572',
    'REALTIME_TRANS_SWITCH': '1',
    'SOUND_PREFER_SWITCH': '1',
    'SOUND_SPD_SWITCH': '1',
    'ZFY': 'GJJUPI80hEogV40Y:BfXkGhmboDFI5nAvm7Iu9p1Z2fQ:C',
    'ab_sr': '1.0.1_MDM0ZDAxMDAwMWUxMGFlMTNhNDc5NzhjYzJiMTBkYjU0ZTZhYTIyMjQyZDhmMTNjMWNkZTI3MTI1ODVkYzkxZjhmODYxZjIwNTRkMTlhZWM3ZmM3Njg2YTgzZmQzMGYzNWYzMTQzNGRlMDM4Nzk2YTg2OGU1N2NjNTdlNjI4NzNkZmI0OGRkNjI2MzkyYTRlNzAxNzQwMWYyMDU5YWRlNQ==',
}

headers = { 
    'Accept': '*/*',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Acs-Token': '1708948806239_1709015253494_iiNu8giQN7SGz7NKJn2ymjISVlKelYfD8piAKrUU60RUrbGY3KJLa/HVG7Xs7tBhH6GBQyncEKnra4HsxMwpoWXU9b5oPQuapRf0UEiiX2IIbeaZ+3ial7JSpw6s/npoL9FPGW35BnjnOO7hmzEAeIDTHEj5Iw/ccWVH3tq+QRh0GDdVNTNHElCNpKJyI9/A7o5JxVDWNKAYTGvumidwAFp50FDobC2xS4BWuEfZ7bkmvs/UXt2aSZAzoTzlLVVmuDfEj6PdHsFGNnzocuVyFB4jbg4RF6j8V+18RmUmR8MXozpdq/dc2237nyNAx65WD6GZJOhJ34LffO6SKfqp0n+YiaP9Adnm1vUDH/CPXcRkCap9lQHjPxFi2PbSyZpKe5OCj0EpGP4XSQZn28UaGXLB1qImONBbLYmW880zWE6O6N8YWbMbtiHKqmhRCSyZrGtPHThsWLGFK5bJ4Q62ziRo5hlmsCobeV5tiOm3IYE=',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'https://fanyi.baidu.com',
    'Referer': 'https://fanyi.baidu.com/?aldtype=16047',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

with open('bd.js', 'r', encoding='utf-8') as file:
    js = execjs.compile(file.read())  # compile the JS source read from bd.js
query = '你好'
data = {
    'from': 'zh',
    'to': 'en',
    'query': query,
    'transtype': 'realtime',
    'simple_means_flag': '3',
    'sign': js.call('b', query),  # call function b in bd.js, passing query as the argument
    'token': 'ea5952a149816a41548acc9f575cdc8f',
    'domain': 'common',
    'ts': '1709015253483',
}


# Current timestamp: 1709016027.882114, so 'ts': '1709015253483' is simply a timestamp in milliseconds
response = requests.post(url, headers=headers, data=data, cookies=cookies)
print(response.json())

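# 1. Find where the request-body parameter comes from; a direct text search is often enough
# Search for sign as an exact name match; if there are too many hits, append a colon, e.g. [sign:]


# Go through the matching files one by one, find the likely locations, and set breakpoints there. Breakpoint: when execution reaches that line it pauses and does not run any further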
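The script above hard-codes `ts`. As a small sketch of the observation in its comment, the snippet below converts the captured value to a date to confirm it is a millisecond Unix timestamp, and generates a fresh one in the same format. The helper names `ms_to_utc` and `fresh_ts` are made up for illustration, and whether the server actually validates `ts` is not established here.

```python
import time
from datetime import datetime, timezone


def ms_to_utc(ts_ms: str) -> datetime:
    """Interpret a 13-digit string as milliseconds since the Unix epoch."""
    return datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)


def fresh_ts() -> str:
    """Return the current time in the same millisecond string format."""
    return str(int(time.time() * 1000))


captured = ms_to_utc("1709015253483")   # value copied from the request body above
print(captured.date())                  # a plausible recent date means it is a timestamp
print(fresh_ts())                       # could replace the hard-coded 'ts' value
```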

bd.js

function n(t, e) {
            for (var n = 0; n < e.length - 2; n += 3) {
                var r = e.charAt(n + 2);
                r = "a" <= r ? r.charCodeAt(0) - 87 : Number(r),
                r = "+" === e.charAt(n + 1) ? t >>> r : t << r,
                t = "+" === e.charAt(n) ? t + r & 4294967295 : t ^ r
            }
            return t
        }
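// window is the browser's global object; outside a browser, stub the environment with window = {}
// || is a short-circuit operator: it returns the first operand if that operand is truthy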
function b(t) {
            var o, i = t.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
            if (null === i) {
                var a = t.length;
                a > 30 && (t = "".concat(t.substr(0, 10)).concat(t.substr(Math.floor(a / 2) - 5, 10)).concat(t.substr(-10, 10)))
            } else {
                for (var s = t.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), c = 0, u = s.length, l = []; c < u; c++)
                    "" !== s[c] && l.push.apply(l, function(t) {
                        if (Array.isArray(t))
                            return e(t)
                    }(o = s[c].split("")) || function(t) {
                        if ("undefined" != typeof Symbol && null != t[Symbol.iterator] || null != t["@@iterator"])
                            return Array.from(t)
                    }(o) || function(t, n) {
                        if (t) {
                            if ("string" == typeof t)
                                return e(t, n);
                            var r = Object.prototype.toString.call(t).slice(8, -1);
                            return "Object" === r && t.constructor && (r = t.constructor.name),
                            "Map" === r || "Set" === r ? Array.from(t) : "Arguments" === r || /^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r) ? e(t, n) : void 0
                        }
                    }(o) || function() {
                        throw new TypeError("Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")
                    }()),
                    c !== u - 1 && l.push(i[c]);
                var p = l.length;
                p > 30 && (t = l.slice(0, 10).join("") + l.slice(Math.floor(p / 2) - 5, Math.floor(p / 2) + 5).join("") + l.slice(-10).join(""))
            }
            r = "320305.131321201"
            for (var d = "".concat(String.fromCharCode(103)).concat(String.fromCharCode(116)).concat(String.fromCharCode(107)), h = (null !== r ? r : (r = "320305.131321201" || "") || "").split("."), f = Number(h[0]) || 0, m = Number(h[1]) || 0, g = [], y = 0, v = 0; v < t.length; v++) {
                var _ = t.charCodeAt(v);
                _ < 128 ? g[y++] = _ : (_ < 2048 ? g[y++] = _ >> 6 | 192 : (55296 == (64512 & _) && v + 1 < t.length && 56320 == (64512 & t.charCodeAt(v + 1)) ? (_ = 65536 + ((1023 & _) << 10) + (1023 & t.charCodeAt(++v)),
                g[y++] = _ >> 18 | 240,
                g[y++] = _ >> 12 & 63 | 128) : g[y++] = _ >> 12 | 224,
                g[y++] = _ >> 6 & 63 | 128),
                g[y++] = 63 & _ | 128)
            }
            for (var b = f, w = "".concat(String.fromCharCode(43)).concat(String.fromCharCode(45)).concat(String.fromCharCode(97)) + "".concat(String.fromCharCode(94)).concat(String.fromCharCode(43)).concat(String.fromCharCode(54)), k = "".concat(String.fromCharCode(43)).concat(String.fromCharCode(45)).concat(String.fromCharCode(51)) + "".concat(String.fromCharCode(94)).concat(String.fromCharCode(43)).concat(String.fromCharCode(98)) + "".concat(String.fromCharCode(43)).concat(String.fromCharCode(45)).concat(String.fromCharCode(102)), x = 0; x < g.length; x++)
                b = n(b += g[x], w);
            return b = n(b, k),
            (b ^= m) < 0 && (b = 2147483648 + (2147483647 & b)),
            "".concat((b %= 1e6).toString(), ".").concat(b ^ f)
        }

// Example usage:
// e="香蕉"
// sign = b(e)
// console.log(sign);

// Value hard-coded as r above: "320305.131321201"
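Several parts of `b` are easier to read once de-obfuscated. The Python sketch below is an illustration written for this article, not a full port of the signing algorithm: it decodes the `String.fromCharCode` chains, mirrors the "first 10 + middle 10 + last 10" shortening applied to inputs longer than 30 characters, and shows that the byte-building loop is ordinary UTF-8 encoding. The helper names are hypothetical, and `shorten` assumes text without characters outside the Basic Multilingual Plane, since Python counts code points where JavaScript counts UTF-16 units.

```python
def from_char_codes(*codes: int) -> str:
    """Python counterpart of chained String.fromCharCode(...) calls."""
    return "".join(chr(code) for code in codes)


# The three obfuscated strings built inside b():
D = from_char_codes(103, 116, 107)                      # variable d
W = from_char_codes(43, 45, 97, 94, 43, 54)             # variable w
K = from_char_codes(43, 45, 51, 94, 43, 98, 43, 45, 102)  # variable k
print(D, W, K)  # gtk +-a^+6 +-3^+b+-f


def shorten(text: str) -> str:
    """Mirror the a > 30 branch: keep the first, middle, and last 10 characters."""
    length = len(text)
    if length <= 30:
        return text
    mid = length // 2
    return text[:10] + text[mid - 5:mid + 5] + text[-10:]


def utf8_codes(text: str) -> list:
    """For well-formed strings, the g[] array in b() holds the UTF-8 bytes."""
    return list(text.encode("utf-8"))


print(utf8_codes("你好"))  # [228, 189, 160, 229, 165, 189]
```

Reading `d` as `gtk` suggests that `r` was originally looked up under that name on a global object before being hard-coded in bd.js; that reading is an inference from the code, not something the page confirms.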

Case Study 2

Maoyan Pro (猫眼专业版): piaofang.maoyan.com/dashboard

猫眼专业版.py

# coding: utf-8
import crawles
import execjs



url = 'https://piaofang.maoyan.com/dashboard-ajax'

cookies = {
    '_lxsdk': '186a6f6b9d4c8-0ddef66ba7f5d9-26021051-1fa400-186a6f6b9d5c8',
    '_lxsdk_cuid': '186a6f6b9d4c8-0ddef66ba7f5d9-26021051-1fa400-186a6f6b9d5c8',
    '_lxsdk_s': '18de573e10a-e39-16-d4a%7C%7C1',
}

headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Referer': 'https://piaofang.maoyan.com/dashboard',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57',
    'X-FOR-WITH': 'Zj+YjTqKOkz6iFKWUbN/kEXb1kVS0Xgt0OuyPfp9hJTTT87oHnFXVe8//gWpSBOPUTJgp12SaXtR7UO0Iy/kQx+mx65JuPDvYNrT4eKltfZ4cyxnnqMJFmNDM49JozGSIvxNbhW+bnoO5wzBbVfanQ+zVMKVkd0ppM8DNG2busvYYPZFtByO+TQj6ovJYcMFnG9AMCf4Wm4pxpnrj0peWg==',
    'sec-ch-ua': '"Microsoft Edge";v="113", "Chromium";v="113", "Not-A.Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

# The index value below could instead be generated with random.randint(0, 1000)
params = {
    'User-Agent': 'TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzExMy4wLjAuMCBTYWZhcmkvNTM3LjM2IEVkZy8xMTMuMC4xNzc0LjU3',
    'channelId': '40009',
    'index': '991',
    'orderType': '0',
    'sVersion': '2',
    'timeStamp': '1708951665245',
    'uuid': '186a6f6b9d4c8-0ddef66ba7f5d9-26021051-1fa400-186a6f6b9d5c8',
}

with open('my.js', 'r', encoding='utf-8') as file:
    js = execjs.compile(file.read())
params['signKey'] = js.call('b', params['index'], params['timeStamp'])

# 'signKey': '686651c99b3b916e6dbde5b600be3a15',
# index
# signKey "9f8a780184ebc39c6f7fb82e0e24ef76"

# The encryption-related data is mostly located close together in the JS source


# Current timestamp: 1708951709.0289228
response = crawles.get(url, headers=headers, params=params, cookies=cookies)
# print(response.json())

for data in response.json()['movieList']['data']['list']:
    print(f"Title: {data['movieInfo']['movieName']}"
          f" | Total box office: {data['boxSplitUnit']['num']}"
          f" | Box office share: {data['boxRate']}"
          f" | Showings: {data['showCount']}")
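The query parameters in the script are copied from one captured request. Decoding the `User-Agent` parameter shows it is the Base64 form of the `User-Agent` header, and the script's own comment notes that `index` can be a random integer. A minimal sketch of generating these values per request follows; the function name `build_dynamic_params` is hypothetical, and a fresh millisecond `timeStamp` is an assumption about what the server accepts.

```python
import base64
import random
import time


def build_dynamic_params(user_agent: str) -> dict:
    """Build the per-request values instead of copying them from a capture."""
    return {
        # Base64 of the same UA string that is sent in the request headers
        "User-Agent": base64.b64encode(user_agent.encode("utf-8")).decode("ascii"),
        # Random integer, as suggested by the comment in the script
        "index": str(random.randint(0, 1000)),
        # Current time in milliseconds (assumed to be what the site expects)
        "timeStamp": str(int(time.time() * 1000)),
    }


UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57")
dynamic = build_dynamic_params(UA)
print(dynamic)
```

These values would be merged into `params` before `signKey` is computed, since the signature covers `index`, `timeStamp`, and the encoded User-Agent.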

my.js

// _jsMd2.default
// c
// i(269)


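// Import CryptoJS
CryptoJS = require("crypto-js");
// Install the crypto-js module first: npm install crypto-js
// crypto-js is commonly needed for encryption/decryption work; note that the install command must be run in the current project's environment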

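function b(index,timeStamp){
    // Note: the User-Agent value embedded below decodes to a Chrome/122 UA, while 猫眼专业版.py sends a Chrome/113 Edg UA; the signed value and the sent value most likely need to match
    c=`method=GET&timeStamp=${timeStamp}&User-Agent=TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzEyMi4wLjAuMCBTYWZhcmkvNTM3LjM2&index=${index}&channelId=40009&sVersion=2&key=A013F70DB97834C0A5492378BD76C53A`
    f = (0,CryptoJS.MD5)(c['replace'](/\s+/g, " "))  // the g flag means "global" (replace every match); \s matches one whitespace character, \s+ matches one or more
    signKey = f
    return signKey.toString()
}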
//c="method=GET&timeStamp=1709022588516&User-Agent=TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzEyMi4wLjAuMCBTYWZhcmkvNTM3LjM2&index=339&channelId=40009&sVersion=2&key=A013F70DB97834C0A5492378BD76C53A"
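// f = (0,CryptoJS.MD5)(c['replace'](/\s+/g, " "))  // the g flag means "global"; \s matches one whitespace character, \s+ matches one or more
// The (0,_jsMd2.default) syntax looks odd; it is a comma expression. A comma expression evaluates each of its sub-expressions from left to right and returns the value of the last one, so the function is called as a plain function without a this binding.

// signKey = f
//
// console.log(signKey.toString());  // get the MD5 hash as a hex string


// Obfuscation: in JavaScript, every dot property access can also be written with bracket notation, like a dictionary lookup
// s='abc'
// console.log(s.replace('a', 'b'));
// console.log(s['replace']('a', 'b')); // the 'replace' string here is what obfuscators typically encode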
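Because the signing in my.js boils down to an MD5 hash over a formatted string, it can also be reproduced without Node.js. The sketch below is a Python port under the assumption that my.js reflects the site's logic faithfully; the function name `sign_key` and its parameter names are made up here, and the Base64 User-Agent is passed in rather than hard-coded so that it can match the value actually sent.

```python
import hashlib
import re

SECRET_KEY = "A013F70DB97834C0A5492378BD76C53A"  # key value copied from my.js


def sign_key(index, time_stamp, user_agent_b64, key=SECRET_KEY):
    """Rebuild the string from my.js and return its MD5 hex digest."""
    raw = (
        f"method=GET&timeStamp={time_stamp}&User-Agent={user_agent_b64}"
        f"&index={index}&channelId=40009&sVersion=2&key={key}"
    )
    # Same normalization as c['replace'](/\s+/g, " ") in the JS version
    normalized = re.sub(r"\s+", " ", raw)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Hypothetical inputs, only to show the call shape.
print(sign_key(339, 1709022588516, "VUE="))
```

With this helper, `params['signKey']` could be computed directly in Python, which removes the dependency on execjs and crypto-js for this particular site.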

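Summary:

  1. When you turn the page or fetch the data again, the request-body parameters change
  2. Brute-force search for the parameter and keep the matches whose name is exactly the same
  3. Set breakpoints on the matches
  4. Debug (refresh the page or submit new data)
  5. If a breakpoint is hit, go to that location and check whether the data there is what you need
  6. Extract the JS code, hard-coding values where necessary, so that the extracted code can run on its own
  7. Look for [] bracket access and similar code nearby to decide whether the code is obfuscated, and undo the obfuscation by hand
  8. Once debugging confirms everything works, hook the code into your Python script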