Batch collection of Douyin search results: scraping by keyword!


1. Background:

1.1 Scraping target

Hello! This article is an original post from 鹿邑网爬's profile page on Juejin (juejin.cn/user/105161…

I wrote a crawler in Python that automatically collects Douyin video data by keyword.

Screenshot of the crawler running: 1.png

Screenshot of the scraped results: 2.png

2. Code walkthrough:

2.1 Crawler module:

First, define the API endpoint that the requests will be sent to:

url = 'https://www.douyin.com/aweme/v1/web/general/search/single/'

Next, define the request headers so that the request looks like it comes from a browser:

headers = {
    "Cookie": "replace with your own cookie value",
    "referer": "https://www.douyin.com/search/%E6%97%85%E8%A1%8C?aid=e1302513-ab7d-4574-bacd-46104c5576c2&type=general",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
}

Note that the cookie is a key parameter. It can be obtained as follows:

3.png
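The header dictionary above can also be produced by a small helper, so that the cookie is not hard-coded and the referer follows the search keyword. This is only a minimal sketch: `build_headers` and the `DOUYIN_COOKIE` environment variable are hypothetical names introduced here, and dropping the `aid` query parameter from the referer is an assumption rather than something the original code confirms.

```python
import os
from urllib.parse import quote

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
)


def build_headers(keyword, cookie=None):
    """Return request headers for a search on `keyword` (hypothetical helper)."""
    # Assumption: the cookie copied from the browser is stored in the
    # DOUYIN_COOKIE environment variable instead of in the source file.
    if cookie is None:
        cookie = os.environ.get("DOUYIN_COOKIE", "")
    return {
        "Cookie": cookie,
        # quote() percent-encodes the UTF-8 bytes of the keyword,
        # which is the form the search page URL uses.
        "referer": f"https://www.douyin.com/search/{quote(keyword)}?type=general",
        # Only the value goes here; the header name is the dictionary key.
        "User-Agent": USER_AGENT,
    }
```

Calling `build_headers("旅行")` yields a referer containing `%E6%97%85%E8%A1%8C`, the same encoded keyword that appears in the hard-coded referer above.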

Then add the request parameters, which tell the program what to search for:

params = {
    "device_platform": "webapp",
    "aid": "6383",
    "channel": "channel_pc_web",
    "search_channel": "aweme_general",
    "enable_history": "1",
    "keyword": search,  # search: the keyword to look up
    "search_source": "normal_search",
    "query_correct_type": "1",
    "is_filter_search": "0",
    "offset": str(OF),  # OF: result offset of the current page
    "count": "15",
    "need_filter_settings": "1",
    "update_version_code": "170400",
    "pc_client_type": "1",
    "version_code": "190600",
    "version_name": "19.6.0",
    "cookie_enabled": "true",
    "screen_width": "1536",
    "screen_height": "864",
    "browser_language": "zh-CN",
    "browser_platform": "Win32",
    "browser_name": "Chrome",
    "browser_version": "124.0.0.0",
    "browser_online": "true",
    "engine_name": "Blink",
    "engine_version": "124.0.0.0",
    "os_name": "Windows",
    "os_version": "10",
    "cpu_core_num": "16",
    "device_memory": "8",
    "platform": "PC",
    "downlink": "10",
    "effective_type": "4g",
    "round_trip_time": "50",
    "webid": "7368840539326547483",
}
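To make the role of `keyword`, `offset`, and `count` more concrete, the sketch below shows one way the per-page values could be merged into the fixed parameters and encoded into a query string. `page_params` and `build_url` are hypothetical helpers, and advancing the offset by `count` for each page is an assumption about how the paging works, not something the original code states.

```python
from urllib.parse import urlencode

URL = "https://www.douyin.com/aweme/v1/web/general/search/single/"
PAGE_SIZE = 15  # matches "count": "15" in the parameters above


def page_params(keyword, page):
    """Return the fields that change from one request to the next."""
    # Assumption: the offset moves forward by PAGE_SIZE for each page.
    return {
        "keyword": keyword,
        "offset": str(page * PAGE_SIZE),
        "count": str(PAGE_SIZE),
    }


def build_url(base_params, keyword, page):
    """Merge fixed and per-page parameters and encode them into a URL."""
    merged = {**base_params, **page_params(keyword, page)}
    # urlencode() percent-encodes non-ASCII values such as Chinese keywords.
    return f"{URL}?{urlencode(merged)}"
```

For example, `build_url({"aid": "6383"}, "旅行", 2)` produces a URL whose query string contains `offset=30` and the percent-encoded keyword. No request is sent here; the function only builds the address.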
        

The complete code also contains other key logic, such as the loop termination condition and the extraction of like counts.
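The full source is not shown, so the following is only a rough sketch of what a paging loop with a stop condition and like-count extraction might look like. Everything in it is an assumption: `collect` and `fetch_page` are hypothetical names, and the response fields `data`, `has_more`, `aweme_info`, `desc`, `statistics`, and `digg_count` are guesses at the JSON layout that should be checked against a real response.

```python
def collect(fetch_page, max_pages=10, page_size=15):
    """Collect (description, like_count) rows until the results run out.

    `fetch_page` is a hypothetical callable that takes an offset and
    returns the decoded JSON body of one search response as a dict.
    """
    rows = []
    for page in range(max_pages):
        body = fetch_page(page * page_size)
        items = body.get("data") or []
        for item in items:
            # Assumed layout: each item wraps the video in "aweme_info".
            video = item.get("aweme_info") or {}
            stats = video.get("statistics") or {}
            rows.append((video.get("desc", ""), stats.get("digg_count", 0)))
        # Stop when the page is empty or the API reports no more results.
        if not items or not body.get("has_more"):
            break
    return rows
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP client, so the stop condition can be tried out with canned responses before running it against the live endpoint.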

3. Getting the source code

For the complete Python source code, send the author a private message and state your purpose.