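Ten Thousand 福 Characters Welcome the Cute Tiger, Ten Thousand Tigers Celebrate the Spring Festival: Good Luck in the 2022 Year of the Tiger!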

I am taking part in the "春节创意投稿大赛" (Spring Festival Creative Writing Contest); see the contest page for details.

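Preface

2022 is the Year of the Tiger. To welcome the Spring Festival, I want to bring everyone good luck by composing a 福 (fu, "good fortune") character out of ten thousand tiger pictures, and a cute tiger out of ten thousand 福 pictures. May you be as vigorous as a tiger, and may the new year bring smooth sailing, happiness, and prosperity! With the greetings done, it is time to show the technique.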

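I. Crawling 福 and Tiger Images

With a Python crawler we can quickly collect a large number of tiger and 福 images from Baidu Images. Here is how:

1. Site analysis

First, open Baidu Images, press F12 to open the developer tools, search for “虎年” (Year of the Tiger), and click the images tab:

While scrolling down to load more images, watch the developer tools (the requests are listed in the Network panel); many request packets appear:

Pick any one of them and copy its request URL: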

https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11365856357363949053&ipn=rj&ct=201326592&is=&fp=result&fr=&word=%E8%99%8E%E5%B9%B4&queryWord=%E8%99%8E%E5%B9%B4&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&expermode=&nojc=&isAsync=&pn=120&rn=30&gsm=78&1642056713314=

Open this request to see the parameters it carries, and copy this section:

tn: resultjson_com
logid: 11365856357363949053
ipn: rj
ct: 201326592
is: 
fp: result
fr: 
word: 虎年
queryWord: 虎年
cl: 2
lm: -1
ie: utf-8
oe: utf-8
adpicid: 
st: -1
z: 
ic: 0
hd: 
latest: 
copyright: 
s: 
se: 
tab: 
width: 
height: 
face: 0
istype: 2
qc: 
nc: 1
expermode: 
nojc: 
isAsync: 
pn: 120
rn: 30
gsm: 78
1642056713314: 

At this point we have the request parameters that the crawler needs, and the analysis is complete!
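
The parameter list can also be recovered from the copied URL without the browser. The following is a minimal sketch using only the standard library; `query_params` is a hypothetical helper name, and `sample_url` is a shortened version of the captured URL that keeps just a few of its parameters.

```python
from urllib.parse import parse_qsl, urlsplit


def query_params(url: str) -> dict:
    """Return the query-string parameters of url as a dict, keeping empty values."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


# Shortened copy of the captured request URL (only a subset of its parameters).
sample_url = (
    "https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj"
    "&word=%E8%99%8E%E5%B9%B4&queryWord=%E8%99%8E%E5%B9%B4"
    "&ie=utf-8&hd=&pn=120&rn=30"
)

params = query_params(sample_url)
for key, value in params.items():
    print(f"{key}: {value}")
```

`keep_blank_values=True` matters here because many of the parameters, such as `hd`, are sent with an empty value and would otherwise be dropped.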

2. Crawler code

I will not walk through writing the code step by step; here is the main source:

import os

import requests

path = r"/Users/lpc/Downloads/baidu1/"
# Create the download directory if it does not exist yet
os.makedirs(path, exist_ok=True)

page = input('How many pages to crawl: ')
page = int(page) + 1
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
n = 0  # running counter, used as the image file name
pn = 0
# pn is the offset of the first image to fetch; Baidu Images loads 30 images per scroll
for m in range(1, page):
    url = 'https://image.baidu.com/search/acjson?'

    param = {
        'tn': 'resultjson_com',
        'logid': '7680290037940858296',
        'ipn': 'rj',
        'ct': '201326592',
        'is': '',
        'fp': 'result',
        'queryWord': '虎年',
        'cl': '2',
        'lm': '-1',
        'ie': 'utf-8',
        'oe': 'utf-8',
        'adpicid': '',
        'st': '-1',
        'z': '',
        'ic': '0',
        'hd': '1',
        'latest': '',
        'copyright': '',
        'word': '虎年',
        's': '',
        'se': '',
        'tab': '',
        'width': '',
        'height': '',
        'face': '0',
        'istype': '2',
        'qc': '',
        'nc': '1',
        'fr': '',
        'expermode': '',
        'nojc': '',
        'acjsonfr': 'click',
        'pn': pn,  # offset of the first image on this page
        'rn': '30',
        'gsm': '3c',
        '1635752428843': '',
    }
    page_text = requests.get(url=url, headers=header, params=param)
    page_text.encoding = 'utf-8'
    page_text = page_text.json()
    print(page_text)
    # 'data' is a list of dicts, one per image
    info_list = page_text['data']
    # The last dict returned this way is empty, so keep only entries that have a thumbnail URL
    img_path_list = [i['thumbURL'] for i in info_list if i.get('thumbURL')]
    # Download every image; n is used as the file name
    for img_path in img_path_list:
        img_data = requests.get(url=img_path, headers=header).content
        img_path = path + str(n) + '.jpg'
        with open(img_path, 'wb') as fp:
            fp.write(img_data)
        n = n + 1

    pn += 30  # advance by one full page so consecutive pages do not overlap
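
As a small offline illustration of the two pieces of logic in the script above, the sketch below computes the `pn` offsets for a number of pages and pulls the `thumbURL` values out of a response body. It assumes each page holds `rn` = 30 results, so that consecutive pages do not overlap. The helper names `page_offsets` and `thumb_urls` are hypothetical, and `sample_payload` is made-up data with placeholder URLs that only mimics the shape described above (a `data` list whose last entry is empty).

```python
from typing import Any, Dict, List


def page_offsets(pages: int, per_page: int = 30) -> List[int]:
    """Start offsets (the pn parameter) for the first `pages` result pages."""
    return [index * per_page for index in range(pages)]


def thumb_urls(payload: Dict[str, Any]) -> List[str]:
    """Thumbnail URLs from a response body, skipping entries without one."""
    return [item["thumbURL"] for item in payload.get("data", []) if item.get("thumbURL")]


# Hypothetical payload: same shape as the real response, placeholder URLs.
sample_payload = {
    "data": [
        {"thumbURL": "https://example.com/a.jpg"},
        {"thumbURL": "https://example.com/b.jpg"},
        {},  # trailing empty entry
    ]
}

offsets = page_offsets(3)
urls = thumb_urls(sample_payload)
print(offsets)
print(urls)
```

Filtering on the presence of `thumbURL` avoids relying on the empty entry always being the last one.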

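This approach does crawl Baidu Images, but the request has to be re-analyzed and the keyword edited by hand for every search, which is tedious. Here is a second script that only asks for a keyword such as “虎年” when it runs: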

# -*- coding:utf-8 -*-
import os
import random
import re
import time
import urllib.parse

import requests
from PIL import Image  # Pillow, used to resize and crop the downloads

imgDir = r"/Volumes/DBA/python/img/"
os.makedirs(imgDir, exist_ok=True)  # make sure the output directory exists
# Rotate between several header sets to reduce the chance of being blocked by anti-crawling checks
# Chrome, Firefox, Edge
headers = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    },
    {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    },
    {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19041',
        'Accept-Language': 'zh-CN',
        'Connection': 'keep-alive'
    }
]

picList = []  # collected thumbnail URLs

keyword = input("Enter the search keyword: ")
kw = urllib.parse.quote(keyword)  # URL-encode the keyword


# Append one page (30 results) of Baidu thumbnail URLs to picList; n is the page index
def getPicList(kw, n):
    global picList
    weburl = r"https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11601692320226504094&ipn=rj&ct=201326592&is=&fp=result&queryWord={kw}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&word={kw}&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn={n}&rn=30&gsm=1e&1611751343367=".format(
        kw=kw, n=n * 30)
    req = requests.get(url=weburl, headers=random.choice(headers))
    req.encoding = req.apparent_encoding  # avoid garbled Chinese text
    webJSON = req.text
    imgurlReg = '"thumbURL":"(.*?)"'  # regex that captures each thumbnail URL
    picList = picList + re.findall(imgurlReg, webJSON, re.DOTALL | re.I)


for i in range(150):  # Deliberately large; if fewer images exist, picList simply stops growing.
    getPicList(kw, i)

for item in picList:
    hz = ".jpg"  # file extension
    picName = str(int(time.time() * 1000))  # millisecond timestamp as the file name
    # Request the image
    imgReq = requests.get(url=item, headers=random.choice(headers))
    # Save the image
    with open(imgDir + picName + hz, "wb") as f:
        f.write(imgReq.content)
    # Open the saved image with Pillow
    im = Image.open(imgDir + picName + hz)
    bili = im.width / im.height  # aspect ratio, used to scale the image
    newIm = None
    # Scale the image so that its shorter side is 50 pixels
    if bili >= 1:
        newIm = im.resize((round(bili * 50), 50))
    else:
        newIm = im.resize((50, round(50 * im.height / im.width)))
    # Cut a 50x50 square from the top-left corner
    clip = newIm.crop((0, 0, 50, 50))
    clip.convert("RGB").save(imgDir + picName + hz)  # overwrite the file with the cropped tile
    print(picName + hz + " processed")
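
The resize-and-crop arithmetic at the end of the script can be checked on its own. The sketch below is a pure-Python illustration of that arithmetic, not the author's code: `scaled_size` mirrors the "shorter side becomes 50" rule, while `center_box` is a hypothetical variation that takes the square from the middle of the image rather than from the top-left corner as the script does. Its result has the same `(left, upper, right, lower)` layout that Pillow's `Image.crop` expects.

```python
from typing import Tuple


def scaled_size(width: int, height: int, side: int = 50) -> Tuple[int, int]:
    """Size after scaling so that the shorter edge equals `side` pixels."""
    scale = side / min(width, height)
    # max() guards against rounding leaving an edge one pixel short of `side`
    return max(side, round(width * scale)), max(side, round(height * scale))


def center_box(width: int, height: int, side: int = 50) -> Tuple[int, int, int, int]:
    """(left, upper, right, lower) box of a centered side x side square."""
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


landscape = scaled_size(200, 100)
portrait = scaled_size(90, 120)
print(landscape, center_box(*landscape))
print(portrait, center_box(*portrait))
```

Whether a centered crop looks better than a top-left crop depends on the source pictures; for tiles this small either choice is usually acceptable.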

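To try this version, run it, type “虎年”, press Enter, and wait for the downloads to finish:

That is all the code for crawling Baidu Images. After crawling both “福” and “虎” (tiger), we can start building the mosaics!

II. Photo Mosaics

Building a photo mosaic is straightforward. I covered it in an earlier article, "Python 批量爬取猫咪图片实现千图成像" (batch-crawling cat pictures with Python to build a thousand-image mosaic), which you can refer to.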
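
The mosaic code itself is not shown in this post. As a rough idea of the core step, a common approach is to compare the average colour of each cell of the target picture with the average colour of each tile and pick the closest tile. The sketch below illustrates only that matching step on plain RGB tuples; it is an assumption about how such a mosaic can be built, not the method from the referenced article, and all names and sample colours in it are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]


def average_color(pixels: Sequence[RGB]) -> RGB:
    """Mean colour of a non-empty sequence of RGB pixels."""
    count = len(pixels)
    return tuple(sum(p[c] for p in pixels) // count for c in range(3))


def nearest_tile(target: RGB, tile_colors: Dict[str, RGB]) -> str:
    """Name of the tile whose average colour is closest to `target`."""
    return min(
        tile_colors,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(target, tile_colors[name])),
    )


def build_layout(cells: List[List[RGB]], tile_colors: Dict[str, RGB]) -> List[List[str]]:
    """Choose a tile name for every cell of the target grid."""
    return [[nearest_tile(cell, tile_colors) for cell in row] for row in cells]


# Hypothetical tiles (file name -> average colour) and a 2x2 target grid.
tiles = {
    "red_tiger.jpg": (200, 30, 30),
    "gold_fu.jpg": (230, 190, 60),
    "dark.jpg": (20, 20, 20),
}
cells = [
    [(255, 0, 0), (10, 10, 10)],
    [(240, 200, 50), (190, 40, 40)],
]
layout = build_layout(cells, tiles)
for row in layout:
    print(row)
```

In a full program the cell and tile colours would come from the downloaded 50x50 images, and the chosen tiles would be pasted onto a canvas at the matching grid positions.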

Result

Here are the finished mosaics:

III. Writing the 福 Character in SQL

The SQL is as follows:

create table LuciferFu(fu_line varchar2(128));
insert into LuciferFu values
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('2022202220222022202220222022202220222022202220222022202220222022202220222/[[    .\2022202220222022202220222022202222'),
('2022202220222022202220222022202220222022202220222022202220222022222/[.              ,2022202220222022202220222022222'),
('202220222022202220222022/2[202220222022202220222022202220222022`                      \20222022202220222022202220222'),
('202220222022202220222022^                ,\2022202220222022[`                          =2022202220222022202220222022'),
('2022202220222022202220222                   \20222022/2[                               \2022202220222022202220222022'),
('2022202220222022202220222                     20222\                                  ]20222022202220222022202220222'),
('2022202220222022202220222^                     \2022                             .]/20222022202220222022202220222022'),
('20222022202220222022202222                      2022`                 .,,]]/2022202220222022202220222022202220222022'),
('202220222022202220222022222                     \2022]  ,/2/,/202220222022202220222022202220222022202220222022202222'),
('2022202220222022202220222022                    .20222022`/202220222022202220222022202220222022202220222022202220222'),
('20222022202220222022202220222.                  ,\2022\`/202220222022202222[.       ..[20222022202220222022202220222'),
('2022202220222022202220222022`                   =222^  2022202220222[`                 ,2022202220222022202220222022'),
('20222022202220222022202222^   ]2\]`              =2`   2022222/`                        2022202220222022202220222022'),
('202220222022202220222022/   /20222022\`      ,]222`    ,2[`                            =2022202220222022202220222022'),
('2022202220222022202222/   /2022202220222022202222^                                 ,20222022202220222022202220222022'),
('202220222022202220222^ ,/2022222[`   ,[[[\2022222.                .]/22^          /202220222022202220222022202220222'),
('2022202220222022222/ ,/202222/`            ,20222          ,]]]202222`           =2022202220222022202220222022202222'),
('202220222022202222`,2022222/[               /222^        ,20222022/`             [2022202220222022202220222022202222'),
('2022202220222022^ /20222022^               /222/          202222[                   ,2022202220222022202220222022222'),
('20222022202222/ ,202220222`                 22^           ,22/                      /2022202220222022202220222022222'),
('2022202220222/./2022222`                  ,22^                                    ]/20222022202220222022202220222022'),
('202220222022./202222[                    /22`    =^                            ,202220222022202220222022202220222022'),
('2022202222/,20222[                      ,,/     .2^                     ,]/20222022202220222022202220222022202220222'),
('20222022/./22[.                          `      22^              .]20222022202220222022/[[[[[\2022202220222022202222'),
('20222/`                                        222^           2022202220222022[[.                [\20222022202220222'),
('20222                    /2                   20222.          =20222022[`                           ,202220222022222'),
('2022^                  /22/                  =20222\           =2[`                                   ,2022202220222'),
('2022\              ,]20222                   2022222`                                                  .202220222022'),
('202222\]]]]`]]\/202220222`                 ,202220222                     ]/222\/2022\2/]`              ,20222022222'),
('202220222022202220222022`                 /[20222022/                     2022202220222022\             /20222022222'),
('20222022202220222022222^               ,`    ,20222^           .         =2/`,`[,[[\202222\             \20222022222'),
('2022202220222022202222^              ,/2^      ,222].     .]]222^        .             ,222             202220222022'),
('202220222022202220222/.             /222^       ,2022202220222022                       =2^            .202220222022'),
('202220222022202220222.              20222.       =202220222022/`                       =22.            ,202220222022'),
('20222022202220222022^             .202222.       .20222022/[                         ,`=22             =202220222022'),
('2022202220222022222^..  .. .      .\20222^        ,2022[.                           ]2022^.           .2022202220222'),
('20222022202220222/.....          ..2022222`. .   . ,^          .            .   ]\/202222.            =2022202220222'),
('20222022202220222........... . ....20222022... . .. ,` .. .    ... .. .    .. .202220222^       ...  .20222022202222'),
('2022202220222022`................ .\2022222^ ... ............,/22^...... ../202220222022.............=20222022202222'),
('20222022202222022`..................,\2022222`.........,\]]20222`........../202220222022^.............202220222022222'),
('2022202220222022^...................202220222..........=2022/.,`.....................\2.............=202220222022222'),
('202220222022202222.]20222..........,2022202222..........2/]222......................................2022202220222022'),
('2022202220222022202220222..........,2022202222^..........`.........................................,2022202220222022'),
('2022202220222022202220222^.........=20222022222^...................................................20222022202220222'),
('2022202220222022202220222^.........=202220222022\.................................................202220222022202222'),
('2022202220222022202220222^.*........=202220222022\...................,]]]/20222\...............,20222022202220222022'),
('2022202220222022202220222^...*....../2022202220222\.......]/20222022202220222022\...*..*.....,2022202220222022202222'),
('20222022202220222022202222......**]/2022202220222022\**,]2022202220222022202220222`.......,2022202220222022202220222'),
('20222022202220222022202222\/2\20222022202220222022202220222022202220222022202220222022202220222022202220222022202222'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022'),
('20222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022202220222022');

Result:

select * from LuciferFu;
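
To experiment with the same idea without a database server, the sketch below loads a few lines of text art into an in-memory SQLite table through Python's standard `sqlite3` module and reads them back. SQLite is swapped in here purely for convenience, the `line_no` column is an addition that makes the row order explicit, and `art_lines` is a tiny made-up sample, not the 福 artwork above.

```python
import sqlite3

# Tiny made-up sample; replace with the lines of any text art.
art_lines = [
    "2022202220222022",
    "2022/[    .\\2022",
    "2022^      =2022",
    "2022202220222022",
]

conn = sqlite3.connect(":memory:")
conn.execute("create table LuciferFu(line_no integer primary key, fu_line text)")
# Parameterized inserts avoid any quoting problems with backslashes or quotes in the art.
conn.executemany(
    "insert into LuciferFu(line_no, fu_line) values (?, ?)",
    list(enumerate(art_lines, start=1)),
)

rows = [row[0] for row in conn.execute("select fu_line from LuciferFu order by line_no")]
print("\n".join(rows))
conn.close()
```

Ordering by `line_no` guarantees that the picture comes back in the order it was inserted, which a plain `select *` does not promise in general.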

If you are interested, give it a try and have fun in the Year of the Tiger!