Crawler Example 2: Batch-Downloading Images with the bs4 Library

As usual, code first, then a detailed analysis.

import os
import requests
from bs4 import BeautifulSoup
import urllib.request
'''
Write the crawler as functions:
    the browser sends a request   (request)
    the server returns a response (response)
'''
# Disguise the crawler as a browser
headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0'
}
if not os.path.exists('./picture/'):
    os.mkdir('./picture/')  # create the picture folder in the current working directory only if it does not exist yet

def get_images(url):
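    images_html=requests.get(url,headers=headers,timeout=10).text
    # 'lxml' is an HTML parser: it turns the page's HTML text into a tree of Python objects
    soup=BeautifulSoup(images_html,'lxml')
    images_list = soup.find_all('div',class_='mbpho')  # every <div class="mbpho"> wraps one image
    for image in images_list:
        image_data=image.find('a',class_='a')
        img_tag=image_data.find('img')
        img_url=img_tag['src']            # direct link to the image file (don't overwrite the url parameter)
        image_id=img_tag['data-rootid']   # used as the file name
        print(img_url,image_id)
        # Each find() narrows the search (div -> a -> img) until we reach the image link
        # Catch download errors so one failed image does not stop the loop
        try:
            # Save into ./picture/ (a relative path, not the filesystem root), keeping the original extension
            urllib.request.urlretrieve(img_url,'./picture/'+image_id+os.path.splitext(img_url)[-1])
            print("Downloaded....")
        except Exception as e:
            print(e)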

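# Only the first results page is fetched: this URL has no page parameter
# (kw=%E5%8F%A4%E9%A3%8E is the URL-encoded search keyword 古风, "ancient style")
print('Fetching page 1')
url='https://www.duitang.com/search/?kw=%E5%8F%A4%E9%A3%8E&type=feed'
get_images(url)  # call the function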

Introduction to BS4

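BS4 is short for Beautiful Soup 4. Beautiful Soup provides simple, Pythonic functions for navigating, searching, and modifying a parse tree. It is a toolbox: it parses a document and hands you the data you want to scrape, and because it is simple, a complete application does not take much code. Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. You don't need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it; in that case you only need to tell it the original encoding.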
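To make the searching, navigating, and encoding points above concrete, here is a minimal sketch on a made-up HTML snippet. It uses Python's built-in 'html.parser' instead of 'lxml' so it runs without installing lxml; the snippet, the GBK sample, and the variable names are illustrative assumptions, not part of the crawler.

```python
from bs4 import BeautifulSoup

# A made-up HTML snippet for illustration (not taken from any real site)
html = """
<html><body>
  <div class="item"><a href="/p/1">First</a></div>
  <div class="item"><a href="/p/2">Second</a></div>
</body></html>
"""

# 'html.parser' ships with Python; 'lxml' (used in the crawler) is faster but must be installed
soup = BeautifulSoup(html, 'html.parser')

# Searching: find_all returns every matching tag; div.a is the first <a> inside each div
links = [div.a['href'] for div in soup.find_all('div', class_='item')]
print(links)  # ['/p/1', '/p/2']

# Navigating: move from a tag up to its parent; class is multi-valued, so it comes back as a list
first_a = soup.find('a')
print(first_a.parent['class'])  # ['item']

# Encoding: for bytes with no declared encoding, tell Beautiful Soup the original encoding
gbk_bytes = '<p>古风</p>'.encode('gbk')
soup_gbk = BeautifulSoup(gbk_bytes, 'html.parser', from_encoding='gbk')
print(soup_gbk.p.string)  # 古风
```

In the crawler itself the input is already a decoded string (`requests.get(...).text`), so `from_encoding` is only needed when you feed raw bytes.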

Step-by-step analysis:

Part 1: Imports

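import os  # os.path is the submodule for working with file paths; "path" here means a directory,
           # or a file path that includes the file name
import requests  # sends HTTP requests to the server; the returned Response object holds everything the server sent back
from bs4 import BeautifulSoup  # parses HTML into a tree of Python objects
import urllib.request  # Python's standard-library module for sending HTTP requests (used here for urlretrieve)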
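The script names each saved file with `os.path.splitext(url)[-1]`. The sketch below shows how that behaves on URLs, plus a hypothetical helper `image_filename` (not part of the original script) that drops any query string before taking the extension. The example URLs and the '.jpg' fallback are assumptions for illustration.

```python
import os
from urllib.parse import urlsplit

# Hypothetical image URLs, only for demonstrating the string handling
plain_url = 'https://example.com/uploads/item/abc.jpeg'
query_url = 'https://example.com/uploads/item/abc.jpeg?w=300'

# os.path.splitext splits off the last extension, which is how the crawler picks a file suffix
print(os.path.splitext(plain_url)[-1])  # .jpeg
# With a query string, the "extension" also swallows the query part
print(os.path.splitext(query_url)[-1])  # .jpeg?w=300


def image_filename(image_id, img_url, folder='./picture/'):
    """Hypothetical helper: build a save path from an ID and the URL's path only."""
    path = urlsplit(img_url).path                 # drops ?query and #fragment
    ext = os.path.splitext(path)[-1] or '.jpg'    # assumed fallback when the URL has no suffix
    return os.path.join(folder, image_id + ext)


print(image_filename('12345', query_url))  # ./picture/12345.jpeg
```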

Part 2: Disguising the crawler as a browser:

headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0'
}

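Where the User-Agent comes from: press F12 to open the browser's developer tools, select a request in the Network panel, and copy the User-Agent entry from its request headers.
When writing it in Python, each string must open and close with the same kind of quote (a string started with ' cannot end with "), otherwise you get a syntax error. Spaces around the colon in the dict are harmless, but the copied header name and value should not carry stray leading spaces or line breaks, because requests rejects such headers.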
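One thing worth knowing: the `headers` dict is only passed to `requests.get`. The image download goes through `urllib.request.urlretrieve`, which does not use that dict and sends urllib's default `Python-urllib/3.x` User-Agent instead. Below is a minimal sketch, assuming you want the downloads to carry the same header; `build_request` and `download_with_headers` are hypothetical names, and only the request-building part runs here (no network access).

```python
import shutil
import urllib.request

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0'
}


def build_request(img_url):
    # urllib.request.Request accepts a headers dict, much like requests.get(headers=...)
    return urllib.request.Request(img_url, headers=headers)


def download_with_headers(img_url, save_path, timeout=10):
    """Hypothetical replacement for urlretrieve that also sends the User-Agent."""
    with urllib.request.urlopen(build_request(img_url), timeout=timeout) as resp, \
            open(save_path, 'wb') as f:
        shutil.copyfileobj(resp, f)  # stream the response body into the file


req = build_request('https://example.com/a.jpg')
# urllib stores header names via str.capitalize(), so the key becomes 'User-agent'
print(req.get_header('User-agent'))
```

An equally valid choice is to stay with requests for the download too (`requests.get(img_url, headers=headers).content` written to a file), which keeps all HTTP traffic in one library.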

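Part 3: Create the folder. It is created relative to the directory you run the script from (usually the one containing the .py file); if it already exists nothing happens, otherwise it is created automatically.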

if not os.path.exists('./picture/'):
    os.mkdir('./picture/')
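`os.mkdir` fails if a parent folder is missing, and the check-then-create pair can race if two copies of the script start at once. A minimal sketch using `os.makedirs(..., exist_ok=True)`, which handles both in one call; `ensure_folder` is a hypothetical helper and the temporary directory only keeps the demo self-contained. If you want the folder next to the .py file regardless of where you run it from, build the path from `os.path.dirname(os.path.abspath(__file__))` instead of `'./picture/'`.

```python
import os
import tempfile


def ensure_folder(path):
    # exist_ok=True: no error if the folder already exists; missing parent folders are created too
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'picture')
    print(ensure_folder(target))  # True (created)
    print(ensure_folder(target))  # True (already existed, no error)
```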

Part 4: Locate the actual image addresses:

def get_images(url):
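    images_html=requests.get(url,headers=headers,timeout=10).text
    # 'lxml' is an HTML parser: it turns the page's HTML text into a tree of Python objects
    soup=BeautifulSoup(images_html,'lxml')
    images_list = soup.find_all('div',class_='mbpho')  # every <div class="mbpho"> wraps one image
    for image in images_list:
        image_data=image.find('a',class_='a')
        img_tag=image_data.find('img')
        img_url=img_tag['src']            # direct link to the image file (don't overwrite the url parameter)
        image_id=img_tag['data-rootid']   # used as the file name
        print(img_url,image_id)
        # Each find() narrows the search (div -> a -> img) until we reach the image link
        # Catch download errors so one failed image does not stop the loop
        try:
            # Save into ./picture/ (a relative path, not the filesystem root), keeping the original extension
            urllib.request.urlretrieve(img_url,'./picture/'+image_id+os.path.splitext(img_url)[-1])
            print("Downloaded....")
        except Exception as e:
            print(e)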

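This is simply a step-by-step narrowing down to each image's link: find each div.mbpho, then the a.a inside it, then the img inside that, and finally read the img's src and data-rootid attributes.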
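The locating chain can be tried offline on a made-up snippet that mimics the structure the selectors assume (div.mbpho > a.a > img with src and data-rootid). Duitang's real markup may differ or may have changed, and `extract_images` is a hypothetical helper; it uses 'html.parser' so lxml is not required. Checking each step for None makes it obvious which step failed instead of crashing with an AttributeError.

```python
import os
from bs4 import BeautifulSoup

# Made-up markup mimicking what the crawler's selectors expect; not real Duitang HTML
html = """
<div class="mbpho"><a class="a" href="#"><img src="https://example.com/img/1001.jpeg" data-rootid="1001"></a></div>
<div class="mbpho"><a class="a" href="#"><img src="https://example.com/img/1002.png" data-rootid="1002"></a></div>
<div class="mbpho"><span>no link here</span></div>
"""


def extract_images(page_html):
    """Return (image URL, file name) pairs, skipping boxes that lack the expected tags."""
    soup = BeautifulSoup(page_html, 'html.parser')
    results = []
    for box in soup.find_all('div', class_='mbpho'):          # step 1: each image box
        link = box.find('a', class_='a')                        # step 2: the link inside it
        img = link.find('img') if link is not None else None    # step 3: the <img> tag
        if img is None or not img.get('src') or not img.get('data-rootid'):
            print('skipped a box without the expected tags')
            continue
        img_url = img['src']
        image_id = img['data-rootid']
        results.append((img_url, image_id + os.path.splitext(img_url)[-1]))
    return results


for img_url, filename in extract_images(html):
    print(img_url, '->', filename)
```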

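Finally, a lesson from my own learning: writing a crawler, or any program, is a process of writing a little and debugging a little. A common beginner mistake is to type out the whole thing and only then run it, so when it fails you have no idea where the error is.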

I'm a new blogger; corrections from more experienced readers are welcome. @丁一