Web scraping is a common technique for gathering many kinds of application data from the internet. Because the web holds a nearly unlimited supply of data, software developers have created many tools to compile that information efficiently. During web scraping, a computer program sends a request to a website on the internet, and an HTML document is sent back in response. That document contains the information you may be interested in for some purpose, and this is where parsing comes into play: by parsing the document we can isolate and focus on the specific data points we care about. Common Python libraries that help with this technique are Beautiful Soup, [lxml], and [Requests]. In this tutorial we will put these tools to work to learn how to implement web scraping with Python.
Installing The Web Scraping Libraries
Run these three commands from the terminal to follow along. Using a [virtual environment] is also recommended to keep your system clean.
- pip install lxml
- pip install requests
- pip install beautifulsoup4
Finding A Website To Scrape
To learn how to do web scraping, we can test against a website called quotes.toscrape.com/ which looks like it was made for exactly this purpose.

From this website, maybe we want to create a data store of all the authors, tags, and quotes on the page. How can we do that? Well, first we can look at the page source. This is the data that is actually returned when a request is sent to the website. So in Firefox, we can right-click the page and choose "View Page Source".

This will display the raw HTML markup of the page. It is shown here for reference.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Quotes to Scrape</title>
<link rel="stylesheet" href="/static/bootstrap.min.css">
<link rel="stylesheet" href="/static/main.css">
</head>
<body>
<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>
<a href="/login">Login</a>
</p>
</div>
</div>
<div class="row">
<div class="col-md-8">
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world" / >
<a class="tag" href="/tag/change/page/1/">change</a>
<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
<a class="tag" href="/tag/thinking/page/1/">thinking</a>
<a class="tag" href="/tag/world/page/1/">world</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>
<span>by <small class="author" itemprop="author">J.K. Rowling</small>
<a href="/author/J-K-Rowling">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="abilities,choices" / >
<a class="tag" href="/tag/abilities/page/1/">abilities</a>
<a class="tag" href="/tag/choices/page/1/">choices</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="inspirational,life,live,miracle,miracles" / >
<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
<a class="tag" href="/tag/life/page/1/">life</a>
<a class="tag" href="/tag/live/page/1/">live</a>
<a class="tag" href="/tag/miracle/page/1/">miracle</a>
<a class="tag" href="/tag/miracles/page/1/">miracles</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”</span>
<span>by <small class="author" itemprop="author">Jane Austen</small>
<a href="/author/Jane-Austen">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="aliteracy,books,classic,humor" / >
<a class="tag" href="/tag/aliteracy/page/1/">aliteracy</a>
<a class="tag" href="/tag/books/page/1/">books</a>
<a class="tag" href="/tag/classic/page/1/">classic</a>
<a class="tag" href="/tag/humor/page/1/">humor</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”</span>
<span>by <small class="author" itemprop="author">Marilyn Monroe</small>
<a href="/author/Marilyn-Monroe">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="be-yourself,inspirational" / >
<a class="tag" href="/tag/be-yourself/page/1/">be-yourself</a>
<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“Try not to become a man of success. Rather become a man of value.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="adulthood,success,value" / >
<a class="tag" href="/tag/adulthood/page/1/">adulthood</a>
<a class="tag" href="/tag/success/page/1/">success</a>
<a class="tag" href="/tag/value/page/1/">value</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“It is better to be hated for what you are than to be loved for what you are not.”</span>
<span>by <small class="author" itemprop="author">André Gide</small>
<a href="/author/Andre-Gide">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="life,love" / >
<a class="tag" href="/tag/life/page/1/">life</a>
<a class="tag" href="/tag/love/page/1/">love</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“I have not failed. I've just found 10,000 ways that won't work.”</span>
<span>by <small class="author" itemprop="author">Thomas A. Edison</small>
<a href="/author/Thomas-A-Edison">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="edison,failure,inspirational,paraphrased" / >
<a class="tag" href="/tag/edison/page/1/">edison</a>
<a class="tag" href="/tag/failure/page/1/">failure</a>
<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
<a class="tag" href="/tag/paraphrased/page/1/">paraphrased</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“A woman is like a tea bag; you never know how strong it is until it's in hot water.”</span>
<span>by <small class="author" itemprop="author">Eleanor Roosevelt</small>
<a href="/author/Eleanor-Roosevelt">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="misattributed-eleanor-roosevelt" / >
<a class="tag" href="/tag/misattributed-eleanor-roosevelt/page/1/">misattributed-eleanor-roosevelt</a>
</div>
</div>
<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“A day without sunshine is like, you know, night.”</span>
<span>by <small class="author" itemprop="author">Steve Martin</small>
<a href="/author/Steve-Martin">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="humor,obvious,simile" / >
<a class="tag" href="/tag/humor/page/1/">humor</a>
<a class="tag" href="/tag/obvious/page/1/">obvious</a>
<a class="tag" href="/tag/simile/page/1/">simile</a>
</div>
</div>
<nav>
<ul class="pager">
<li class="next">
<a href="/page/2/">Next <span aria-hidden="true">→</span></a>
</li>
</ul>
</nav>
</div>
<div class="col-md-4 tags-box">
<h2>Top Ten tags</h2>
<span class="tag-item">
<a class="tag" style="font-size: 28px" href="/tag/love/">love</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 26px" href="/tag/inspirational/">inspirational</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 26px" href="/tag/life/">life</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 24px" href="/tag/humor/">humor</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 22px" href="/tag/books/">books</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 14px" href="/tag/reading/">reading</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 10px" href="/tag/friendship/">friendship</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 8px" href="/tag/friends/">friends</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 8px" href="/tag/truth/">truth</a>
</span>
<span class="tag-item">
<a class="tag" style="font-size: 6px" href="/tag/simile/">simile</a>
</span>
</div>
</div>
</div>
<footer class="footer">
<div class="container">
<p class="text-muted">
Quotes by: <a href="https://www.goodreads.com/quotes">GoodReads.com</a>
</p>
<p class="copyright">
Made with <span class='sh-red'>❤</span> by <a href="https://scrapinghub.com">Scrapinghub</a>
</p>
</div>
</footer>
</body>
</html>
As you can see from the markup above, there is a lot of data that looks jumbled together. The point of web scraping is to be able to access just the parts of a web page we are interested in. Many software developers use [regular expressions] for this task, and that is definitely a viable option. Python's Beautiful Soup library is a more user-friendly way to extract the information we want.
Building The Scraper Script
In PyCharm, we can add a new file that will hold the Python code to scrape our page.

scraper.py
import requests
from bs4 import BeautifulSoup
url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
print(soup)
The code above is the start of our Python scraping script. At the top of the file, the first thing to do is import the **requests** and **BeautifulSoup** libraries. We then set the URL we want to scrape in the **url** variable, pass it to the **requests.get()** function, and assign the result to the **response** variable. Next we use the **BeautifulSoup()** constructor to load the response text into the **soup** variable, with **lxml** set as the parser. Finally, we print out the **soup** variable, and you should see something similar to the screenshot below. Essentially, the software is visiting the website, reading the data, and viewing the site's source just like we did manually above. The only difference is that this time, all we had to do was click a button to see the output. Pretty neat!
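You can also experiment with the constructor without making a network request at all. The short sketch below uses a made-up HTML string standing in for `response.text`, and Python's built-in `'html.parser'` in place of `'lxml'` so no extra install is needed:

```python
from bs4 import BeautifulSoup

# Offline experiment: this HTML string is a made-up stand-in for response.text,
# so you can try the constructor without hitting any website.
html = '<html><body><span class="text">Hello</span></body></html>'
# 'html.parser' is Python's built-in parser; 'lxml' also works if installed.
soup = BeautifulSoup(html, 'html.parser')
print(soup.span.text)  # Hello
```

The same object methods used throughout this tutorial work identically whichever parser you choose.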

Traversing The HTML Structure
HTML is short for HyperText Markup Language, and it works by assigning elements in an HTML document specific tags. HTML has many different tags, but a general layout involves three basic ones: an html tag, a head tag, and a body tag. These tags organize the HTML document. In our case, we will mostly focus on the information inside the body tag. At this point, our script can fetch the HTML markup from the URL we specified. The next step is to focus on the specific data we are interested in.

Note that if you use the inspector tool in your browser, it is quite easy to see exactly which HTML markup is responsible for rendering a given piece of information on the page. When we hover the mouse pointer over a particular span tag, the associated text is automatically highlighted in the browser window. It turns out that every quote is inside a span tag, and that span tag also has a class of text. This is how you decipher how to scrape data: you look for patterns on the page, then create code that works on that pattern. Play around and notice that this works no matter where you place the mouse pointer. We can see the mapping between a specific quote and its specific HTML markup. Web scraping makes it easy to fetch all the similar sections of an HTML document. That is really all the HTML we need to know to scrape simple websites.
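To connect the inspector pattern to code, here is a minimal offline sketch. The HTML fragment is a hypothetical stand-in mirroring one quote `<div>` from the page source above, and `find()` (the single-match sibling of `find_all()`) pulls out exactly the span the inspector highlighted:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring one quote <div> from the page source above:
html = '''
<div class="quote">
  <span class="text">Sample quote</span>
  <span>by <small class="author">Sample Author</small></span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
quote_span = soup.find('span', class_='text')     # find() returns only the first match
author_tag = soup.find('small', class_='author')
print(quote_span.text)  # Sample quote
print(author_tag.text)  # Sample Author
```

Note the trailing underscore in `class_=`; it is there because `class` is a reserved word in Python.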

Parsing The HTML Markup
There is a lot of information in the HTML document, but Beautiful Soup makes it really easy to find the data we want, sometimes with just one line of code. So let's go ahead and search for all span tags that have a class of text. That should find all of the quotes for us. When you want to find more than one of the same tag on a page, you use the **find_all()** function.
scraper.py
import requests
from bs4 import BeautifulSoup
url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
quotes = soup.find_all('span', class_='text')
print(quotes)
When the code above runs, the quotes variable is assigned a list of all the elements in the HTML document that are span tags with a class of text. Printing out that quotes variable gives us the output below. The entire HTML tag is captured along with its inner contents.
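It is worth seeing what `find_all()` actually hands back: a list-like collection of full Tag objects, not plain strings. A small offline sketch, using a made-up trio of spans in place of the real page:

```python
from bs4 import BeautifulSoup

# Made-up spans standing in for the page, to show what find_all() returns:
html = ('<span class="text">One</span>'
        '<span class="text">Two</span>'
        '<span class="other">Skip</span>')
soup = BeautifulSoup(html, 'html.parser')
quotes = soup.find_all('span', class_='text')
print(len(quotes))     # 2 -- only the spans whose class is "text" match
print(quotes[0].name)  # span -- each element is a full Tag, not just its text
print(quotes[1].text)  # Two
```

Because each element is a Tag, you can keep navigating from it (`.text`, `.find()`, attribute lookups) rather than falling back to string processing.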

The Beautiful Soup text Attribute
The extra HTML markup returned in the script isn't really what we are interested in. To get only the data we want, in this case the actual quotes, we can use the **.text** attribute made available to us via Beautiful Soup. Note the new highlighted code here, where we use a for loop to iterate over all of the captured data and print out only the content we want.
scraper.py
import requests
from bs4 import BeautifulSoup
url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
quotes = soup.find_all('span', class_='text')
for quote in quotes:
    print(quote.text)
This gives us a nice output with only the quotes we are interested in.
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
“Try not to become a man of success. Rather become a man of value.”
“It is better to be hated for what you are than to be loved for what you are not.”
“I have not failed. I've just found 10,000 ways that won't work.”
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
“A day without sunshine is like, you know, night.”
Process finished with exit code 0
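As an aside, `.text` is shorthand for the `get_text()` method, which also accepts a `strip=True` argument that trims surrounding whitespace. A quick offline sketch on a made-up span:

```python
from bs4 import BeautifulSoup

# .text is shorthand for .get_text(); get_text(strip=True) also trims whitespace.
html = '<span class="text">  padded quote  </span>'
soup = BeautifulSoup(html, 'html.parser')
span = soup.find('span', class_='text')
print(repr(span.text))                  # '  padded quote  '
print(repr(span.get_text(strip=True)))  # 'padded quote'
```

This comes in handy on pages whose markup carries extra indentation or newlines inside the tags.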
Great! Now, to find all of the authors and print them out as they are associated with each quote, we can use the code below. Following the earlier steps, we first manually inspect the page we want to scrape. We can see that each author is contained inside a small tag with an author class. So we use the find_all() function following the prior format and store the result in the new **authors** variable. We also need to change the for loop to make use of the range() function so we can iterate over both the quotes and the authors at the same time.
scraper.py
import requests
from bs4 import BeautifulSoup
url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
quotes = soup.find_all('span', class_='text')
authors = soup.find_all('small', class_='author')
for i in range(0, len(quotes)):
    print(quotes[i].text)
    print('--' + authors[i].text)
Now when the script runs, we get the quotes along with each associated author.
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
--Albert Einstein
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
--J.K. Rowling
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
--Albert Einstein
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
--Jane Austen
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
--Marilyn Monroe
“Try not to become a man of success. Rather become a man of value.”
--Albert Einstein
“It is better to be hated for what you are than to be loved for what you are not.”
--André Gide
“I have not failed. I've just found 10,000 ways that won't work.”
--Thomas A. Edison
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
--Eleanor Roosevelt
“A day without sunshine is like, you know, night.”
--Steve Martin
Process finished with exit code 0
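The `range()` indexing above can also be written with `zip()`, which pairs the two lists directly and is arguably more Pythonic. A minimal offline sketch, with a made-up two-quote HTML string standing in for the real response:

```python
from bs4 import BeautifulSoup

# Same pairing logic as the range() loop, written with zip(); the markup is
# a made-up two-quote stand-in for the real page.
html = '''
<span class="text">Quote one</span><small class="author">Author A</small>
<span class="text">Quote two</span><small class="author">Author B</small>
'''
soup = BeautifulSoup(html, 'html.parser')
quotes = soup.find_all('span', class_='text')
authors = soup.find_all('small', class_='author')
for quote, author in zip(quotes, authors):  # pairs items positionally
    print(quote.text)
    print('--' + author.text)
```

One caveat: `zip()` stops at the shorter list, so like the `range()` version it silently assumes the two lists line up one-to-one.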
Lastly, we will add some code to fetch all of the tags for each quote. This one is a bit trickier, because we first need to grab the outer wrapper div of each collection of tags. If we did not do this first step, we could fetch all of the tags, but we would have no way of associating them with a given quote-and-author pair. Once the outer div is captured, we can drill down further by using the find_all() function again on *that* subset. From there, we have to add an inner loop inside the first loop to complete the process.
import requests
from bs4 import BeautifulSoup
url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
quotes = soup.find_all('span', class_='text')
authors = soup.find_all('small', class_='author')
tags = soup.find_all('div', class_='tags')
for i in range(0, len(quotes)):
    print(quotes[i].text)
    print('--' + authors[i].text)
    tagsforquote = tags[i].find_all('a', class_='tag')
    for tagforquote in tagsforquote:
        print(tagforquote.text)
    print('\n')
This code now gives us the following result. Pretty cool, right?
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
--Albert Einstein
change
deep-thoughts
thinking
world
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
--J.K. Rowling
abilities
choices
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
--Albert Einstein
inspirational
life
live
miracle
miracles
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
--Jane Austen
aliteracy
books
classic
humor
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
--Marilyn Monroe
be-yourself
inspirational
“Try not to become a man of success. Rather become a man of value.”
--Albert Einstein
adulthood
success
value
“It is better to be hated for what you are than to be loved for what you are not.”
--André Gide
life
love
“I have not failed. I've just found 10,000 ways that won't work.”
--Thomas A. Edison
edison
failure
inspirational
paraphrased
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
--Eleanor Roosevelt
misattributed-eleanor-roosevelt
“A day without sunshine is like, you know, night.”
--Steve Martin
humor
obvious
simile
Process finished with exit code 0
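An alternative to keeping three parallel lists is to iterate over each outer `div.quote` and pull the quote, author, and tags out of that one block, collecting the result as a list of dictionaries. This keeps the fields aligned by construction. A sketch on a made-up single-quote fragment mirroring the page structure:

```python
from bs4 import BeautifulSoup

# Iterating per div.quote keeps quote, author, and tags aligned without
# parallel lists; the markup is a minimal stand-in for the real page.
html = '''
<div class="quote">
  <span class="text">Quote one</span>
  <small class="author">Author A</small>
  <div class="tags"><a class="tag">change</a><a class="tag">world</a></div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
records = []
for block in soup.find_all('div', class_='quote'):
    records.append({
        'quote': block.find('span', class_='text').text,
        'author': block.find('small', class_='author').text,
        'tags': [a.text for a in block.find_all('a', class_='tag')],
    })
print(records)
```

A structure like this is also a natural fit for writing out to JSON or CSV later.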
Practicing Web Scraping
There are a few other practice pages we can make use of as well. We can start with this URL: scrapingclub.com/exercise/li…

We simply want to extract the item name and price from each entry and display them as a list. So the first step is to inspect the page source to determine how we can search on the HTML. It looks like there are some Bootstrap classes we can search on, among other things.

With that knowledge, here is our Python script for this scrape.
import requests
from bs4 import BeautifulSoup
url = 'https://scrapingclub.com/exercise/list_basic/?page=1'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
items = soup.find_all('div', class_='col-lg-4 col-md-6 mb-4')
count = 1
for i in items:
    itemName = i.find('h4', class_='card-title').text.strip()
    itemPrice = i.find('h5').text
    print(f'{count}: {itemPrice} for the {itemName}')
    count += 1
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
1: $24.99 for the Short Dress
2: $29.99 for the Patterned Slacks
3: $49.99 for the Short Chiffon Dress
4: $59.99 for the Off-the-shoulder Dress
5: $24.99 for the V-neck Top
6: $49.99 for the Short Chiffon Dress
7: $24.99 for the V-neck Top
8: $24.99 for the V-neck Top
9: $59.99 for the Short Lace Dress
Process finished with exit code 0
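The scraped prices are strings like "$24.99", so a common follow-up step is converting them into numbers. The sketch below does that over a made-up two-card fragment that reuses the same Bootstrap classes as the exercise page:

```python
from bs4 import BeautifulSoup

# Turning scraped "$24.99" strings into numbers; the card markup below is a
# made-up stand-in using the same Bootstrap classes as the exercise page.
html = '''
<div class="col-lg-4 col-md-6 mb-4">
  <h4 class="card-title">Short Dress</h4><h5>$24.99</h5>
</div>
<div class="col-lg-4 col-md-6 mb-4">
  <h4 class="card-title">Patterned Slacks</h4><h5>$29.99</h5>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
total = 0.0
for item in soup.find_all('div', class_='col-lg-4 col-md-6 mb-4'):
    total += float(item.find('h5').text.lstrip('$'))  # drop the $ then convert
print(f'Total: ${total:.2f}')  # Total: $54.98
```

Once the values are floats you can sum, sort, or filter them instead of just printing strings.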
Scraping More Than One Page
The URL above is a single page in a paginated collection. We can tell by the page=1 in the URL. We can also set up a Beautiful Soup script to scrape more than one page at a time. Here is a script that scrapes all of the linked pages from the original page. Once all of those URLs are captured, the script can issue a request to each individual page and parse out the results.
scraper.py
import requests
from bs4 import BeautifulSoup
url = 'https://scrapingclub.com/exercise/list_basic/?page=1'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
items = soup.find_all('div', class_='col-lg-4 col-md-6 mb-4')
count = 1
for i in items:
    itemName = i.find('h4', class_='card-title').text.strip()
    itemPrice = i.find('h5').text
    print(f'{count}: {itemPrice} for the {itemName}')
    count += 1
pages = soup.find('ul', class_='pagination')
urls = []
links = pages.find_all('a', class_='page-link')
for link in links:
    pageNum = int(link.text) if link.text.isdigit() else None
    if pageNum is not None:
        hrefval = link.get('href')
        urls.append(hrefval)
count = 1
for i in urls:
    # Drop the existing ?page=1 query before appending each page's href,
    # otherwise two query strings end up glued together in one URL.
    newUrl = url.split('?')[0] + i
    response = requests.get(newUrl)
    soup = BeautifulSoup(response.text, 'lxml')
    items = soup.find_all('div', class_='col-lg-4 col-md-6 mb-4')
    for i in items:
        itemName = i.find('h4', class_='card-title').text.strip()
        itemPrice = i.find('h5').text
        print(f'{count}: {itemPrice} for the {itemName}')
        count += 1
Running that script scrapes all of the pages in one go and outputs a big list like this.
C:\python\vrequests\Scripts\python.exe C:/python/vrequests/scraper.py
1: $24.99 for the Short Dress
2: $29.99 for the Patterned Slacks
3: $49.99 for the Short Chiffon Dress
4: $59.99 for the Off-the-shoulder Dress
5: $24.99 for the V-neck Top
6: $49.99 for the Short Chiffon Dress
7: $24.99 for the V-neck Top
8: $24.99 for the V-neck Top
9: $59.99 for the Short Lace Dress
1: $24.99 for the Short Dress
2: $29.99 for the Patterned Slacks
3: $49.99 for the Short Chiffon Dress
4: $59.99 for the Off-the-shoulder Dress
5: $24.99 for the V-neck Top
6: $49.99 for the Short Chiffon Dress
7: $24.99 for the V-neck Top
8: $24.99 for the V-neck Top
9: $59.99 for the Short Lace Dress
10: $24.99 for the Short Dress
11: $29.99 for the Patterned Slacks
12: $49.99 for the Short Chiffon Dress
13: $59.99 for the Off-the-shoulder Dress
14: $24.99 for the V-neck Top
15: $49.99 for the Short Chiffon Dress
16: $24.99 for the V-neck Top
17: $24.99 for the V-neck Top
18: $59.99 for the Short Lace Dress
19: $24.99 for the Short Dress
20: $29.99 for the Patterned Slacks
21: $49.99 for the Short Chiffon Dress
22: $59.99 for the Off-the-shoulder Dress
23: $24.99 for the V-neck Top
24: $49.99 for the Short Chiffon Dress
25: $24.99 for the V-neck Top
26: $24.99 for the V-neck Top
27: $59.99 for the Short Lace Dress
28: $24.99 for the Short Dress
29: $29.99 for the Patterned Slacks
30: $49.99 for the Short Chiffon Dress
31: $59.99 for the Off-the-shoulder Dress
32: $24.99 for the V-neck Top
33: $49.99 for the Short Chiffon Dress
34: $24.99 for the V-neck Top
35: $24.99 for the V-neck Top
36: $59.99 for the Short Lace Dress
37: $24.99 for the Short Dress
38: $29.99 for the Patterned Slacks
39: $49.99 for the Short Chiffon Dress
40: $59.99 for the Off-the-shoulder Dress
41: $24.99 for the V-neck Top
42: $49.99 for the Short Chiffon Dress
43: $24.99 for the V-neck Top
44: $24.99 for the V-neck Top
45: $59.99 for the Short Lace Dress
46: $24.99 for the Short Dress
47: $29.99 for the Patterned Slacks
48: $49.99 for the Short Chiffon Dress
49: $59.99 for the Off-the-shoulder Dress
50: $24.99 for the V-neck Top
51: $49.99 for the Short Chiffon Dress
52: $24.99 for the V-neck Top
53: $24.99 for the V-neck Top
54: $59.99 for the Short Lace Dress
Process finished with exit code 0
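Building the follow-up page URLs with the standard library's `urljoin` is a cleaner alternative to raw string concatenation, since it correctly replaces a query string the base URL already carries. A short sketch resolving hypothetical `?page=N` hrefs like the ones on the exercise page:

```python
from urllib.parse import urljoin

# urljoin replaces the base URL's existing query string instead of gluing a
# second one onto it; the hrefs here are hypothetical pagination links.
base = 'https://scrapingclub.com/exercise/list_basic/?page=1'
for href in ['?page=2', '?page=3']:
    print(urljoin(base, href))
# https://scrapingclub.com/exercise/list_basic/?page=2
# https://scrapingclub.com/exercise/list_basic/?page=3
```

This works for relative hrefs of any shape (paths, queries, fragments), which makes it the safer default whenever you follow links extracted from a page.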
Python Web Scraping With Beautiful Soup Summary
[Beautiful Soup] is one of a handful of available libraries built for web scraping with Python. As we have seen in this tutorial, it is very easy to get started with Beautiful Soup. Web scraping scripts can be used to gather and compile data from the internet for various types of data analysis projects, or whatever else your imagination comes up with.