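Traversing the document tree 🌲
Direct child nodes
Tag name
A Tag may contain several strings or other Tags; all of these are child nodes of that Tag.
Beautiful Soup provides many attributes for working with and iterating over child nodes. For example, you can read a tag's name directly: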
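A minimal sketch of reading a tag's name. The HTML snippet is a shortened version of the sample document, and the "html.parser" parser is an assumption made here so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Shortened sample document (assumed for this illustration)
html_doc = "<html><head><title>The Dormouse's story</title></head><body><p><b>The Dormouse's story</b></p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

title_name = soup.title.name  # name of the first <title> tag: "title"
bold_name = soup.b.name       # name of the first <b> tag: "b"
root_name = soup.name         # the BeautifulSoup object itself is named "[document]"

print(title_name, bold_name, root_name)
```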

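Accessing a tag by name, as in soup.a, returns only the first matching tag. To get all of the a tags, use the find_all method.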
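The difference between soup.a and find_all can be sketched as follows. The snippet reuses the three links from the sample document; the variable names are chosen only for this illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_link = soup.a              # only the first <a> tag
all_links = soup.find_all("a")   # every <a> tag in the document
link_texts = [a.get_text() for a in all_links]

print(first_link["id"])  # link1
print(link_texts)        # ['Elsie', 'Lacie', 'Tillie']
```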

contents
The contents attribute returns a tag's direct child nodes as a list, that is, whatever sits between the tag's opening and closing markers.
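A small sketch of .contents. The trailing text " and more" is added here purely so that the list holds both a tag and a string; it is not part of the sample document.

```python
from bs4 import BeautifulSoup

# Illustrative snippet: one <b> child followed by a plain string
html_doc = "<p><b>The Dormouse's story</b> and more</p>"
soup = BeautifulSoup(html_doc, "html.parser")

p_contents = soup.p.contents   # a real Python list of the direct children
first_child = p_contents[0]    # the <b> tag
second_child = p_contents[1]   # the string " and more"

print(len(p_contents))
print(first_child.name)
print(repr(str(second_child)))
```

Because .contents is an ordinary list, indexing and len() work on it directly.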

children
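Unlike contents, children does not return a list. It returns an iterator over the same direct child nodes, so you get the children by looping over it.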
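A brief sketch of iterating over .children. The list markup below is a made-up example, not taken from the sample document.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with three direct children
html_doc = "<ul><li>one</li><li>two</li><li>three</li></ul>"
soup = BeautifulSoup(html_doc, "html.parser")

children = soup.ul.children
is_list = isinstance(children, list)   # False: it is an iterator, not a list

# Loop over the iterator to reach each direct child
items = [child.get_text() for child in children]
print(items)  # ['one', 'two', 'three']
```

If a list is needed, .contents already provides one, or the iterator can be wrapped in list().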

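descendants: descendant nodes
The .contents and .children attributes only include a tag's direct children. The .descendants attribute iterates recursively over all of a tag's descendants. As with children, we have to loop over it to get the contents.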
# Descendant nodes: visit every node under soup, depth-first
for each in soup.descendants:
    print(each)
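The output looks like the following. Each tag is printed in full and then followed by the nodes nested inside it, for example:
- the full content of the html tag
- the body tag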
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
The Dormouse's story
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
<p class="title"><b>The Dormouse's story</b></p>
<b>The Dormouse's story</b>
The Dormouse's story
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
Elsie
,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
Lacie
and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
Tillie
;
and they lived at the bottom of a well.
<p class="story">...</p>
...
Node content
If a tag contains no other tags, .string returns the text inside it. If a tag contains exactly one child tag, .string also returns the innermost text.

If a tag has more than one child node, it cannot tell which one .string should refer to, so the result is None.
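The three cases for .string can be sketched with one small snippet. The markup is invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the first <p> has a single child tag, the second has two children
html_doc = "<div><p><b>only text</b></p><p>first <i>second</i></p></div>"
soup = BeautifulSoup(html_doc, "html.parser")

paragraphs = soup.find_all("p")

leaf_string = soup.b.string           # no nested tags: "only text"
single_child = paragraphs[0].string   # exactly one child tag: still "only text"
many_children = paragraphs[1].string  # a string plus a tag: None

print(leaf_string, single_child, many_children)
```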
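Multiple nodes
strings
When a tag contains several strings, .strings iterates over all of them. Printing each one with the repr() function makes whitespace and line breaks visible.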
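A short sketch of .strings combined with repr(). The paragraph below is a cut-down, assumed version of the sample story text.

```python
from bs4 import BeautifulSoup

# Cut-down snippet; "\n" marks the line breaks between the links
html_doc = '<p>Once upon a time;\n<a id="link1">Elsie</a>,\n<a id="link2">Lacie</a>\n</p>'
soup = BeautifulSoup(html_doc, "html.parser")

all_strings = list(soup.strings)   # every string in document order, whitespace included

for s in all_strings:
    print(repr(s))   # repr() shows the newline characters explicitly
```

Here the last string is nothing but a newline, which is exactly the kind of entry that is easy to miss without repr().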


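stripped_strings
The strings in the output may contain a lot of spaces or blank lines. Use this attribute to remove the extra whitespace.
Note 📒: stripped_strings skips strings that consist only of whitespace and strips leading and trailing whitespace from the rest.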
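For comparison, a sketch of .stripped_strings on a cut-down, assumed version of the sample story text.

```python
from bs4 import BeautifulSoup

html_doc = '<p>Once upon a time;\n<a id="link1">Elsie</a>,\n<a id="link2">Lacie</a>\n</p>'
soup = BeautifulSoup(html_doc, "html.parser")

# Whitespace-only strings are dropped; the others are trimmed at both ends
clean_strings = list(soup.stripped_strings)
print(clean_strings)  # ['Once upon a time;', 'Elsie', ',', 'Lacie']
```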

Parent nodes
parent
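A minimal sketch of .parent, which gives the node that directly contains an element. The snippet is a shortened, assumed version of the sample document.

```python
from bs4 import BeautifulSoup

html_doc = "<html><head><title>The Dormouse's story</title></head></html>"
soup = BeautifulSoup(html_doc, "html.parser")

title_tag = soup.title
tag_parent = title_tag.parent.name            # the <title> tag sits inside <head>
string_parent = title_tag.string.parent.name  # the string sits inside <title>
top_parent = soup.html.parent.name            # <html> belongs to the BeautifulSoup object
root_parent = soup.parent                     # the BeautifulSoup object has no parent

print(tag_parent, string_parent, top_parent, root_parent)
```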

parents
The parents attribute walks upward from an element and yields all of its ancestors, one level at a time.
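A sketch of walking up the tree with .parents, using a reduced, assumed version of the sample document.

```python
from bs4 import BeautifulSoup

html_doc = '<html><body><p class="story"><a id="link1">Elsie</a></p></body></html>'
soup = BeautifulSoup(html_doc, "html.parser")

link = soup.a
# Collect the name of every ancestor, from the nearest one up to the document itself
ancestor_names = [parent.name for parent in link.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']
```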

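Sibling nodes
Single node
Key point: the .next_sibling and .previous_sibling attributes
Sibling nodes are the nodes that sit at the same level as the current node.
The .next_sibling attribute gets the node's next sibling.
The .previous_sibling attribute does the opposite and gets the previous sibling. If there is no such node, either attribute returns None.
Note: in real documents, a tag's .next_sibling and .previous_sibling are usually strings or whitespace, because whitespace and line breaks between tags also count as nodes. The result may therefore be blank or just a newline.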
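The whitespace behavior described in the note can be sketched like this. The snippet is a reduced, assumed version of the sample links.

```python
from bs4 import BeautifulSoup

html_doc = '<p><a id="link1">Elsie</a>,\n<a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html_doc, "html.parser")

first_link = soup.find(id="link1")

next_node = first_link.next_sibling       # the string ",\n", not the second <a> tag
next_tag = next_node.next_sibling         # one step further reaches <a id="link2">
missing_prev = first_link.previous_sibling  # nothing before the first link: None

print(repr(str(next_node)))
print(next_tag["id"])
print(missing_prev)
```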
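All nodes
Key point: the .next_siblings and .previous_siblings attributes
The .next_siblings and .previous_siblings attributes let you iterate over all of the siblings that come after or before the current node.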
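A sketch of iterating over siblings in both directions. The snippet is a reduced, assumed version of the sample links, and filtering with isinstance(..., Tag) is one possible way, chosen here, to skip the string nodes between tags.

```python
from bs4 import BeautifulSoup, Tag

html_doc = '<p><a id="link1">Elsie</a>,\n<a id="link2">Lacie</a> and\n<a id="link3">Tillie</a>;</p>'
soup = BeautifulSoup(html_doc, "html.parser")

first_link = soup.find(id="link1")
last_link = soup.find(id="link3")

# Everything after the first link, strings included
following = [str(node) for node in first_link.next_siblings]

# Keep only the tags, skipping the text between them
following_ids = [node["id"] for node in first_link.next_siblings if isinstance(node, Tag)]
# previous_siblings walks backwards, so the nearest sibling comes first
preceding_ids = [node["id"] for node in last_link.previous_siblings if isinstance(node, Tag)]

print(following_ids)  # ['link2', 'link3']
print(preceding_ids)  # ['link2', 'link1']
```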