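Pandas' [read_html()](https://blog.finxter.com/reading-and-writing-html-with-pandas/) function is a simple way to convert HTML tables (for example, those on a page at a given URL) into Pandas DataFrames. You pass it a URL string or a file path, and it returns a list of DataFrames, one for each table found at that location.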


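For example, the following code reads all tables of the Wikipedia article on Python into a list of DataFrames (one DataFrame per HTML table).

```python
import pandas as pd

# Fetch the page and parse every <table> element into a DataFrame
tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')
print(f'Number of tables: {len(tables)}')
# Number of tables: 13
```

The return value is a list of DataFrames.

```python
print(type(tables))
# <class 'list'>
print(type(tables[0]))
# <class 'pandas.core.frame.DataFrame'>
```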
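Full Example
The first entry in the tables list corresponds to the first HTML table on the page. I give the HTML code of that table in the appendix below.

This is the DataFrame stored at the first position of the list returned by Pandas' read_html():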

```
>>> tables[0]
    0                                                  1
0   NaN                                                NaN
1   NaN                                                NaN
2   Paradigm                                           Multi-paradigm: object-oriented,[1] procedural...
3   Designed by                                        Guido van Rossum
4   Developer                                          Python Software Foundation
5   First appeared                                     20 February 1991; 31 years ago[2]
6   NaN                                                NaN
7   Stable release                                     3.10.6[3] / 2 August 2022; 16 days ago
8   Preview release                                    3.11.0rc1[4] / 8 August 2022; 10 days ago
9   Typing discipline                                  Duck, dynamic, strong typing;[5] gradual (sinc...
10  OS                                                 Windows, macOS, Linux/UNIX, Android[7][8] and ...
11  License                                            Python Software Foundation License
12  Filename extensions                                .py, .pyi, .pyc, .pyd, .pyw, .pyz (since 3.5),...
13  Website                                            python.org
14  Major implementations                              Major implementations
15  CPython, PyPy, Stackless Python, MicroPython, ...  CPython, PyPy, Stackless Python, MicroPython, ...
16  Dialects                                           Dialects
17  Cython, RPython, Starlark[12]                      Cython, RPython, Starlark[12]
18  Influenced by                                      Influenced by
19  ABC,[13] Ada,[14] ALGOL 68,[15] APL,[16] C,[17...  ABC,[13] Ada,[14] ALGOL 68,[15] APL,[16] C,[17...
20  Influenced                                         Influenced
21  Apache Groovy, Boo, Cobra, CoffeeScript,[24] D...  Apache Groovy, Boo, Cobra, CoffeeScript,[24] D...
22  Python Programming at Wikibooks                    Python Programming at Wikibooks
```
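Note: The pandas.read_html() function returns a list of DataFrames, one DataFrame per HTML table. So tables[0] returns the first table in the HTML document, tables[1] returns the second table, and so on. You can get the number of tables in the document by wrapping the result in the [len()](https://blog.finxter.com/python-len/) function: len(pd.read_html(...)).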
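The indexing behavior described in the note can be tried without a network connection. The following is a minimal sketch, assuming a small made-up HTML document with two tables (the table contents are illustrative and not taken from the Wikipedia page) and assuming an HTML parser that pandas supports, such as lxml, is installed. The string is wrapped in StringIO so that read_html() treats it as a file-like object.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table document, used here in place of a live URL.
html = """
<table>
  <tr><th>Language</th><th>First appeared</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Ruby</td><td>1995</td></tr>
</table>
<table>
  <tr><th>Library</th><th>Purpose</th></tr>
  <tr><td>pandas</td><td>Data analysis</td></tr>
</table>
"""

# read_html() returns one DataFrame per <table>, in document order.
tables = pd.read_html(StringIO(html))

print(len(tables))   # number of tables found in the document
print(tables[0])     # first table
print(tables[1])     # second table
```

Because the result is an ordinary Python list, slicing and iteration work as usual, for example `for df in tables: print(df.shape)`.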
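An interesting application of the approach discussed in this article is converting an HTML table to a CSV file by using the pandas.read_html() function together with the [df.to_csv()](https://blog.finxter.com/pandas-dataframe-to_csv-method/) method.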
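As a rough sketch of how read_html() and to_csv() can be combined, the snippet below parses a small made-up table and converts it to CSV text. The HTML content and the file name mentioned in the comment are assumptions for illustration, and an HTML parser supported by pandas (such as lxml) is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical single-table document standing in for a real page.
html = """
<table>
  <tr><th>Field</th><th>Value</th></tr>
  <tr><td>Designed by</td><td>Guido van Rossum</td></tr>
  <tr><td>Developer</td><td>Python Software Foundation</td></tr>
</table>
"""

# Take the first (and only) table from the list that read_html() returns.
df = pd.read_html(StringIO(html))[0]

# Without a path argument, to_csv() returns the CSV as a string.
# index=False leaves out the DataFrame's row index.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (hypothetical name):
# df.to_csv('table.csv', index=False)
```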
Appendix
This is the HTML code of the scraped HTML table (example).
<table class="infobox vevent"><caption class="infobox-title summary">Python</caption><tbody><tr><td colspan="2" class="infobox-image"><a href="/wiki/File:Python-logo-notext.svg" class="image"><img alt="Python-logo-notext.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/121px-Python-logo-notext.svg.png" decoding="async" width="121" height="121" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/182px-Python-logo-notext.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/242px-Python-logo-notext.svg.png 2x" data-file-width="110" data-file-height="110"></a></td></tr><tr><td colspan="2" class="infobox-full-data"><div style="text-align:center;"></div></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Programming_paradigm" title="Programming paradigm">Paradigm</a></th><td class="infobox-data"><a href="/wiki/Multi-paradigm_programming_language" class="mw-redirect" title="Multi-paradigm programming language">Multi-paradigm</a>: <a href="/wiki/Object-oriented_programming" title="Object-oriented programming">object-oriented</a>,<sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup> <a href="/wiki/Procedural_programming" title="Procedural programming">procedural</a> (<a href="/wiki/Imperative_programming" title="Imperative programming">imperative</a>), <a href="/wiki/Functional_programming" title="Functional programming">functional</a>, <a href="/wiki/Structured_programming" title="Structured programming">structured</a>, <a href="/wiki/Reflective_programming" title="Reflective programming">reflective</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_design" title="Software design">Designed by</a></th><td class="infobox-data"><a href="/wiki/Guido_van_Rossum" title="Guido van Rossum">Guido van Rossum</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_developer" class="mw-redirect" 
title="Software developer">Developer</a></th><td class="infobox-data organiser"><a href="/wiki/Python_Software_Foundation" title="Python Software Foundation">Python Software Foundation</a></td></tr><tr><th scope="row" class="infobox-label">First appeared</th><td class="infobox-data">20 February 1991<span class="noprint">; 31 years ago</span><span style="display:none"> (<span class="bday dtstart published updated">1991-02-20</span>)</span><sup id="cite_ref-alt-sources-history_2-0" class="reference"><a href="#cite_note-alt-sources-history-2">[2]</a></sup></td></tr><tr><td colspan="2" class="infobox-full-data"><link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r1066479718"></td></tr><tr><th scope="row" class="infobox-label" style="white-space: nowrap;"><a href="/wiki/Software_release_life_cycle" title="Software release life cycle">Stable release</a></th><td class="infobox-data"><div style="margin:0px;">3.10.6<sup id="cite_ref-wikidata-7f169d99022038b3a6e5d41083301f64776ffb60-v3_3-0" class="reference"><a href="#cite_note-wikidata-7f169d99022038b3a6e5d41083301f64776ffb60-v3-3">[3]</a></sup> <a href="https://www.wikidata.org/wiki/Q28865?uselang=en#P348" title="Edit this on Wikidata"><img alt="Edit this on Wikidata" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" decoding="async" width="10" height="10" style="vertical-align: text-top" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" data-file-width="20" data-file-height="20"></a>
/ 2 August 2022<span class="noprint">; 16 days ago</span><span style="display:none"> (<span class="bday dtstart published updated">2 August 2022</span>)</span></div></td></tr><tr><th scope="row" class="infobox-label" style="white-space: nowrap;"><a href="/wiki/Software_release_life_cycle#Beta" title="Software release life cycle">Preview release</a></th><td class="infobox-data"><div style="margin:0px;">3.11.0rc1<sup id="cite_ref-wikidata-ffe8152c4aa79586b276ccdeaca9829daa2bc5af-v3_4-0" class="reference"><a href="#cite_note-wikidata-ffe8152c4aa79586b276ccdeaca9829daa2bc5af-v3-4">[4]</a></sup> <a href="https://www.wikidata.org/wiki/Q28865?uselang=en#P348" title="Edit this on Wikidata"><img alt="Edit this on Wikidata" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" decoding="async" width="10" height="10" style="vertical-align: text-top" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" data-file-width="20" data-file-height="20"></a>
/ 8 August 2022<span class="noprint">; 10 days ago</span><span style="display:none"> (<span class="bday dtstart published updated">8 August 2022</span>)</span></div></td></tr><tr style="display:none"><td colspan="2">
</td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Type_system" title="Type system">Typing discipline</a></th><td class="infobox-data"><a href="/wiki/Duck_typing" title="Duck typing">Duck</a>, <a href="/wiki/Dynamic_typing" class="mw-redirect" title="Dynamic typing">dynamic</a>, <a href="/wiki/Strong_and_weak_typing" title="Strong and weak typing">strong typing</a>;<sup id="cite_ref-5" class="reference"><a href="#cite_note-5">[5]</a></sup> <a href="/wiki/Gradual_typing" title="Gradual typing">gradual</a> (since 3.5, but ignored in <a href="/wiki/CPython" title="CPython">CPython</a>)<sup id="cite_ref-6" class="reference"><a href="#cite_note-6">[6]</a></sup></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Operating_system" title="Operating system">OS</a></th><td class="infobox-data"><a href="/wiki/Windows" class="mw-redirect" title="Windows">Windows</a>, <a href="/wiki/MacOS" title="MacOS">macOS</a>, <a href="/wiki/Linux" title="Linux">Linux/UNIX</a>, <a href="/wiki/Android_(operating_system)" title="Android (operating system)">Android</a><sup id="cite_ref-7" class="reference"><a href="#cite_note-7">[7]</a></sup><sup id="cite_ref-8" class="reference"><a href="#cite_note-8">[8]</a></sup> and more<sup id="cite_ref-9" class="reference"><a href="#cite_note-9">[9]</a></sup></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_license" title="Software license">License</a></th><td class="infobox-data"><a href="/wiki/Python_Software_Foundation_License" title="Python Software Foundation License">Python Software Foundation License</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Filename_extension" title="Filename extension">Filename extensions</a></th><td class="infobox-data">.py, .pyi, .pyc, .pyd, .pyw, .pyz (since 3.5),<sup id="cite_ref-10" class="reference"><a href="#cite_note-10">[10]</a></sup> .pyo (prior to 3.5)<sup id="cite_ref-11" class="reference"><a 
href="#cite_note-11">[11]</a></sup></td></tr><tr><th scope="row" class="infobox-label">Website</th><td class="infobox-data"><span class="url"><a rel="nofollow" class="external text" href="https://www.python.org/">python.org</a></span></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Major <a href="/wiki/Programming_language_implementation" title="Programming language implementation">implementations</a></th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/CPython" title="CPython">CPython</a>, <a href="/wiki/PyPy" title="PyPy">PyPy</a>, <a href="/wiki/Stackless_Python" title="Stackless Python">Stackless Python</a>, <a href="/wiki/MicroPython" title="MicroPython">MicroPython</a>, <a href="/wiki/CircuitPython" title="CircuitPython">CircuitPython</a>, <a href="/wiki/IronPython" title="IronPython">IronPython</a>, <a href="/wiki/Jython" title="Jython">Jython</a></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;"><a href="/wiki/Programming_language#Dialects,_flavors_and_implementations" title="Programming language">Dialects</a></th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/Cython" title="Cython">Cython</a>, <a href="/wiki/PyPy#RPython" title="PyPy">RPython</a>, <a href="/wiki/Bazel_(software)" title="Bazel (software)">Starlark</a><sup id="cite_ref-12" class="reference"><a href="#cite_note-12">[12]</a></sup></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Influenced by</th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/ABC_(programming_language)" title="ABC (programming language)">ABC</a>,<sup id="cite_ref-faq-created_13-0" class="reference"><a href="#cite_note-faq-created-13">[13]</a></sup> <a href="/wiki/Ada_(programming_language)" title="Ada (programming language)">Ada</a>,<sup id="cite_ref-14" class="reference"><a href="#cite_note-14">[14]</a></sup> <a href="/wiki/ALGOL_68" title="ALGOL 68">ALGOL 68</a>,<sup 
id="cite_ref-98-interview_15-0" class="reference"><a href="#cite_note-98-interview-15">[15]</a></sup> <a href="/wiki/APL_(programming_language)" title="APL (programming language)">APL</a>,<sup id="cite_ref-python.org_16-0" class="reference"><a href="#cite_note-python.org-16">[16]</a></sup> <a href="/wiki/C_(programming_language)" title="C (programming language)">C</a>,<sup id="cite_ref-AutoNT-1_17-0" class="reference"><a href="#cite_note-AutoNT-1-17">[17]</a></sup> <a href="/wiki/C%2B%2B" title="C++">C++</a>,<sup id="cite_ref-classmix_18-0" class="reference"><a href="#cite_note-classmix-18">[18]</a></sup> <a href="/wiki/CLU_(programming_language)" title="CLU (programming language)">CLU</a>,<sup id="cite_ref-effbot-call-by-object_19-0" class="reference"><a href="#cite_note-effbot-call-by-object-19">[19]</a></sup> <a href="/wiki/Dylan_(programming_language)" title="Dylan (programming language)">Dylan</a>,<sup id="cite_ref-AutoNT-2_20-0" class="reference"><a href="#cite_note-AutoNT-2-20">[20]</a></sup> <a href="/wiki/Haskell_(programming_language)" class="mw-redirect" title="Haskell (programming language)">Haskell</a>,<sup id="cite_ref-AutoNT-3_21-0" class="reference"><a href="#cite_note-AutoNT-3-21">[21]</a></sup> <a href="/wiki/Icon_(programming_language)" title="Icon (programming language)">Icon</a>,<sup id="cite_ref-AutoNT-4_22-0" class="reference"><a href="#cite_note-AutoNT-4-22">[22]</a></sup> <a href="/wiki/Lisp_(programming_language)" title="Lisp (programming language)">Lisp</a>,<sup id="cite_ref-AutoNT-6_23-0" class="reference"><a href="#cite_note-AutoNT-6-23">[23]</a></sup> <span class="nowrap"><a href="/wiki/Modula-3" title="Modula-3">Modula-3</a></span>,<sup id="cite_ref-classmix_18-1" class="reference"><a href="#cite_note-classmix-18">[18]</a></sup> <a href="/wiki/Perl" title="Perl">Perl</a>, <a href="/wiki/Standard_ML" title="Standard ML">Standard ML</a>, <a href="/wiki/Visual_Basic" title="Visual Basic">VB</a><sup id="cite_ref-python.org_16-1" 
class="reference"><a href="#cite_note-python.org-16">[16]</a></sup></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Influenced</th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/Apache_Groovy" title="Apache Groovy">Apache Groovy</a>, <a href="/wiki/Boo_(programming_language)" title="Boo (programming language)">Boo</a>, <a href="/wiki/Cobra_(programming_language)" title="Cobra (programming language)">Cobra</a>, <a href="/wiki/CoffeeScript" title="CoffeeScript">CoffeeScript</a>,<sup id="cite_ref-24" class="reference"><a href="#cite_note-24">[24]</a></sup> <a href="/wiki/D_(programming_language)" title="D (programming language)">D</a>, <a href="/wiki/F_Sharp_(programming_language)" title="F Sharp (programming language)">F#</a>, <a href="/wiki/Genie_(programming_language)" title="Genie (programming language)">Genie</a>,<sup id="cite_ref-25" class="reference"><a href="#cite_note-25">[25]</a></sup> <a href="/wiki/Go_(programming_language)" title="Go (programming language)">Go</a>, <a href="/wiki/JavaScript" title="JavaScript">JavaScript</a>,<sup id="cite_ref-26" class="reference"><a href="#cite_note-26">[26]</a></sup><sup id="cite_ref-27" class="reference"><a href="#cite_note-27">[27]</a></sup> <a href="/wiki/Julia_(programming_language)" title="Julia (programming language)">Julia</a>,<sup id="cite_ref-Julia_28-0" class="reference"><a href="#cite_note-Julia-28">[28]</a></sup> <a href="/wiki/Nim_(programming_language)" title="Nim (programming language)">Nim</a>, <a href="/wiki/Ring_(programming_language)" title="Ring (programming language)">Ring</a>,<sup id="cite_ref-The_Ring_programming_language_and_other_languages_29-0" class="reference"><a href="#cite_note-The_Ring_programming_language_and_other_languages-29">[29]</a></sup> <a href="/wiki/Ruby_(programming_language)" title="Ruby (programming language)">Ruby</a>,<sup id="cite_ref-bini_30-0" class="reference"><a href="#cite_note-bini-30">[30]</a></sup> <a 
href="/wiki/Swift_(programming_language)" title="Swift (programming language)">Swift</a><sup id="cite_ref-lattner2014_31-0" class="reference"><a href="#cite_note-lattner2014-31">[31]</a></sup></td></tr><tr><td colspan="2" class="infobox-below hlist" style="border-top: 1px solid #aaa; padding-top: 3px;">
<ul><li><a href="/wiki/File:Wikibooks-logo-en-noslogan.svg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/16px-Wikibooks-logo-en-noslogan.svg.png" decoding="async" width="16" height="16" class="noviewer" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/24px-Wikibooks-logo-en-noslogan.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/32px-Wikibooks-logo-en-noslogan.svg.png 2x" data-file-width="400" data-file-height="400"></a> <a href="https://en.wikibooks.org/wiki/Python_Programming" class="extiw" title="wikibooks:Python Programming">Python Programming</a> at Wikibooks</li></ul>
</td></tr></tbody></table>