Getting HTML Table Data in Python


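The user needs to extract table data from a piece of HTML, but only the contents of the first table.

  • The HTML contains multiple tables, and the user wants the data from the first one.
  1. Solution
    • Parse the HTML with the BeautifulSoup library.
    • Locate the first table element.
    • Extract the text data from that table element.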
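
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: `sample_html` is a small hypothetical stand-in for the bill markup, and `first_table_rows` is a helper name made up for this example. It assumes the `bs4` package is installed and uses Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not the bill HTML from the article):
# two sibling tables, of which only the first should be read.
sample_html = """
<table>
  <tr><td><b>Billing Date:</b><br>07-22-2013</td><td><b>Amount Due:</b> $30,488.60</td></tr>
</table>
<table>
  <tr><td>Meter</td><td>Consumption</td></tr>
  <tr><td>S10406906</td><td>10</td></tr>
</table>
"""


def first_table_rows(html):
    """Return the first <table> in html as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # find() returns only the first match, or None
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Join the text fragments inside each cell with a single space.
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


rows = first_table_rows(sample_html)
for row in rows:
    print(row)
```

One caveat: `find_all("tr")` searches all descendants, so if the first table wraps other tables (as layout tables often do), the rows of the nested tables are returned as well. In that case it may be necessary to select the inner table explicitly instead of the outermost one.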
from bs4 import BeautifulSoup

html = """
<table width="100%" border="0" cellspacing="0" cellpadding="0">
                  <tr> 
                    <td width="7" rowspan="2">&nbsp;</td>
                    <td width='40%'> <div align="left">

                      </div>
                    <td width="7" rowspan="2">&nbsp;</td>
                  </tr>
                  <tr> 
                    <td colspan="2"> 
    <b><font face='Arial, Helvetica, sans-serif' size='2'>Account #: 8428995632 </font></b><BR><TABLE BORDER='1' width='100%' align='center' cellspacing='0'><TR><td align='left' colspan='2'><font face='Arial, Helvetica, sans-serif' size='2'><b>Billing Date:   </b><BR>07-22-2013</font></TD><td align='left' ><font face='Arial, Helvetica, sans-serif' size='2'><b>Past Due Date:    </b><BR>08-12-2013</font></TD></TR><TR><td align='left'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service From: </b><BR>06-11-2013</font></TD><td align='left'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service To:    </b><BR>07-11-2013</font></TD><td align='left'><font face='Arial, Helvetica, sans-serif' size='2'><b>Days of Service: </b><BR>30</font></TD></TR><TR><td align='left' colspan='2'><font face='Arial, Helvetica, sans-serif' size='2'><b>Current Charges:    </b>$30,488.60</font></TD><td align='left' ><font face='Arial, Helvetica, sans-serif' size='2'><b>Amount Due:   </b>$30,488.60</font></TD></TR></TR></TABLE><p><p><p><p><CENTER><font face='Arial, Helvetica, sans-serif' size='3'><b> Meter readings for this bill:</b></font></CENTER><TABLE BORDER='1' width='100%' align='center' cellspacing='0'><TR bgcolor='#FFF2D7'><td align='center' width='18%'><font face='Arial,Helvetica,  sans-serif' size='2'><b>Meter</b></font></TD><td align='center' width='17%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service<br>From</b></font></TD><td align='center' width='17%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service<br>To</b></font></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'><b># Days</b></font></TD><td align='center' width='10%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Prior<br>Read</b></font></TD><td align='center' width='10%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Current<br>Read</b></font></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' 
size='2'><b>Consumption</b></font></TD><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406906</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>134</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>144</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>10</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>08400002</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>30748</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>32634</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>1886</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406911</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>2717</FONT></TD><td align='center' 
width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>3046</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>329</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>08405704</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>23755</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>25100</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>1345</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406895</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial,