无涯教程: jsoup - Using DOM Methods


The following example parses an HTML string into a Document object and then uses DOM-like methods to retrieve element information.

Document document=Jsoup.parse(html);
Element sampleDiv=document.getElementById("sampleDiv");
Elements links=sampleDiv.getElementsByTag("a");

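The parse(String html) method parses the input HTML into a new Document. That Document object can then be used to traverse the HTML DOM and read details of its elements.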
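As a minimal sketch of what parse(String html) does with incomplete input, the example below feeds it a fragment that has no html or body wrapper and leaves its p tags unclosed. The class name ParseSketch and the helper countParagraphs are hypothetical names introduced only for this illustration; the jsoup calls used are Jsoup.parse, getElementsByTag, body and title.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseSketch {

    // Hypothetical helper: parses the HTML and counts the <p> elements found.
    static int countParagraphs(String html) {
        Document document = Jsoup.parse(html);
        return document.getElementsByTag("p").size();
    }

    public static void main(String[] args) {
        // A fragment without <html>/<body> and with unclosed <p> tags.
        String fragment = "<p>First<p>Second";
        Document document = Jsoup.parse(fragment);

        // Number of paragraphs the parser recognized in the fragment.
        System.out.println("Paragraphs: " + countParagraphs(fragment));
        // Whether the parser supplied a body element for the fragment.
        System.out.println("Has body: " + (document.body() != null));
        // title() yields an empty string when the input has no <title>.
        System.out.println("Title empty: " + document.title().isEmpty());
    }
}
```

The point of the sketch is that the returned Document is a complete tree even when the input is not, so the DOM-like methods can be called on it without first repairing the markup.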

Document.getElementById Example

Using any editor of your choice, create the following Java program in C:\jsoup.

JsoupTester.java

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTester {
   public static void main(String[] args) {

      String html = "<html><head><title>Sample Title</title></head>"
         + "<body>"
         + "<p>Sample Content</p>"
         + "<div id=sampleDiv><a href=www.google.com>Google</a></div>"
         + "</body></html>";

      // Parse the HTML string into a Document and print its title.
      Document document = Jsoup.parse(html);
      System.out.println(document.title());

      // Print the text of every <p> element in the document.
      Elements paragraphs = document.getElementsByTag("p");
      for (Element paragraph : paragraphs) {
         System.out.println(paragraph.text());
      }

      // Look up the div by its id, then collect the <a> elements inside it.
      Element sampleDiv = document.getElementById("sampleDiv");
      System.out.println("Data: " + sampleDiv.text());
      Elements links = sampleDiv.getElementsByTag("a");

      for (Element link : links) {
         System.out.println("Href: " + link.attr("href"));
         System.out.println("Text: " + link.text());
      }
   }
}

Compile the class with the javac compiler as follows (this assumes the jsoup JAR is already on the CLASSPATH):

C:\jsoup>javac JsoupTester.java

Now run JsoupTester to see the output.

C:\jsoup>java JsoupTester

Verify the output.

Sample Title
Sample Content
Data: Google
Href: www.google.com
Text: Google
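The program above calls sampleDiv.text() directly, which works because the sample HTML contains an element with that id. getElementById returns null when no element matches, so a lookup on less predictable HTML may need a guard. The following is a minimal sketch of one way to do that; the class name SafeLookup, the helper textById and the id "missingDiv" are hypothetical and used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SafeLookup {

    // Hypothetical helper: returns the text of the element with the given id,
    // or the fallback string when the document has no such element.
    static String textById(Document document, String id, String fallback) {
        Element element = document.getElementById(id);
        return (element != null) ? element.text() : fallback;
    }

    public static void main(String[] args) {
        String html = "<div id=sampleDiv><a href=www.google.com>Google</a></div>";
        Document document = Jsoup.parse(html);

        // The id exists, so the element's text is returned.
        System.out.println("Data: " + textById(document, "sampleDiv", "(not found)"));
        // The id does not exist, so the fallback is returned instead of a NullPointerException.
        System.out.println("Data: " + textById(document, "missingDiv", "(not found)"));
    }
}
```

getElementsByTag does not need the same guard: it returns an empty Elements collection when nothing matches, so a for-each loop over the result simply does not execute.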

Reference Link

www.learnfk.com/jsoup/jsoup…