Parsing HTML with jsoup (web crawler)


Dependency

        <!-- jsoup-->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.11.3</version>
        </dependency>

Code

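        import java.io.IOException;
        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;

        public class JsoupDemo {
            public static void main(String[] args) throws IOException {
                // Load the HTML from a URL
                // Document document = Jsoup.connect("https://baijiahao.baidu.com/s?id=1678670461780276039&wfr=spider&for=pc").get();
                Document document = Jsoup.connect("http://stock.10jqka.com.cn/20210517/c629417284.shtml").get();

                // Get the page title
                String title = document.title();
                System.out.println("title: " + title);

                // Get the combined inner HTML of all <span> elements
                // (html() keeps nested tags; use text() for plain text only)
                String strings = document.select("span").html();
                System.out.println(strings);
            }
        }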
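
The code above prints the inner HTML of every span as one combined string. If you want the plain text of each span separately, a minimal sketch could look like the following. It assumes the same jsoup dependency; the class name `SpanTextExample`, the helper `spanTexts`, and the sample markup are made up for illustration, and the HTML is parsed from a string with `Jsoup.parse` so the example runs without network access.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SpanTextExample {

    // Hypothetical helper: collects the plain text of each <span>
    // in the given HTML, one list entry per element.
    public static List<String> spanTexts(String html) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element span : document.select("span")) {
            // text() strips nested tags and returns only the visible text
            texts.add(span.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch, used instead of a live page
        String html = "<html><head><title>Demo</title></head>"
                + "<body><span>first <b>item</b></span><span>second</span></body></html>";
        for (String text : spanTexts(html)) {
            System.out.println(text);
        }
    }
}
```

To run the same extraction against a live page, pass the result of `Jsoup.connect(url).get().html()` to the helper, or change the helper to accept a `Document` directly.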

Reference: https://www.jianshu.com/p/69b395bee43a