Extracting All Hyperlinks from a URL with a Regular Expression

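The following example shows how to use a regular expression to find and print every hyperlink, that is, every opening tag of the form <a href=...>, in the page served at a given URL.
First, the program takes the URL from the command line, opens an input stream to it, reads the content, and stores it as a string in htmlString. It then compiles the pattern "(<a\\s*href=[^>]*>)" (written here as a Java string literal, so \\s is the regex \s) and finally searches htmlString for every match.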
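The matching step can be tried on its own, without any network access. The sketch below applies the same pattern to a small in-memory string; the class name AnchorTagDemo, the method findAnchorTags, and the sample markup are made up for illustration and are not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTagDemo {
    // Same pattern as in the article: "<a", optional whitespace, "href=",
    // then everything up to and including the next ">".
    static final Pattern ANCHOR = Pattern.compile("(<a\\s*href=[^>]*>)");

    // Returns every substring of html matched by the pattern, in order.
    static List<String> findAnchorTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not taken from the page in the article.
        String html = "<p><a href=\"learn.jsp\">Learn</a> "
                    + "<a href=view.jsp?id=89>View</a> "
                    + "<A HREF=\"upper.jsp\">Upper</A> "
                    + "<a class=\"bb\" href=\"late.jsp\">Late</a></p>";
        for (String tag : findAnchorTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Only the first two links are printed. The pattern is case-sensitive, so the upper-case tag is skipped, and it requires href to be the first attribute, so the tag that starts with class is skipped as well.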
import java.io.*;
import java.net.*;
import java.util.regex.*;

public class GetHref {
    public static void main(String[] args) {
        InputStream in = null;
        PrintWriter out = null;
        String htmlString = null;
        try {
            // Check the arguments
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of args");

            // Set up the streams
            URL url = new URL(args[0]);   // Create the URL
            in = url.openStream();        // Open a stream to it
            if (args.length == 2)         // Optionally copy the page to a file
                out = new PrintWriter(new FileWriter(args[1]));
            BufferedReader bin = new BufferedReader(new InputStreamReader(in));
            String line;
            StringBuilder sb = new StringBuilder();
            while ((line = bin.readLine()) != null) {
                if (out != null) out.println(line);
                sb.append(line);          // Lines are joined without separators
            }
            htmlString = sb.toString();
        }
        // On exceptions, print error message and usage message.
        catch (Exception e) {
            System.err.println(e);
            System.err.println("Usage: java GetHref <URL> [<filename>]");
        }
        finally { // Always close the streams, no matter what.
            try { if (in != null) in.close(); } catch (IOException e) {}
            if (out != null) out.close();
        }

        // Nothing to search if the page could not be read
        if (htmlString == null) return;

        // Find and print every opening <a href=...> tag
        Pattern p = Pattern.compile("(<a\\s*href=[^>]*>)");
        Matcher m = p.matcher(htmlString);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
Program output:
C:\java>java GetHref http://127.0.0.1:8080/zz3zcwbwebhome/index.jsp
<a href=>
<a href="javascript:" class="bb">
<a href="javascript:" class="bb">
<a href="learn.jsp">
<a href="download.jsp">
<a href="article.jsp">
<a href="#">
<a href="#">
<a href="#">
<a href="#" class="FrameTeitle">
<a href="#" class="FrameTeitle">
<a href=view.jsp?id=89>
<a href=view.jsp?id=88>
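The output above consists of whole opening tags. If only the link targets are wanted, one possible variation is to capture the value of href separately. The sketch below is an assumption-laden illustration, not the article's method: the class HrefValueDemo, the method findHrefValues, the sample markup, and the more permissive pattern (case-insensitive, href allowed after other attributes, value in double quotes, single quotes, or unquoted) are all introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefValueDemo {
    // "<a", any attributes, whitespace, "href", "=", then the value as
    // group 1 (double-quoted), group 2 (single-quoted) or group 3 (unquoted).
    static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\shref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]*))",
        Pattern.CASE_INSENSITIVE);

    // Returns the href value of every <a> tag found in html, in order.
    static List<String> findHrefValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Exactly one of the three groups took part in the match.
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) {
                    values.add(m.group(i));
                    break;
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup modelled on the tags in the output above.
        String html = "<a href=\"learn.jsp\">Learn</a>"
                    + "<a href=\"#\" class=\"FrameTeitle\">Top</a>"
                    + "<a href=view.jsp?id=89>View</a>";
        for (String value : findHrefValues(html)) {
            System.out.println(value);
        }
    }
}
```

A regular expression like this is still only an approximation of HTML parsing; for example, it can be misled by text that looks like an attribute inside another attribute's value.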

