Getting Started with HttpClient and Its Applications

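· The project needs to integrate with a third-party interface based on the HTTP protocol

· The project needs to call a WebService dynamically (without generating local source code)

· The project needs to use data from other websites

Requirements like these come up from time to time in day-to-day development, and each usually has more than one solution. This article uses HttpClient to cover the basics of the library and to show how it can meet all three requirements.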

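HttpClient started as a subproject of Apache Jakarta Commons (it now lives in the Apache HttpComponents project). It provides an efficient, up-to-date, feature-rich client-side programming toolkit for the HTTP protocol, and it supports the latest versions and recommendations of HTTP. (Source: Baidu Baike)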

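Some people say: isn't HttpClient just a browser?

Quite a few people have this misconception. Their reasoning goes like this: since HttpClient is an HTTP client programming tool, isn't it equivalent to a browser, apart from not being able to render HTML into a page?

In fact, HttpClient is not a browser. It is an HTTP communication library, a toolkit, so it provides only a subset of what one would expect from a general-purpose browser application. The most fundamental difference is that HttpClient has no user interface. A browser needs a rendering engine to display pages and to interpret user input: deciding how to respond when the mouse is clicked somewhere on the displayed page, computing the layout of HTML pages, cascading style sheets and images, running a JavaScript interpreter over code embedded in or referenced from the HTML page, and passing user-interface events to that interpreter for processing. HttpClient can only be used programmatically, through its API, to transmit and receive HTTP messages, and it is completely agnostic about their content.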

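Any discussion of HttpClient has to mention the JDK's native URL class.

The JDK ships with basic networking support, namely the family of APIs in the java.net package. These APIs are also enough to do network programming and access remote resources.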
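As a rough illustration of what the JDK offers on its own, here is a minimal sketch that sends a GET request with `java.net.HttpURLConnection`. To stay runnable offline, it starts a throwaway local server using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The class name `JdkHttpSketch`, the `/hello` path and the response text are made up for this illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: plain java.net, no third-party libraries.
final class JdkHttpSketch {

    // Sends a GET request and returns the response body decoded as UTF-8.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts a local server on a free port, fetches from it, then stops it.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/hello", exchange -> {
                byte[] body = "hello from the JDK".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return fetch("http://127.0.0.1:" + port + "/hello");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Even in this small sketch, stream handling, timeouts and decoding are all done by hand, which is part of why a higher-level library is attractive.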

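In addition, there is another open-source project, jsoup. It is a simple HTML parser that can directly fetch and parse the content at a given URL and lets you extract data DOM-style, which makes it a convenient API too.

So with these tools already available, why do so many people still use HttpClient?

There is actually a misconception here: Jsoup is indeed a parser, but it and HttpClient are not alternatives to each other in the way Hibernate and MyBatis are. In practice, HttpClient is usually paired with Jsoup to build web crawlers.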

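HttpClient also has many strong points (taken from the official Apache HttpClient site):

· Standards based and written in pure Java; implements HTTP 1.0 and HTTP 1.1;

· Implements all HTTP methods (GET, POST and the rest, seven in total) in an extensible object-oriented framework;

· Supports the HTTPS protocol;

· Transparent connections through HTTP proxies;

· Tunneled HTTPS connections through HTTP proxies via the CONNECT method;

· Basic, Digest, NTLMv1, NTLMv2, NTLM2 Session, SPNEGO/Kerberos authentication schemes;

· Pluggable custom authentication schemes;

· Pluggable secure socket factories, which make it easier to use third-party solutions;

· Connection management for multi-threaded applications: supports setting the maximum total connections and the maximum connections per host, and detects and closes stale connections;

· Automatic handling of cookies received in Set-Cookie headers;

· Pluggable custom cookie policies;

· Request output streams that avoid buffering the content body by streaming directly to the server socket;

· Response input streams that read the response body efficiently by streaming directly from the server socket;

· Persistent connections using KeepAlive in HTTP 1.0 and HTTP 1.1;

· Direct access to the response code and headers sent by the server;

· The ability to set connection timeouts;

· Experimental support for HTTP 1.1 response caching;

· Source code freely available under the Apache License.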

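As you may have guessed, all of the requirements above can be met with HttpClient.

HttpClient's capabilities include, but are not limited to:

· Simulating a browser to send HTTP requests and receive responses

· Calling RPC interfaces

· Crawling web page source

· Batch transaction requests

· ...

HttpClient sounds great, so how do you actually use it?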

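Set up a Maven project and import the HttpClient jars.

Note that there are two HttpClient artifacts. Import both: they are two different projects, and the examples below use each of them.

One is the standalone HttpClient (org.apache.httpcomponents) and the other is the Commons HttpClient (commons-httpclient). Don't mix them up!

Note: below, "HttpClient" without a qualifier always means the standalone HttpClient. Only when it is labeled commons-HttpClient does it mean the one from the Commons toolkit (a bit confusing, admittedly).

(To support the later examples, the Apache Commons utility packages, the jsoup parser and fastjson are imported up front as well.)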

    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.6</version>
        </dependency>
        <dependency>
            <groupId>commons-httpclient</groupId>
            <artifactId>commons-httpclient</artifactId>
            <version>3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-collections4</artifactId>
            <version>4.2</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.11.3</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.49</version>
        </dependency>
    </dependencies>

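As for concrete usage, let's implement the three requirements listed at the start!

We'll integrate with the mobile-number location lookup interface provided by Taobao:

https://tcc.taobao.com/cc/json/mobile_tel_segment.htm?tel=<phone number>

First, it's clear that this is an HTTP GET request.

Next, we write the code that calls the interface.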

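    import java.io.InputStream;
    import org.apache.commons.io.IOUtils;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class RpcConsumer {
        public static void main(String[] args) throws Exception {
            // 1. Create the HttpClient object (try-with-resources closes it afterwards)
            try (CloseableHttpClient client = HttpClients.createDefault()) {
                // 2. Declare the URL to request and build the HttpGet request
                String url = "https://tcc.taobao.com/cc/json/mobile_tel_segment.htm?tel=13999999999";
                HttpGet get = new HttpGet(url);
                // 3. Have HttpClient send the GET request and obtain the response
                try (CloseableHttpResponse response = client.execute(get)) {
                    // 4. Extract the response body and print it to the console
                    InputStream is = response.getEntity().getContent();
                    String ret = IOUtils.toString(is, "GBK");
                    System.out.println(ret);
                }
            }
        }
    }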
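The RpcConsumer example decodes the body with a hard-coded GBK. One possible refinement is to read the charset from the response's Content-Type header and fall back to a default only when none is declared. The sketch below shows just that parsing step using the JDK alone; `CharsetSketch` and `charsetOf` are hypothetical names, not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HttpClient.
final class CharsetSketch {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=GBK", or the fallback when none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("application/json", StandardCharsets.ISO_8859_1));
    }
}
```

The resulting `Charset` could then be passed to whatever call turns the response stream into a String.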

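The RpcConsumer example is simple enough, but we must never write real code this way: with the URL and parameters hard-coded, your teammates will not be pleased (just kidding). Next, let's turn this caller into a utility class.

First, as a utility class, it has to accept the URL and parameters dynamically instead of hard-coding them.

Create the RpcHttpUtil class and wrap the logic in an invokeHttp method as follows: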

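    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.commons.io.IOUtils;
    import org.apache.commons.lang3.StringUtils;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.client.methods.HttpUriRequest;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;

    public class RpcHttpUtil {

        public static final String GET = "GET";
        public static final String POST = "POST";

        public static Map<String, String> invokeHttp(String url, String method,
                Map<String, String> paramMap, List<String> returnParamList)
                throws UnsupportedOperationException, IOException {
            // 1. Build the request depending on whether the method is GET or POST
            HttpUriRequest request;
            if (StringUtils.equalsIgnoreCase(method, GET)) {
                // 1.1 For GET, append the parameters to the request URL
                StringBuilder urlSb = new StringBuilder(url);
                int paramIndex = 0;
                for (Entry<String, String> entry : paramMap.entrySet()) {
                    // '?' goes before the first parameter, '&' before the rest
                    // (values are appended as-is, so they must already be URL-safe)
                    urlSb.append(paramIndex++ == 0 ? "?" : "&");
                    urlSb.append(entry.getKey()).append("=").append(entry.getValue());
                }
                request = new HttpGet(urlSb.toString());
            } else if (StringUtils.equalsIgnoreCase(method, POST)) {
                // 1.2 For POST, build a form and wrap the parameters in it
                HttpPost post = new HttpPost(url);
                List<NameValuePair> paramList = new ArrayList<>();
                for (Entry<String, String> entry : paramMap.entrySet()) {
                    paramList.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
                }
                // 1.3 Set the encoding of the request body
                post.setEntity(new UrlEncodedFormEntity(paramList, "GBK"));
                request = post;
            } else {
                // Other request methods are not supported
                throw new RuntimeException("Sorry, this request method is not supported!");
            }
            // 2. Send the request; the client and the response are closed automatically
            try (CloseableHttpClient client = HttpClients.createDefault();
                    CloseableHttpResponse response = client.execute(request)) {
                // 3. Extract the response body and pack the requested fields into a Map
                InputStream is = response.getEntity().getContent();
                String ret = IOUtils.toString(is, "GBK");
                Map<String, String> returnMap = new LinkedHashMap<>();
                // One pattern per field (the field names vary, so the Pattern cannot be precompiled)
                for (String param : returnParamList) {
                    // The value may or may not be wrapped in single quotes
                    Pattern pattern = Pattern.compile(Pattern.quote(param) + "\\s*:\\s*'?([^',\\r\\n]*)'?");
                    Matcher matcher = pattern.matcher(ret);
                    while (matcher.find()) {
                        returnMap.put(param, matcher.group(1).trim());
                    }
                    // If nothing matched, put an empty string (a JDK 8 method)
                    returnMap.putIfAbsent(param, "");
                }
                return returnMap;
            }
        }

        private RpcHttpUtil() {
        }
    }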
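The GET branch of the utility class glues keys and values onto the URL directly, which only works while the values contain no spaces, `&`, or non-ASCII characters. As a hedged sketch of a more defensive way to do this step with the JDK alone, the helper below URL-encodes each key and value. `QueryStringSketch`, `appendQuery` and the example.com URL are hypothetical, and UTF-8 is an assumption: a given service may expect another encoding such as GBK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper illustrating query-string construction.
final class QueryStringSketch {

    // Appends the parameters to the URL, encoding keys and values.
    static String appendQuery(String url, Map<String, String> params) {
        if (params == null || params.isEmpty()) {
            return url; // nothing to append, leave the URL untouched
        }
        StringBuilder sb = new StringBuilder(url);
        // Use '?' for the first parameter unless the URL already has a query.
        char separator = url.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // assumption: the server expects UTF-8
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    // The lookup URL from this article with its single tel parameter.
    static String demo() {
        return appendQuery("https://tcc.taobao.com/cc/json/mobile_tel_segment.htm",
                Collections.singletonMap("tel", "13999999999"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // A made-up parameter showing that a space and '&' get encoded.
        System.out.println(appendQuery("https://example.com/search",
                Collections.singletonMap("q", "a b&c")));
    }
}
```

Because the separator is chosen per parameter, there is no trailing character to trim afterwards, and an empty parameter map leaves the URL unchanged.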

Next, a test method for RpcHttpUtil:

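    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class RpcConsumer {
        public static void main(String[] args) throws Exception {
            // Initialize the parameter map and the list of fields to read from the response
            Map<String, String> paramMap = new LinkedHashMap<>();
            paramMap.put("tel", "13999999999");
            List<String> returnParamList = new ArrayList<>();
            returnParamList.add("province");
            // Call the utility class
            Map<String, String> ret = RpcHttpUtil.invokeHttp(
                    "https://tcc.taobao.com/cc/json/mobile_tel_segment.htm",
                    RpcHttpUtil.GET, paramMap, returnParamList);
            System.out.println(ret);
            // Result: {province=新疆}
        }
    }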
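The field-extraction step can be tried without any network access. The sketch below applies a capture-group pattern to a sample response. The sample text is only an assumed shape of what the lookup interface returns (key:'value' pairs, one per line), and `FieldExtractSketch` is a hypothetical name. The province value is written with Unicode escapes so the source file stays ASCII.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts fields from loosely JSON-like text.
final class FieldExtractSketch {

    // Finds key:'value' or key:value for each key; missing keys map to "".
    static Map<String, String> extract(String text, List<String> keys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : keys) {
            // Group 1 captures the value up to a quote, comma, brace or line end.
            Pattern p = Pattern.compile(
                    "\\b" + Pattern.quote(key) + "\\s*:\\s*'?([^',\\r\\n}]*)'?");
            Matcher m = p.matcher(text);
            result.put(key, m.find() ? m.group(1).trim() : "");
        }
        return result;
    }

    // Runs the extraction over an assumed sample response.
    static Map<String, String> demo() {
        String sample = "__GetZoneResult_ = {\n"
                + "    mts:'1399999',\n"
                + "    province:'\u65b0\u7586',\n"
                + "    telString:'13999999999'\n"
                + "}";
        return extract(sample, Arrays.asList("province", "mts", "missing"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Using a capture group avoids searching for quote positions afterwards, so values without quotes do not cause an index error.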

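With commons-HttpClient and the SOAP protocol, a WebService can be called without generating any local source code.

A WebService is based on SOAP. The requests sent by locally generated client code are really just SOAP-based POST requests, and the responses that come back are SOAP-based as well.

So if we construct the SOAP-based POST request ourselves, will the service still return a normal result? Yes, it will!

The only drawback is that we have to build the request ourselves and also parse the response body ourselves once it arrives.

Next we need to understand the format of a SOAP XML request body. Only then can we use commons-HttpClient to call the WebService.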

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
        <soap:Body>
            <[method] xmlns="[namespace]">
                <[args]>[text]</[args]>
            </[method]>
        </soap:Body>
    </soap:Envelope>

In the format above, the placeholders in square brackets are filled in with the details of the specific WebService request.
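Filling in that template can be sketched as a small JDK-only helper. The method name, namespace and argument are passed in, and argument text is XML-escaped so that characters such as `<` or `&` cannot break the envelope. `SoapEnvelopeSketch` and its methods are hypothetical names. The demo values (`qqCheckOnline`, `http://WebXml.com.cn/`, `qqCode`) are the ones used in the request code later in this article.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper that fills in the SOAP 1.1 envelope template.
final class SoapEnvelopeSketch {

    // Builds an envelope with one method element and its argument elements.
    static String build(String method, String namespace, Map<String, String> args) {
        StringBuilder sb = new StringBuilder();
        sb.append("<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">");
        sb.append("<soap:Body>");
        sb.append('<').append(method)
          .append(" xmlns=\"").append(escape(namespace)).append("\">");
        for (Map.Entry<String, String> arg : args.entrySet()) {
            sb.append('<').append(arg.getKey()).append('>')
              .append(escape(arg.getValue()))
              .append("</").append(arg.getKey()).append('>');
        }
        sb.append("</").append(method).append('>');
        sb.append("</soap:Body></soap:Envelope>");
        return sb.toString();
    }

    // Escapes the five XML special characters ('&' must be replaced first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static String demo() {
        return build("qqCheckOnline", "http://WebXml.com.cn/",
                Collections.singletonMap("qqCode", "10000"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Element names are inserted as given, so the caller is assumed to pass valid XML names taken from the WSDL.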

Let's take a simple example:

The URL is ws.webxml.com.cn/webservices…

The namespace has to be looked up in the WSDL:

Then construct the request XML (simplified):

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
        <soap:Body>
            <getRandomCode xmlns="http://WebXml.com.cn/">
                <mobileCode>1399999999</mobileCode>
                <userID></userID>
            </getRandomCode>
        </soap:Body>
    </soap:Envelope>

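    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.InputStreamRequestEntity;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.methods.RequestEntity;

    public class App {
        public static void main(String[] args) throws Exception {
            String url = "http://ws.webxml.com.cn/webservices/qqOnlineWebService.asmx?wsdl";
            // Build the SOAP request body by hand
            StringBuilder sb = new StringBuilder();
            sb.append("<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">");
            sb.append("<soap:Body>");
            sb.append("<qqCheckOnline xmlns=\"http://WebXml.com.cn/\">");
            sb.append("<qqCode>10000</qqCode>");
            sb.append("</qqCheckOnline>");
            sb.append("</soap:Body>");
            sb.append("</soap:Envelope>");
            // Wrap the XML bytes in a request entity and attach it to the POST
            PostMethod postMethod = new PostMethod(url);
            byte[] bytes = sb.toString().getBytes("utf-8");
            InputStream inputStream = new ByteArrayInputStream(bytes, 0, bytes.length);
            RequestEntity requestEntity = new InputStreamRequestEntity(inputStream, bytes.length, "text/xml;charset=UTF-8");
            postMethod.setRequestEntity(requestEntity);
            // Send the request with commons-HttpClient and print the raw SOAP response
            HttpClient httpClient = new HttpClient();
            httpClient.executeMethod(postMethod);
            String soapResponseData = postMethod.getResponseBodyAsString();
            System.out.println(soapResponseData);
            // Return the connection once the body has been read
            postMethod.releaseConnection();
        }
    }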

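The response (the body really has no line breaks; it all comes out on a single line):

    <?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><qqCheckOnlineResponse xmlns="http://WebXml.com.cn/"><qqCheckOnlineResult>N</qqCheckOnlineResult></qqCheckOnlineResponse></soap:Body></soap:Envelope>

We could certainly use Dom4j to extract data from the response body, but walking down the tree one level at a time is tedious. Jsoup can do more than parse HTML documents: it can also convert XML and extract data from it.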

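Append the following to the App code shown earlier, and only the desired return value is printed.

    // Requires: import org.jsoup.Jsoup; import org.jsoup.nodes.Document;
    Document document = Jsoup.parse(soapResponseData);
    String text = document.getElementsByTag("qqCheckOnlineResult").text();
    System.out.println(text);
    // Output: N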
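For comparison, the same value can be pulled out with the XML parser built into the JDK instead of Jsoup. This is a swapped-in technique, not the approach used in this article. The sketch below is namespace-aware and looks the element up directly rather than walking the tree level by level. `SoapResultSketch` is a hypothetical name, and the sample response is a shortened copy of the one shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper using the JDK's DOM parser (no Jsoup or Dom4j).
final class SoapResultSketch {

    // Returns the text of the first element with the given namespace and
    // local name, or "" when the element is absent.
    static String firstText(String xml, String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(namespace, localName);
            return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse SOAP response", e);
        }
    }

    static String demo() {
        // Shortened copy of the response shown in this article.
        String response = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soap:Body><qqCheckOnlineResponse xmlns=\"http://WebXml.com.cn/\">"
                + "<qqCheckOnlineResult>N</qqCheckOnlineResult>"
                + "</qqCheckOnlineResponse></soap:Body></soap:Envelope>";
        return firstText(response, "http://WebXml.com.cn/", "qqCheckOnlineResult");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints N
    }
}
```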

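As mentioned earlier, HttpClient combined with Jsoup can handle web-crawling tasks. Let's build a real crawler that collects product information for laptops on JD.com.

JD.com laptop product list page URL: https://list.jd.com/list.html?cat=670,671,672

This is the part of the page we want to crawl:

All the products form a ul, and each product is an li:

As you can see, the product image is nested inside an a tag, so we only need to extract that tag's link.

The crawler program looks like this: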

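    import org.apache.commons.io.IOUtils;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Crawler {
        public static void main(String[] args) throws Exception {
            HttpGet get = new HttpGet("https://list.jd.com/list.html?cat=670,671,672");
            try (CloseableHttpClient client = HttpClients.createDefault();
                    CloseableHttpResponse response = client.execute(get)) {
                String html = IOUtils.toString(response.getEntity().getContent(), "UTF-8");
                Document document = Jsoup.parse(html);
                // Each product sits in an element with class j-sku-item; its link is the a tag under p-img
                Elements goodsDivs = document.getElementsByClass("j-sku-item");
                for (Element goodsDiv : goodsDivs) {
                    String href = "https:" + goodsDiv.getElementsByClass("p-img").get(0)
                            .getElementsByTag("a").get(0).attr("href");
                    System.out.println(href);
                }
            }
        }
    }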
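The Crawler prepends "https:" because the href values on the list page start with "//" (protocol-relative links). A slightly more general option, sketched here with the JDK's `java.net.URI`, is to resolve each href against the URL of the page it came from. That also copes with absolute and root-relative links. `LinkResolveSketch` is a hypothetical name, and the `/help.html` path is made up for illustration.

```java
import java.net.URI;

// Hypothetical helper: turns an href found on a page into an absolute URL.
final class LinkResolveSketch {

    // Resolves the href against the URL of the page that contained it.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://list.jd.com/list.html?cat=670,671,672";
        // Protocol-relative link: inherits the scheme of the page.
        System.out.println(resolve(page, "//item.jd.com/7418428.html"));
        // Already absolute: returned unchanged.
        System.out.println(resolve(page, "https://item.jd.com/7418428.html"));
        // Root-relative link (made-up path): inherits scheme and host.
        System.out.println(resolve(page, "/help.html"));
    }
}
```

Note that `URI.create` throws `IllegalArgumentException` for hrefs containing characters that are illegal in a URI, such as unencoded spaces, so real crawled links may need cleaning first.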

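Running the Crawler yields product links like these:

Next, iterate over these links and visit each one in turn:

We won't do any heavy data processing this time. We'll only crawl the product name, the product price and the product's basic parameters.

Opening item.jd.com/7418428.htm…, we can extract the following data:

The crawler program looks like this: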

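    import org.apache.commons.io.IOUtils;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Crawler2 {
        public static void main(String[] args) throws Exception {
            String goodsId = "7418428";
            HttpGet get = new HttpGet("https://item.jd.com/" + goodsId + ".html");
            try (CloseableHttpClient client = HttpClients.createDefault();
                    CloseableHttpResponse response = client.execute(get)) {
                String html = IOUtils.toString(response.getEntity().getContent(), "GBK");
                Document document = Jsoup.parse(html);
                // Product name
                String goodsName = document.getElementsByClass("sku-name").get(0).text();
                System.out.println(goodsName);
                // Product price
                String goodsPrice = document.getElementsByClass("price J-p-" + goodsId).get(0).text();
                System.out.println(goodsPrice);
                // Basic product parameters
                Element paramList = document.getElementsByClass("p-parameter").get(0)
                        .getElementsByClass("parameter2").get(0);
                Elements params = paramList.getElementsByTag("li");
                for (Element param : params) {
                    System.out.println(param.attr("title") + " - " + param.text());
                }
            }
        }
    }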

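Crawl result:

The price is missing! That means the price is not part of the page we requested. It is loaded by a separate Ajax request.

We need to use HttpClient to make one more request, to the URL that serves the price, before we can get the product price.

The crawler source with the added price request looks like this: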

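    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import org.apache.commons.io.IOUtils;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Crawler2 {
        public static void main(String[] args) throws Exception {
            String goodsId = "7418428";
            HttpGet get = new HttpGet("https://item.jd.com/" + goodsId + ".html");
            try (CloseableHttpClient client = HttpClients.createDefault();
                    CloseableHttpResponse response = client.execute(get)) {
                String html = IOUtils.toString(response.getEntity().getContent(), "GBK");
                Document document = Jsoup.parse(html);
                // Product name
                String goodsName = document.getElementsByClass("sku-name").get(0).text();
                System.out.println(goodsName);
                // Product price
                // String goodsPrice = document.getElementsByClass("price J-p-" + goodsId).get(0).text();
                // System.out.println(goodsPrice);
                // The price comes from an Ajax request, so send a separate request for it
                // (this URL returns a JSON array of length 1)
                String priceUrl = "http://p.3.cn/prices/get?type=1&skuid=J_" + goodsId;
                HttpPost post = new HttpPost(priceUrl);
                try (CloseableHttpResponse priceResponse = client.execute(post)) {
                    String jsonStr = IOUtils.toString(priceResponse.getEntity().getContent(), "UTF-8");
                    JSONObject json = JSONArray.parseArray(jsonStr).getJSONObject(0);
                    System.out.println(json.getString("p"));
                }
                // Load the product details
                Element paramList = document.getElementsByClass("p-parameter").get(0)
                        .getElementsByClass("parameter2").get(0);
                Elements params = paramList.getElementsByTag("li");
                for (Element param : params) {
                    System.out.println(param.attr("title") + " - " + param.text());
                }
            }
        }
    }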

Run it. The price is now retrieved correctly: