0. Introduction to the http package

Go's native net/http package provides both HTTP client and server implementations.

Import it in a Go source file:

import "net/http"

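1. Common methods

When crawling web pages, the client sends an HTTP request to fetch a page, then parses the page to extract the results it needs.

Broadly, the process has only two steps:

1. Send an HTTP request

2. Parse the page

The net/http package on its own already covers everything needed to send requests.

Commonly used functions: Get, Head, Post, and PostForm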

resp, err := http.Get("http://example.com/")
resp, err := http.Post("http://example.com/upload", "image/jpeg", &buf)
resp, err := http.PostForm("http://example.com/form",
	url.Values{"key": {"Value"}, "id": {"123"}})
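
The calls above return a response whose body still has to be read and closed. Below is a minimal sketch of a complete GET round trip. The helper names fetchBody and newDemoServer are made up for illustration, and the local httptest server is only there so the example runs without network access; it assumes Go 1.16 or later for io.ReadAll.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody (hypothetical helper) issues a GET request and returns the
// response body as a string.
func fetchBody(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	// Always close the body so the underlying connection can be released.
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// newDemoServer (hypothetical helper) starts a local test server that
// returns a fixed HTML page.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html><title>demo</title></html>")
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	body, err := fetchBody(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```

In a real crawler, fetchBody would be called with the target page URL instead of the test server's address.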

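Use:

func (h Header) Add(key, value string)

func (h Header) Set(key, value string)

to set request headers, for example User-Agent, which identifies the client sending the request, and Content-Type, which tells the server the media type of the request body. Add appends a value to any values already stored under the key, while Set replaces them.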
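
Headers cannot be passed to http.Get directly, so one common approach is to build the request with http.NewRequest and set the headers before sending it. The sketch below assumes a made-up User-Agent value ("demo-crawler/0.1") and hypothetical helper names; the local test server just echoes back the User-Agent it received.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newCrawlerRequest (hypothetical helper) builds a GET request with
// custom headers.
func newCrawlerRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Set replaces any existing values for the key.
	req.Header.Set("User-Agent", "demo-crawler/0.1") // made-up value
	// Add appends a value and keeps the earlier ones.
	req.Header.Add("Accept", "text/html")
	req.Header.Add("Accept", "application/xhtml+xml")
	return req, nil
}

// echoUserAgent sends the request to a local test server that writes the
// received User-Agent header into the response body.
func echoUserAgent() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("User-Agent"))
	}))
	defer srv.Close()

	req, err := newCrawlerRequest(srv.URL)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	ua, err := echoUserAgent()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("server saw User-Agent:", ua)
}
```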

In more complex client/server setups, you can customize both the client and the server:

// create a client
client := &http.Client{
	CheckRedirect: redirectPolicyFunc,
}

resp, err := client.Get("http://example.com")
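
The snippet above does not show what redirectPolicyFunc looks like. As one possible sketch, the policy below refuses to follow redirects and hands the 3xx response back to the caller by returning http.ErrUseLastResponse. The policy itself, the added Timeout, and the helper names are assumptions for illustration, not something the original prescribes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// redirectPolicyFunc is a hypothetical policy: do not follow redirects and
// return the redirect response itself to the caller.
func redirectPolicyFunc(req *http.Request, via []*http.Request) error {
	return http.ErrUseLastResponse
}

// newClient builds a client with the custom redirect policy and an overall
// request timeout (the zero-value client has no timeout).
func newClient() *http.Client {
	return &http.Client{
		CheckRedirect: redirectPolicyFunc,
		Timeout:       10 * time.Second,
	}
}

// redirectStatus requests a path on a local test server that redirects to
// /final, and reports the status code and Location header the client sees.
func redirectStatus() (int, string, error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/final", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "done")
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/final", http.StatusFound)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	resp, err := newClient().Get(srv.URL + "/")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("Location"), nil
}

func main() {
	code, loc, err := redirectStatus()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(code, loc)
}
```

For a crawler this can be useful when the redirect target itself should be recorded or filtered before it is fetched.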

// create a server
s := &http.Server{
	Addr:           ":8080",
	Handler:        myHandler,
	ReadTimeout:    10 * time.Second,
	WriteTimeout:   10 * time.Second,
	MaxHeaderBytes: 1 << 20,
}
log.Fatal(s.ListenAndServe()) // logs the error, then calls os.Exit(1); deferred functions do not run
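
The server snippet leaves myHandler undefined and exits through log.Fatal. The sketch below fills in a made-up handler and, as an alternative to log.Fatal, stops the server with Shutdown so that deferred functions still run. It listens on a free local port chosen by the OS instead of ":8080" so it can run anywhere; the handler body and the helper name serveOnce are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// myHandler is a hypothetical handler standing in for the one in the text.
var myHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "hello, %s", r.URL.Path)
})

// serveOnce starts a server, sends it one request, shuts it down
// gracefully, and returns the response body.
func serveOnce(path string) (string, error) {
	// Port 0 lets the OS pick a free port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	s := &http.Server{
		Handler:        myHandler,
		ReadTimeout:    10 * time.Second,
		WriteTimeout:   10 * time.Second,
		MaxHeaderBytes: 1 << 20,
	}
	go s.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	// Unlike log.Fatal, Shutdown lets deferred cleanup run.
	defer func() {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = s.Shutdown(ctx)
	}()

	resp, err := http.Get("http://" + ln.Addr().String() + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	body, err := serveOnce("/ping")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```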

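2. Best practices

Official documentation

3. Parsing web pages

In another document of mine, I use goquery to parse HTML pages.

For crawlers that are not complex, net/http plus goquery is enough.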
