GOES Crawler Framework

Original link: github.com

Overview

GOES is a crawler for vertical communities, written in Go. It can be used to crawl websites and extract structured data.

Features

  • Flexible & Modular
  • Easy to extend
  • Native Go

Requirements

  • Go 1.5 or higher

Example

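// Queue of pages still to be crawled, seeded with the start URL.
newPageQueue := &queue.NewQueue{}
newPageQueue.Push("www.example.com")
// oldPageQueue (presumably the pages already visited) must exist before
// this call; its construction is not shown in this example.
scheduler.Schedule(newPageQueue, oldPageQueue, func(doc *goquery.Document, toStop chan bool) {
	// use doc to query the HTML document
	// see the goquery documentation to learn how to use it
})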
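
The newPageQueue and oldPageQueue arguments suggest the usual crawler pattern of a frontier of pending pages plus a record of pages already seen. The following is only a minimal, standard-library sketch of that pattern, not the GOES implementation: the frontier type and the newFrontier, push, pop, and crawl names are hypothetical and do not come from the GOES packages.

```go
package main

import "fmt"

// frontier is a hypothetical stand-in for the new/old page queues:
// pending holds URLs still to visit, seen records every URL ever queued.
type frontier struct {
	pending []string
	seen    map[string]bool
}

// newFrontier builds a frontier seeded with the given start URLs.
func newFrontier(seeds ...string) *frontier {
	f := &frontier{seen: make(map[string]bool)}
	for _, s := range seeds {
		f.push(s)
	}
	return f
}

// push queues url unless it was queued before; it reports whether it was added.
func (f *frontier) push(url string) bool {
	if f.seen[url] {
		return false
	}
	f.seen[url] = true
	f.pending = append(f.pending, url)
	return true
}

// pop removes and returns the oldest pending URL (FIFO order).
func (f *frontier) pop() (string, bool) {
	if len(f.pending) == 0 {
		return "", false
	}
	url := f.pending[0]
	f.pending = f.pending[1:]
	return url, true
}

// crawl drains the frontier, calling visit on each URL and queueing
// the links that visit returns. It returns the visit order.
func crawl(f *frontier, visit func(url string) []string) []string {
	var order []string
	for {
		url, ok := f.pop()
		if !ok {
			return order
		}
		order = append(order, url)
		for _, link := range visit(url) {
			f.push(link)
		}
	}
}

func main() {
	// A tiny in-memory "site" so the sketch runs without network access.
	site := map[string][]string{
		"www.example.com":   {"www.example.com/a", "www.example.com/b"},
		"www.example.com/a": {"www.example.com", "www.example.com/b"},
	}
	f := newFrontier("www.example.com")
	order := crawl(f, func(url string) []string { return site[url] })
	fmt.Println(order)
}
```

Tracking seen URLs separately from the pending list is what keeps the loop from revisiting a page when two pages link to each other.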

Visit goquery to learn how to query HTML documents.

Installation

go get github.com/PuerkitoBio/goquery
go get github.com/ruinstang/goes.git

This project is based on goquery.