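Goals
Fetch the article list, postList, from the page http://segmentfault.com/blogs/recommend?page=3
Using postList, fetch each article page one by one
Use the article title as the file name, e.g. {{ title }}.txt, and save the article content in that txt file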
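The second goal, handling postList one item at a time, amounts to a callback-driven queue: take the first item, process it, and only then move on to the next. A minimal sketch, in which `processInOrder` and the stub worker are hypothetical and no network request is made:

```javascript
// Hypothetical helper: run worker(item, next) for each item, strictly in order.
function processInOrder(items, worker, done) {
  var queue = items.slice() // copy so the caller's array is left untouched
  var results = []
  function step() {
    if (!queue.length) return done(results)
    worker(queue.shift(), function(result) {
      results.push(result)
      step()
    })
  }
  step()
}

// Stub worker standing in for "download the article and write the file".
var saved = []
processInOrder(
  [{ title: 'first' }, { title: 'second' }],
  function(post, next) {
    next(post.title + '.txt')
  },
  function(results) {
    saved = results
    console.log(results)
  }
)
```

Because the next item is only taken once the previous callback has fired, at most one request would be in flight at a time.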
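Choice of tools and libraries
superagent makes it easy to send network requests and read the response
cheerio lets us work with HTML strings in the familiar jQuery style
observe.js watches an object's properties; when the value of a watched property changes, it automatically calls the specified callback, which makes the observer pattern convenient to apply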
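To make the observer idea concrete, here is a minimal sketch of property watching built on Object.defineProperty. It is not observe.js's implementation or API; `watch` is a hypothetical helper written only for this illustration, and it is assumed here that callbacks fire synchronously on assignment.

```javascript
// Hypothetical helper, not part of observe.js: calls handlers[key]
// whenever obj[key] is assigned a different value.
function watch(obj, handlers) {
  Object.keys(handlers).forEach(function(key) {
    var value = obj[key]
    Object.defineProperty(obj, key, {
      enumerable: true,
      get: function() {
        return value
      },
      set: function(next) {
        if (next === value) return
        value = next
        // run the callback with the object as `this`
        handlers[key].call(obj, next)
      }
    })
  })
  return obj
}

// Each handler sets the next property, so the steps chain automatically.
var log = []
var state = watch({}, {
  url: function(url) {
    log.push('fetch ' + url)
    this.text = '<html>stub</html>' // stands in for a downloaded page
  },
  text: function(text) {
    log.push('parse ' + text.length + ' characters')
  }
})

state.url = 'http://example.com/list'
console.log(log)
```

Assigning `state.url` triggers the `url` handler, which assigns `text`, which in turn triggers the `text` handler. The crawler in this article is organized as the same kind of chain.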
Preparation
npm install superagent
npm install cheerio
npm install observe.js
In app.js, import the modules:
var superagent = require('superagent')
var observe = require('observe.js')
var cheerio = require('cheerio')
var path = require('path')
var url = require('url')
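var fs = require('fs')
Implementation
The postList directory stores the txt files; create it if it does not exist:
if (!fs.existsSync('postList')) {
  fs.mkdirSync('postList')
}
// current working directory, used to build the output file paths
var cwd = process.cwd()
// the observed object; each property below gets a callback
var reptile = observe({})
reptile.on({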
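  // url changed: download the list page
  url: function(url) {
    var that = this
    superagent
      .get(url)
      .query(this.query)
      .end(function(err, res) {
        // a two-argument callback receives (err, res)
        if (!err && res.ok) {
          that.text = res.text
        }
      })
  },
  // text changed: parse the list page and collect the article links
  text: function(text) {
    var that = this
    var $ = cheerio.load(text)
    var postList = []
    $('h2.title a').each(function() {
      postList.push({
        title: $(this).text(),
        // resolve the href against the list page URL so the protocol is kept
        url: url.resolve(that.url, $(this).attr('href'))
      })
    })
    this.postList = postList
    // start with the first article, if the page had any
    if (postList.length) {
      this.postItem = postList.shift()
    }
  },
  // postItem changed: download one article page
  postItem: function(postItem) {
    console.log(postItem.url)
    var that = this
    superagent
      .get(postItem.url)
      .end(function(err, res) {
        if (!err && res.ok) {
          that.content = {
            filename: path.join(cwd, 'postList', postItem.title + '.txt'),
            title: postItem.title,
            text: res.text
          }
        } else {
          console.log(err || res)
        }
      })
  },
  // content changed: extract the article text and write it to disk
  content: function(content) {
    var that = this
    var $ = cheerio.load(content.text)
    var data = ''
    // note: '.article *' matches every descendant, so text inside
    // nested elements is written more than once
    $('.article *').each(function() {
      data += $(this).text() + '\n'
    })
    fs.writeFile(content.filename, data, function(err) {
      if (err) {
        console.log(err)
      } else if (that.postList.length) {
        // move on to the next article only after this file is written
        that.postItem = that.postList.shift()
      }
    })
  }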
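})
// set query before url, because assigning url is what triggers the first request
reptile.query = 'page=3'
reptile.url = 'http://segmentfault.com/blogs/recommend'
With that, all of the logic is written.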
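Two details in the code above are easy to get wrong: the href values on the list page may be relative, and an article title may contain characters that are not allowed in file names. The sketch below uses only Node's built-in url and path modules. `toAbsolute` and `safeFilename` are hypothetical helpers, the article path is a placeholder, and the set of replaced characters is an assumption based on what Windows rejects.

```javascript
var path = require('path')
var URL = require('url').URL

// Hypothetical helper: resolve an href from the list page against the page URL.
function toAbsolute(pageUrl, href) {
  return new URL(href, pageUrl).href
}

// Hypothetical helper: replace characters that Windows does not allow
// in file names, then append the .txt extension.
function safeFilename(title) {
  return title.replace(/[\\\/:*?"<>|]/g, '_').trim() + '.txt'
}

// '/blog/example/1' is a placeholder path, not a real article.
console.log(toAbsolute('http://segmentfault.com/blogs/recommend?page=3', '/blog/example/1'))
console.log(path.join('postList', safeFilename('Node.js: what is a crawler?')))
```

Building request URLs with URL resolution rather than path.join keeps the protocol and avoids backslashes on Windows, where path.join uses the platform's path separator.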
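Running app.js
Open a command line in the current directory. On Windows there is a shortcut: hold the Shift key and right-click, and the context menu gains an "Open command window here" entry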
node app.js
Wait for it to finish, then check whether new txt files have appeared in the postList directory
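One way to check the result without opening the folder is a short script that lists the txt files. `listTxtFiles` is a hypothetical helper, and the sketch assumes it is run from the same directory as app.js.

```javascript
var fs = require('fs')
var path = require('path')

// Hypothetical helper: return the names of the .txt files in a directory,
// or an empty array if the directory does not exist yet.
function listTxtFiles(dir) {
  if (!fs.existsSync(dir)) return []
  return fs.readdirSync(dir).filter(function(name) {
    return path.extname(name) === '.txt'
  })
}

var files = listTxtFiles(path.join(process.cwd(), 'postList'))
console.log(files.length + ' txt file(s) found')
```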
Reposted from: 前端乱炖
Author: Lucifier