cheerio: injecting content into HTML from Node
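const fs = require('fs');
const path = require('path');
const express = require('express');
const cheerio = require('cheerio');
// `app` (an express() instance) and STATIC_PATH are assumed to be defined elsewhere.
// Read and parse index.html once, then hand out the cached cheerio instance.
const getHTML = (function(){
  let resData = fs.readFileSync(path.join(process.cwd(), STATIC_PATH, 'index.html'));
  let $ = cheerio.load(resData);
  $('body').append('<script>123</script>');
  return function(){ return $ };
})()
app.use(function(req, res, next){
  let matcher = req.url.match(/\/[0-9a-zA-Z%]+\.html/g);
  if(matcher && matcher.length && (matcher = matcher[0])){
    let $ = getHTML();
    // Only inject content into index.html.
    // Return here so next() is not called after the response has ended.
    if(matcher.indexOf('index') === 1 ) return res.end($.html());
  }
  next();
})
app.use(express.static(path.join(__dirname, STATIC_PATH)));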
Why does everything that belongs in the head tag end up inside body in the output?
SCRIPT and LINK tags defined in HEAD incorrectly assigned to BODY #1072
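Solutions:
// Convert the Buffer to a string and trim it before passing it to cheerio
cheerio.load(resData.toString().trim());
// Or strip the BOM from the front of the Buffer
cheerio.load(removeBufferBom(resData));
The first approach is the one used here. String.prototype.trim treats U+FEFF as whitespace, so it removes the BOM, and when cheerio.load receives a Buffer it converts it to a string before processing anyway.
The second approach has to create a new Buffer object, which wastes memory.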
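The two fixes above can be sketched as follows. removeBufferBom is not shown in these notes, so its body here is an assumed implementation that checks for the UTF-8 BOM bytes EF BB BF, and toCleanString is a hypothetical name for the first approach. cheerio itself is left out so the snippet runs on plain Node.

```javascript
// Assumed implementation of the removeBufferBom helper named above:
// drop the three-byte UTF-8 BOM (EF BB BF) if the buffer starts with it.
function removeBufferBom(buf) {
  const hasBom =
    buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
  // subarray returns a new Buffer object that views the same memory.
  return hasBom ? buf.subarray(3) : buf;
}

// Hypothetical name for the first approach: decode, then trim.
// String.prototype.trim treats U+FEFF as whitespace, so the BOM is removed.
function toCleanString(buf) {
  return buf.toString('utf8').trim();
}

// Sample input standing in for index.html saved with a BOM.
const withBom = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from(
    '<!DOCTYPE html><html><head><title>Document</title></head><body></body></html>',
    'utf8'
  ),
]);

const viaString = toCleanString(withBom);
const viaBuffer = removeBufferBom(withBom).toString('utf8');

console.log(viaString.charCodeAt(0) === 0x3c); // true: first character is '<'
console.log(viaString === viaBuffer);          // true: both approaches agree
```

Either result can then be passed to cheerio.load in place of the raw Buffer.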
Upon closer inspection this only occurs when you have a custom element
preceding the tags in the head tag
In the example above the link tag will show up but the script tag gets
moved into the body (but not in the body HTML.) This differs from jQuery,
which will always place it in the head tree.
Additional note: it does actually copy the non-standard tags into the
start of the body html (for instance the header tag above,) but does
not copy the other elements into the body.
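The issue says that when the head tag contains a custom element, everything after that element is moved into the body tag. Our page, however, has no custom element in head, yet the first node of the converted HTML is always a text node: ""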
<body>
""
<title>Document</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</body>
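A Baidu search turned up the question "Why does the browser treat the information inside head in this page as being inside body?". The answer: the <head> nesting goes wrong because there is an extra, invisible U+FEFF at the very start of the HTML document (right before "<!DOCTYPE>"), and it pollutes the HTML.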
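A quick way to confirm this diagnosis is to look at the first character of the decoded file. The following is a minimal sketch: startsWithBom is a hypothetical helper, and the sample string stands in for the contents of index.html.

```javascript
// Hypothetical helper: report whether decoded text begins with U+FEFF.
function startsWithBom(text) {
  return text.charCodeAt(0) === 0xfeff;
}

// Sample standing in for index.html read with a leading BOM.
const sample = '\uFEFF<!DOCTYPE html><html><head></head><body></body></html>';

console.log(startsWithBom(sample));                // true
console.log(startsWithBom(sample.trim()));         // false
console.log(sample.length - sample.trim().length); // 1
```

The one-character difference in length is the invisible U+FEFF, which matches the stray text node seen at the top of body.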
Front-end codeless event tracking
JS fingerprint generation
FingerprintJS