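My articles are my personal notes. If you find any mistakes, please point them out; I would be very grateful.
My email: silenceandsharp@163.com
This article is based on Python 3.
First, detecting the encoding of a byte string: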
pip install cchardet
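Note: this library is no longer maintained, and it sometimes fails to detect the encoding (see the first result below)!
import cchardet
# Short GBK-encoded input: detection can fail and return None.
x = cchardet.detect('我是菜鸡'.encode(encoding='GBK'))
print(x)
# Another GBK-encoded input: reported as GB18030, which is a superset of GBK.
print(cchardet.detect('为反对三法司'.encode('gbk')))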
Output:
{'encoding': None, 'confidence': None}
{'encoding': 'GB18030', 'confidence': 0.9900000095367432}
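Since detection can come back as None, as in the first result above, it may help to decode defensively. Here is a minimal sketch using only the standard library. `decode_with_fallback` and its candidate list are hypothetical names chosen for illustration, not part of cchardet; the `detected` argument is assumed to be a dict shaped like the output above.

```python
def decode_with_fallback(raw, detected=None, candidates=("utf-8", "gb18030")):
    """Decode bytes with a detector's guess first, then fixed fallbacks.

    `detected` is assumed to look like the cchardet result above,
    e.g. {'encoding': 'GB18030', 'confidence': 0.99}; it may be None.
    Returns a (text, encoding_used) tuple.
    """
    guesses = []
    if detected and detected.get("encoding"):
        guesses.append(detected["encoding"])
    guesses.extend(candidates)
    for enc in guesses:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: never raise, replace undecodable bytes instead.
    return raw.decode("utf-8", errors="replace"), "utf-8"


raw = '我是菜鸡'.encode('gbk')
# Simulate the failed detection shown above.
text, used = decode_with_fallback(raw, {'encoding': None, 'confidence': None})
print(text, used)
```

The order of the candidates matters: strict UTF-8 decoding rejects most non-UTF-8 input, so it is tried before the more permissive GB18030. For other data sources the list would probably need adjusting.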
Next, compressed storage:
import zlib
import htmlmin
html = """
<!DOCTYPE html>
<html lang="en">
<head>
<title>Bootstrap Case</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body>
<div class="container">
<h2>Well</h2>
<div class="well">Basic Well</div>
</div>
</body>
</html>
"""
s = 'slfsjdalfkasflkkdkaleeeeeeeeeeeeeeeeeeeeeeeeeeeelaaalkllfksaklfasdll kkkkkk123'
# Compress with zlib
zlib_s = zlib.compress(s.encode('utf8'))
# zlib_s = zlib.compress(s.encode('gbk'))
print('s'.ljust(12), len(s))
print('zlib_s'.ljust(12), len(zlib_s))
# Compare against htmlmin minification
zlib_html = zlib.compress(html.encode('utf8'))
print('html'.ljust(12), len(html))
print('zlib_html'.ljust(12), len(zlib_html))
htmlmin_html = htmlmin.minify(html, remove_empty_space=True)
print('htmlmin_html'.ljust(12), len(htmlmin_html))
# Restore the original string with zlib
ss = zlib.decompress(zlib_s)
print(ss.decode('utf8'))
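Conclusions:
1. Compress only the parts you actually need. Something like <head>, which is of no use once stored, need not be kept at all (this example does not demonstrate that in full).
2. zlib shrinks the data more than htmlmin does. Note that htmlmin only minifies the markup and its output is still HTML text, whereas zlib produces binary data that must be decompressed before use.
3. The HTML is stored as a safeguard against later requirement changes from the product team or the client.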
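The first conclusion can be sketched as follows: keep only the <body> part and compress that. This is a rough illustration rather than the method used above. `pack_body` and `unpack` are hypothetical helper names, the regular expression is a simplification that assumes a single well-formed <body> element, and a real HTML parser would be more robust.

```python
import re
import zlib


def pack_body(html):
    """Keep only the <body> element (if present) and zlib-compress it as UTF-8."""
    match = re.search(r"<body\b.*?</body>", html, flags=re.IGNORECASE | re.DOTALL)
    # Fall back to the whole document when no <body> element is found.
    kept = match.group(0) if match else html
    return zlib.compress(kept.encode("utf-8"), 9)  # 9 = strongest compression level


def unpack(blob):
    """Reverse pack_body: decompress and decode back to a str."""
    return zlib.decompress(blob).decode("utf-8")


sample = (
    "<html><head><title>Bootstrap Case</title></head>"
    '<body><div class="well">Basic Well</div></body></html>'
)
blob = pack_body(sample)
restored = unpack(blob)
# Compare sizes in bytes on both sides, not characters against bytes.
print(len(sample.encode("utf-8")), len(blob))
print(restored)
```

On a snippet this small the zlib header overhead can outweigh the savings, so the size comparison is only meaningful on real pages.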