Python Challenge Level 2 Walkthrough: ocr



Challenge URL: www.pythonchallenge.com/pc/def/ocr.…


Challenge content

recognize the characters. maybe they are in the book, but MAYBE they are in the page source.

General tips:

  • Use the hints. They are helpful, most of the times.
  • Investigate the data given to you.
  • Avoid looking for spoilers.

Forums: Python Challenge Forums, read before you post. IRC: irc.freenode.net #pythonchallenge

To see the solutions to the previous level, replace pc with pcc, i.e. go to: www.pythonchallenge.com/pcc/def/ocr…


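Solution

The hint says to look at the page source. The source contains a comment with an instruction, followed by a long block of seemingly random characters. The instruction reads:

find rare characters in the mess below:

The code below fetches the page, uses a regular expression to pull out the block of random characters, then extracts the letters from it and prints them: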

from urllib import request
import re

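# Fetch the HTML source and decode the bytes into text
url = 'http://www.pythonchallenge.com/pc/def/ocr.html'
with request.urlopen(url) as response:
    text = response.read().decode('utf-8')

# Collect the HTML comments; re.DOTALL lets "." match across line breaks
pattern = re.compile(r'<!--(.+?)-->', re.DOTALL)
comments = pattern.findall(text)
# The first comment is the hint, the second is the block of random characters
mess = comments[1]

# Find the letters hidden in the block and join them
characters = re.findall(r'[a-zA-Z]+', mess)
msg = ''.join(characters)
print(msg)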

The result is equality. As in the previous levels, put it into the URL and open it in the browser: www.pythonchallenge.com/pc/def/equa…
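
The regular expression above works because the rare characters happen to be letters. The hint can also be taken literally by counting how often each character occurs and keeping only the ones that occur rarely. The following is a minimal sketch of that idea, not the method used above. The function name `rare_characters`, the `max_count` threshold, and the short `sample_mess` string are all assumptions made for illustration; the real block from the page source is much longer.

```python
from collections import Counter


def rare_characters(mess: str, max_count: int = 1) -> str:
    """Return the characters that occur at most max_count times, in order of appearance.

    Hypothetical helper for illustration; max_count=1 keeps characters seen exactly once.
    """
    counts = Counter(mess)
    return ''.join(ch for ch in mess if counts[ch] <= max_count)


# Made-up stand-in for the block of random characters (an assumption, not the real data):
# every symbol and the line break repeat, while the letters "h" and "i" appear once each.
sample_mess = (
    "%$@_^#" * 4 + "h" + ")&!+]*" * 4 + "\n"
    + "}{[(@%" * 4 + "i" + "$^&_#!" * 4 + "\n"
)

print(rare_characters(sample_mess))  # prints "hi"
```

With the real data, the string in `mess` from the code above would be passed in place of `sample_mess`. Counting frequencies makes no assumption about which kind of character is rare, so it would still work if the hidden message used digits or symbols.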