Level 4: this is the first level that suggests using urllib, which is fine. Viewing the page source reveals this comment:
urllib may help. DON’T TRY ALL NOTHINGS, since it will never end. 400 times is more than enough.
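Clicking the image follows a link to http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345. Replace 12345 with the "next nothing" value shown on the page (44827) and repeat a few times; the pattern soon becomes clear, and the exceptions the script runs into along the way can be handled one at a time. To save time, each rerun can start from the last value fetched before the exception was raised, so only the nothing value in the URL needs to change.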
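Swapping the nothing value can be sketched as follows. The names `BASE_URL` and `build_url` are hypothetical helpers introduced for illustration; only the URL itself comes from the level.

```python
from urllib.parse import urlencode

# Base address of the level; only the query string changes between requests.
BASE_URL = 'http://www.pythonchallenge.com/pc/def/linkedlist.php'

def build_url(nothing):
    # urlencode converts the value to a string and builds "nothing=<value>".
    return BASE_URL + '?' + urlencode({'nothing': nothing})

print(build_url(44827))
```

Building the query with `urlencode` rather than string slicing keeps the rest of the URL untouched when the value changes.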
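First exception: the page says "Yes. Divide by two and keep going.", so divide the previous value by 2.
Second exception: the page says "There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579", so take only the number that follows "next nothing is".
Third exception: the page returns peak.html, which is the URL of the next level.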
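The three cases above can be folded into one loop. This is a minimal sketch, not a definitive solution: `next_nothing` and `follow_chain` are hypothetical names, the message formats are assumed to match the responses quoted above, and `fetch` stands for any function that requests linkedlist.php with a given nothing value and returns the response text. The limit of 400 comes from the hint in the page comment.

```python
import re

def next_nothing(text, current):
    """Return the next value as a string, or None if the page gives no next value."""
    if 'Divide by two' in text:
        # Case 1: halve the value that was just requested.
        return str(int(current) // 2)
    # Case 2: anchor on the phrase so misleading numbers elsewhere are ignored.
    match = re.search(r'next nothing is (\d+)', text)
    return match.group(1) if match else None

def follow_chain(fetch, start, limit=400):
    """Follow the chain and return the first response that has no next value."""
    current = str(start)
    for _ in range(limit):
        text = fetch(current)
        nxt = next_nothing(text, current)
        if nxt is None:
            # Case 3: something other than a number, e.g. peak.html.
            return text
        current = nxt
    return None

# Offline demonstration with canned responses instead of real requests.
pages = {
    '12345': 'and the next nothing is 44827',
    '44827': 'Yes. Divide by two and keep going.',
    '22413': 'One example is 82683. Look only for the next nothing '
             'and the next nothing is 63579',
    '63579': 'peak.html',
}
print(follow_chain(pages.__getitem__, 12345))
```

Passing `fetch` in as a parameter keeps the parsing logic testable without network access; for the real site it would wrap an HTTP request.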
- Fetch the page source

      import urllib.request

      def get_html_page(url):
          # Return the decoded page body, or None if the status is not 200.
          page = None
          resp = urllib.request.urlopen(url)
          if resp.status == 200:
              page = resp.read().decode('utf-8')
          return page

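- Extract the comment

      import re

      def get_comments(page):
          # Collect the contents of every HTML comment; re.S lets '.' span newlines.
          rs = re.findall(r'<!--\s*(.*?)\s*-->', page, re.S)
          print(rs)
          if rs:
              return rs[0]  # note: index 0 here, the first comment
          return None
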
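- Main function

      def main():
          # Note: this URL and pattern target the equality.html page (level 3).
          url = 'http://www.pythonchallenge.com/pc/def/equality.html'
          page = get_html_page(url)
          comments = get_comments(page)
          print(comments)
          # One lowercase letter with exactly three uppercase letters on each side.
          rs = re.findall('[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]', comments, re.S)
          print("".join(rs))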