Python Challenge Level 4 http://www.pythonchallenge.com/pc/def/li

www.pythonchallenge.com/pc/def/link…

Level 4: this is the first level that actually recommends urllib, which is no obstacle. The page source contains this comment:

urllib may help. DON’T TRY ALL NOTHINGS, since it will never end. 400 times is more than enough.

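Clicking the image jumps to url=http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345. Replace 12345 with the "next nothing" value shown on that page, 44827, and repeat a few more times. The pattern soon becomes clear, and the exceptions the chain throws can be handled one at a time. To save time, each new run can start from the last number reached before the exception, so only the nothing value in the URL needs to change.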

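First time: Yes. Divide by two and keep going. Divide the previous value by 2.
Second time: There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579. Take only the number that follows "next nothing is".
Third time: peak.html, which is the URL of the next level.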
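The manual procedure above could be automated roughly as follows. This is a minimal sketch, not the author's code: the names `BASE_URL`, `fetch_text`, and `follow_chain` are hypothetical, and it assumes every ordinary page contains the phrase "next nothing is" followed by a number, as the pages quoted above do. The fetch function is passed in as a parameter so the loop can be tried without touching the network.

```python
import re
import urllib.request

# URL prefix taken from the text above; the number is appended to it.
BASE_URL = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='


def fetch_text(nothing):
    """Download the page body for one 'nothing' value (performs network I/O)."""
    with urllib.request.urlopen(BASE_URL + str(nothing)) as resp:
        return resp.read().decode('utf-8')


def follow_chain(start, fetch=fetch_text, max_steps=400):
    """Follow 'next nothing' values until a page offers no usable number.

    Returns (last_nothing, final_page_text). max_steps follows the hint
    that 400 requests are more than enough.
    """
    nothing = str(start)
    text = ''
    for _ in range(max_steps):
        text = fetch(nothing)
        # Only trust the number right after "next nothing is", which
        # skips misleading numbers elsewhere in the text.
        match = re.search(r'next nothing is (\d+)', text)
        if match:
            nothing = match.group(1)
        elif 'Divide by two' in text:
            nothing = str(int(nothing) // 2)
        else:
            break  # e.g. the page that names the next level
    return nothing, text
```

Calling `follow_chain(12345)` would walk the real site; passing a different `fetch` (for example a dictionary lookup) lets the same loop run against canned pages.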

Fetching the page source

import urllib.request

def get_html_page(url):
    # Return the decoded page body, or None if the status is not 200.
    page = None
    resp = urllib.request.urlopen(url)
    if resp.status == 200:
        page = resp.read().decode('utf-8')
    return page

Extracting the comment

import re

def get_comments(page):
    # Collect the body of every HTML comment; re.S lets '.' span newlines.
    rs = re.findall(r'<!--\s*(.*?)\s*-->', page, re.S)
    print(rs)
    if rs:
        return rs[0]  # note: index 0 here
    return None
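To see what the comment pattern captures, here is a small self-contained sketch run on an inline string instead of a downloaded page. The helper name `extract_comments` and the sample HTML are made up for illustration; only the regular expression comes from the code above.

```python
import re


def extract_comments(page):
    """Return the trimmed body of every HTML comment in page."""
    # (.*?) is non-greedy, so each match stops at the nearest '-->';
    # re.S lets '.' cross line breaks inside multi-line comments.
    return re.findall(r'<!--\s*(.*?)\s*-->', page, re.S)


sample = '<html><!-- first --><p>x</p><!--\nsecond\nline\n--></html>'
print(extract_comments(sample))  # ['first', 'second\nline']
```

Which list index holds the interesting comment depends on the page, which is why `get_comments` hard-codes an index.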

Main function

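def main():
    url = 'http://www.pythonchallenge.com/pc/def/equality.html'

    # Download the page and pull out its first HTML comment.
    page = get_html_page(url)
    comments = get_comments(page)
    print(comments)
    # Capture each lowercase letter flanked by exactly three uppercase
    # letters on both sides, then join the captures into one string.
    rs = re.findall('[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]', comments, re.S)
    print("".join(rs))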
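The pattern used in `main` is easier to follow on a tiny input. The sketch below wraps the same regular expression in a helper with a hypothetical name, `guarded_letters`, and runs it on made-up strings.

```python
import re


def guarded_letters(text):
    """Join lowercase letters that have exactly three capitals on each side."""
    # The [^A-Z] at both ends rejects runs of four or more capitals.
    found = re.findall('[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]', text, re.S)
    return "".join(found)


print(guarded_letters('xABCdEFGy zHIJkLMNw'))  # dk
print(guarded_letters('ABCDeFGHi'))            # empty: four capitals on the left
```

Because the pattern requires one non-capital character on each side, a candidate at the very start or end of the text is not matched.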

Next level URL: www.pythonchallenge.com/pc/def/peak…