Python Challenge Level 4


www.pythonchallenge.com/pc/def/link…

4

Level 4: this is the first level that recommends urllib, though that changes little. The page source contains this comment:

urllib may help. DON’T TRY ALL NOTHINGS, since it will never end. 400 times is more than enough.

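Clicking the image follows a link to http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345. Replace 12345 with the "next nothing" value shown on that page, 44827, and repeat a few times until the pattern is clear; then handle the irregular responses one at a time. To save time, each rerun can start from the last number reached before the exception, since only the nothing value in the URL has to change.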

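First: "Yes. Divide by two and keep going." Divide the previous value by 2.
Second: "There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579". Only the number after "the next nothing is" counts, so the next value is 63579.
Third: "peak.html", which is the URL of the next level.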
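
The three special cases above can be folded into one small parsing function. The following is only a sketch: the name `next_nothing`, its string-based return convention, and the exact phrases it matches are assumptions made for illustration, based on the responses quoted above.

```python
import re

# Hypothetical helper; the pattern assumes the server always phrases the
# hint as "next nothing is <digits>", as in the responses quoted above.
NEXT_RE = re.compile(r"next nothing is (\d+)")


def next_nothing(current, body):
    """Return the next value to request, or None when the chain ends."""
    body = body.strip()
    if "Divide by two" in body:
        # Case 1: halve the value that produced this response.
        return str(int(current) // 2)
    matches = NEXT_RE.findall(body)
    if matches:
        # Case 2: other numbers in the text are decoys, so only the
        # number following "next nothing is" is used.
        return matches[-1]
    # Case 3: no hint at all (for example "peak.html"), so stop here.
    return None
```

With the values from this walkthrough, `next_nothing("12345", "and the next nothing is 44827")` yields `"44827"`, and a response consisting only of `peak.html` yields `None`.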

  1. Fetch the page source

    import re
    import urllib.request
    
    def get_html_page(url):
        """Return the decoded page body, or None if the status is not 200."""
        page = None
        resp = urllib.request.urlopen(url)
        if resp.status == 200:
            page = resp.read().decode('utf-8')
        return page
    
  2. Extract the comment

    def get_comments(page):
        # Non-greedy match of every HTML comment; re.S lets '.' span newlines
        rs = re.findall(r'<!--\s*(.*?)\s*-->', page, re.S)
        print(rs)
        if rs:
            return rs[0]  # note: index 0 here
        return None
    
  3. Main function

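    def main():
        url = 'http://www.pythonchallenge.com/pc/def/equality.html'
    
        # Fetch the page and pull out its first HTML comment
        page = get_html_page(url)
        comments = get_comments(page)
        print(comments)
        # Collect each lowercase letter flanked by exactly three capitals on both sides
        rs = re.findall('[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]', comments, re.S)
        print("".join(rs))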
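
For the linkedlist.php chain itself, a driver loop might look like the sketch below. It is not the code used above: `BASE_URL`, `fetch`, and `follow_chain` are hypothetical names, the 400-step cap comes from the hint in the page comment, and the fetch function is passed in as a parameter so the loop can be tried without network access. Returning the last value reached makes it possible to resume from that number after an exception.

```python
import re
import urllib.request

# Assumed from the URL quoted in this post; only the nothing value changes.
BASE_URL = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="


def fetch(nothing):
    """Download the plain-text response for one nothing value."""
    with urllib.request.urlopen(BASE_URL + str(nothing)) as resp:
        return resp.read().decode("utf-8")


def follow_chain(start, fetch_page=fetch, max_steps=400):
    """Follow the chain from start; return (last nothing, last response)."""
    nothing = str(start)
    body = ""
    for _ in range(max_steps):
        body = fetch_page(nothing).strip()
        found = re.findall(r"next nothing is (\d+)", body)
        if found:
            # Ignore decoy numbers; keep the one after "next nothing is".
            nothing = found[-1]
        elif "Divide by two" in body:
            nothing = str(int(nothing) // 2)
        else:
            # No further hint (for example "peak.html"), so stop.
            break
    return nothing, body
```

Calling `follow_chain(12345)` would walk the real chain over the network; passing a different `start` resumes from a value reached in an earlier run.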
    

Next level URL: www.pythonchallenge.com/pc/def/peak…