Python Automation Tools Series, Part 4: Scraping the Stack Overflow Question List

Creating together, growing together! This is day 4 of my participation in the Juejin "Daily New Plan · August Writing Challenge".

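Lately I have been focusing on a new Python automation development tool. Writing this series also pushes me to finish studying it, and once it is done I plan to write some comparisons of automation libraries and development tools. There are still few automation tools on the market that are both easy to use and comprehensive.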

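I have recently been following a particular category of questions on Stack Overflow, so I wanted a program to monitor them for me. It should open a browser automatically, search for a given keyword, and then save the first 30 questions in the result list or send them to me. The source code is available on GitHub. Below is my development process.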
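The goal is to save the first 30 questions, while the scraping loop in this post only prints each one. As a minimal sketch, assuming each question is a dict with the Keyword, Title, Content, Time, Vote and Url keys that the loop builds, the standard-library csv module can write the list to a file. The helper name save_questions and any file name you pass it are hypothetical, not part of the original code.

```python
import csv

# Column order; these match the keys of each question dict built in the scraping loop
FIELDS = ["Keyword", "Title", "Content", "Time", "Vote", "Url"]


def save_questions(items, path):
    """Hypothetical helper: write a list of question dicts to a CSV file."""
    # newline="" lets the csv module control line endings itself;
    # utf-8-sig writes a BOM so Excel on Windows recognizes the file as UTF-8
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(items)
```

Usage would be to append each item to a list inside the loop instead of (or in addition to) printing it, then call save_questions(items, "questions.csv") once the loop ends.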

System environment

Windows 10
Python 3.0
clicknium

Install the Clicknium VS Code extension and the Python module by following the Getting Started guide.

Approach

Open the browser automatically; this returns a tab object.

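from clicknium import clicknium as cc, locator

# Launch Edge and open Stack Overflow; open() returns a browser tab object
tab = cc.edge.open("https://www.stackoverflow.com")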

Type the keyword into the search box and send the {ENTER} hotkey to run the search.

# `word` holds the keyword to search for
# Locate the search box once, type the keyword, then press Enter
search_box = tab.find_element(locator.stackoverflow.text_q)
search_box.set_text(word)
search_box.send_hotkey('{ENTER}')

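Before the search runs, Stack Overflow may ask for human verification. The following code checks whether the verification box appears and clicks it if so: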

# Wait up to 5 seconds for the human-verification box; None means it never appeared
elem = tab.wait_appear(locator.stackoverflow.human_verification_div, wait_timeout=5)
if elem is not None:
    elem.click()
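The code checks for None because wait_appear waits up to wait_timeout seconds and returns None if the element does not show up. The pure-Python sketch below only illustrates that wait-then-check pattern with a generic polling loop; wait_until is a hypothetical helper, not part of Clicknium, and it is not how Clicknium implements wait_appear.

```python
import time


def wait_until(probe, timeout=5.0, interval=0.5):
    """Hypothetical helper: call probe() until it returns something other than None.

    Returns that value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result  # the thing we were waiting for has appeared
        if time.monotonic() >= deadline:
            return None  # gave up: caller must handle the "not found" case
        time.sleep(interval)  # pause before polling again
```

As with wait_appear, the caller decides what "not found" means; here, skipping the click is the right response because the verification step is optional.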

Click 'Newest' to sort the questions by time.

Use Clicknium's similar-element matching to get each question's title, vote count, content, last-updated time, and URL:

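from time import sleep

catch_count = 0
while catch_count < 30:
    sleep(1)  # give the results page time to render
    # Each locator matches the same field for every question on the current page
    elems_title = tab.find_elements(locator.stackoverflow.a_title)
    elems_vote = tab.find_elements(locator.stackoverflow.span_vote)
    elems_content = tab.find_elements(locator.stackoverflow.div_content)
    elems_time = tab.find_elements(locator.stackoverflow.span_time)
    # zip pairs up the title, vote, content and time of the same question
    for title, vote, content, elem_time in zip(elems_title, elems_vote, elems_content, elems_time):
        url = "https://www.stackoverflow.com" + title.get_property('href')
        item = {
            'Keyword': word,
            'Title': title.get_text(),
            'Content': content.get_text(),
            'Time': elem_time.get_text(),
            'Vote': vote.get_text(),
            'Url': url,
        }
        print(item)
        catch_count += 1
        if catch_count >= 30:
            break
    # Go to the next page if there is one and we still need more; otherwise stop
    if catch_count < 30 and tab.is_existing(locator.stackoverflow.a_next):
        tab.find_element(locator.stackoverflow.a_next).click()
    else:
        break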
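The loop stops once it has 30 questions or there is no "next" link. That stopping rule can be checked without a browser. In this sketch, assuming each page is just a list of already-extracted items and that `pages` yields them in order, collect_first_n (a hypothetical name) returns exactly n items even when a page holds more than needed, and it never asks for a page it does not need.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def collect_first_n(pages: Iterable[List[T]], n: int = 30) -> List[T]:
    """Hypothetical helper: gather items page by page, stopping at n items.

    `pages` stands in for "read the current page, then click next". It is
    consumed lazily, so no further page is requested once n items are collected.
    """
    collected: List[T] = []
    for page in pages:
        # Take only as many items from this page as are still needed
        collected.extend(islice(page, n - len(collected)))
        if len(collected) >= n:
            break
    return collected
```

Passing a generator as `pages` mirrors the browser flow: each step of the generator would click "next" and extract one page, so stopping early also saves page loads.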

Below is the locator for the question-title link:

[Screenshot: locator for the question-title link]

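Clicking 'Validate' confirms that the locator matches all 15 questions on a single page. find_elements then returns the full list of matching elements in one call, get_text() reads each element's text, and for links, get_property('href') reads the href attribute.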
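The loop builds each question URL by prepending the site root to the href value. That works when href is a relative path such as /questions/…, but an absolute href would end up with the domain twice. A slightly more defensive sketch uses the standard library's urllib.parse.urljoin (a substitution of ours, not the author's code), which handles both cases; question_url is a hypothetical helper and the paths are placeholders.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.stackoverflow.com"


def question_url(href):
    """Hypothetical helper: resolve an href (relative or absolute) against the site root."""
    # urljoin resolves relative paths against BASE_URL and leaves absolute URLs unchanged
    return urljoin(BASE_URL, href)


print(question_url("/questions/123/example-title"))
# https://www.stackoverflow.com/questions/123/example-title
print(question_url("https://stackoverflow.com/questions/123/example-title"))
# https://stackoverflow.com/questions/123/example-title
```

In the loop, `url = question_url(title.get_property('href'))` would replace the string concatenation.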

It is very simple: the whole thing took half an hour.