Python - Selenium crawler pagination code

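Code example

The Selenium crawler below pages through search results. It has been tidied up and commented, and it targets pagination on Baidu search: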


# -*- coding: utf-8 -*-
# @Time : 2024/11/14 11:48
# @Author : af
# @File : cl_selenium_交互
# @Project : python_crawler

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Path to the Edge WebDriver executable (msedgedriver)
path = 'msedgedriver.exe'
service = Service(path)
driver = webdriver.Edge(service=service)

# Open the Baidu home page
driver.get('https://www.baidu.com')

# Type the search keyword (周杰伦, Jay Chou) into the search box
search_box = driver.find_element(By.ID, 'kw')
search_box.send_keys('周杰伦')

# Click the search button
search_button = driver.find_element(By.ID, 'su')
search_button.click()

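# Pagination loop
while True:
    try:
        # Scroll to the bottom so the pagination bar is rendered and in view
        js_bottom = 'document.documentElement.scrollTop=100000'
        driver.execute_script(js_bottom)
        time.sleep(2)  # give the page time to load

        # No "下一页 >" (next page) link means this is the last page
        if not driver.find_elements(By.LINK_TEXT, '下一页 >'):
            print("No more pages")
            break

        # Wait until the "next page" link is clickable
        next_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.LINK_TEXT, '下一页 >'))
        )

        # Click the "next page" link
        next_button.click()
        time.sleep(4)  # give the next page time to load

    except Exception as e:
        print("An error occurred, leaving the loop:", e)
        break

# Close the browser
driver.quit()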

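Improvements:

Scrolling: a JavaScript call scrolls the page to the bottom, as a user would, so the pagination bar is rendered and in view before it is clicked.
Waiting for the page: WebDriverWait blocks for up to 10 seconds until the "next page" link is clickable, instead of acting on an element that may not be ready yet.
Last-page detection: the loop looks for the link whose text is "下一页 >" (next page); when there is no such link, it has reached the last page and stops.
Exception handling: a try-except block catches any exception, reports it, and leaves the loop, so driver.quit() still runs and the browser is closed.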
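
The waiting step can be pictured as a polling loop: check a condition, pause briefly, and give up after a deadline. The sketch below is a stand-alone illustration of that idea, not Selenium's actual implementation; the name wait_until and its parameters are hypothetical and chosen only for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # pause before checking again
```

With a real driver, the condition could be something like lambda: driver.find_elements(By.LINK_TEXT, '下一页 >'), which returns an empty (falsy) list until the link appears.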

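Where it applies:

Use it to page through Baidu search results. It can be adapted to other paginated sites by changing the URL and the element locators.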
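
One way to make that adaptation easier is to move the loop into a function that takes the "next page" locator as arguments. This is a minimal sketch under assumptions: the function name click_through_pages, the max_pages cap, and the pause parameter are all hypothetical, and the driver only needs the execute_script and find_elements methods that a Selenium WebDriver provides.

```python
import time


def click_through_pages(driver, by, value, max_pages=10, pause=2.0):
    """Follow a "next page" link until it disappears or max_pages is reached.

    `by` and `value` form the locator for the link, for example
    By.LINK_TEXT and '下一页 >'. Returns the number of pages visited,
    counting the page the driver starts on as page 1.
    (Hypothetical helper for illustration.)
    """
    pages = 1
    while pages < max_pages:
        # Scroll to the bottom so the pagination bar is in view
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        time.sleep(pause)  # give the page time to load

        # An empty list means there is no "next page" link: last page
        links = driver.find_elements(by, value)
        if not links:
            break

        links[0].click()
        pages += 1
    return pages
```

For the Baidu case this would be called as click_through_pages(driver, By.LINK_TEXT, '下一页 >'); another site would pass its own locator. The max_pages cap is a design choice to keep a crawl from running indefinitely on sites with very long result lists.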