[python] Pitfalls when using undetected_chromedriver together with a proxy

No system proxy set

Code

import undetected_chromedriver as uc
options = uc.ChromeOptions()
driver = uc.Chrome(version_main=113, options=options)

Error

>>> d = uc.Chrome(version_main = 105, options = o)
Traceback (most recent call last):
  File "/usr/lib64/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib64/python3.11/http/client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1041, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.11/http/client.py", line 979, in send
    self.connect()
  File "/usr/lib64/python3.11/http/client.py", line 1458, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/ssl.py", line 1108, in _create
    self.do_handshake()
  File "/usr/lib64/python3.11/ssl.py", line 1379, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/undetected_chromedriver/__init__.py", line 258, in __init__
    self.patcher.auto()
  File "/usr/local/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 175, in auto
    release = self.fetch_release_number()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 243, in fetch_release_number
    return LooseVersion(urlopen(self.url_repo + path).read().decode())
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

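The error message is clear: urlopen [Errno 104] Connection reset by peer

Walking through the second traceback step by step shows that when uc creates a Chrome instance, it tries to look up the chromedriver release number (fetch_release_number). Because the chromedriver version installed on this system is 114 or lower, it is treated as is_old_chromedriver, and uc queries url_repo = "https://chromedriver.storage.googleapis.com" for the exact version number. Depending on the time and network conditions, this site may only be reachable through a proxy.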
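To check whether the repository is reachable from the current network before involving uc at all, a small probe using the same urlopen call can help. This is only a sketch: can_fetch is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```python
import urllib.request

# Repository URL taken from the traceback analysis above.
URL_REPO = "https://chromedriver.storage.googleapis.com"


def can_fetch(url, timeout=5.0):
    """Return True if urlopen fetches url successfully with the current opener."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # URLError, HTTPError, timeouts and connection resets all derive from OSError.
        return False


if __name__ == "__main__":
    print("repository reachable:", can_fetch(URL_REPO))
```

If can_fetch(URL_REPO) returns False without a proxy and True once a proxy opener is installed, the network path is the likely culprit rather than uc itself.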

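Moreover, at the point where uc creates the Chrome instance and performs this lookup, it has not yet processed the options argument at all. In other words, even if you set a proxy with options.add_argument('--proxy-server=protocol://ip:port'), it makes no difference.

Digging further, the API that uc uses to query the chromedriver version repository is urlopen from urllib.request. So we need to make this API go through a proxy.

I handed the problem to ChatGPT and quickly got a solution: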

import urllib.request
import undetected_chromedriver as uc

# Route both http and https requests made through urllib via the proxy.
proxyHandler = urllib.request.ProxyHandler({
    'http': 'http://xxx.sss.com:22222',
    'https': 'http://xxx.sss.com:22222'
})
opener = urllib.request.build_opener(proxyHandler)
# Make this opener the default for all later urlopen calls, including uc's.
urllib.request.install_opener(opener)

options = uc.ChromeOptions()
driver = uc.Chrome(version_main=113, options=options)

Alternatively, if you do not pass version_main to uc.Chrome, it reports the error below. The root cause is essentially the same, and the fix above applies.

Traceback (most recent call last):
  File "/usr/lib64/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib64/python3.11/http/client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1041, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.11/http/client.py", line 979, in send
    self.connect()
  File "/usr/lib64/python3.11/http/client.py", line 1458, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/ssl.py", line 1108, in _create
    self.do_handshake()
  File "/usr/lib64/python3.11/ssl.py", line 1379, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/undetected_chromedriver/__init__.py", line 258, in __init__
    self.patcher.auto()
  File "/usr/local/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 178, in auto
    self.unzip_package(self.fetch_package())
                       ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 287, in fetch_package
    return urlretrieve(download_url)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 557, in error
    result = self._call_chain(*args)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 749, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>
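Note that urllib.request.install_opener changes the opener for every later urlopen call in the process. If the proxy is only needed while uc downloads chromedriver, the fix can be scoped with a context manager. A minimal sketch, where proxied_urlopen is a hypothetical helper name and not part of uc or urllib:

```python
import contextlib
import urllib.request


@contextlib.contextmanager
def proxied_urlopen(proxy_url):
    """Send urllib.request.urlopen traffic through proxy_url inside the with-block."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    urllib.request.install_opener(opener)
    try:
        yield opener
    finally:
        # urllib offers no public getter for the previously installed opener,
        # so this installs a fresh default opener instead of restoring the old one.
        urllib.request.install_opener(urllib.request.build_opener())
```

Usage would then wrap only the driver creation, for example `with proxied_urlopen('http://xxx.sss.com:22222'):` followed by `driver = uc.Chrome(version_main=113, options=options)` in the block body. Since both tracebacks above fail inside uc.Chrome's constructor, that is the only call that should need the proxy opener.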

System proxy set

Code

import undetected_chromedriver as uc
options = uc.ChromeOptions()
driver = uc.Chrome(version_main=113, options=options)

Error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/liuyike/.local/lib/python3.10/site-packages/undetected_chromedriver/__init__.py", line 466, in __init__
    super(Chrome, self).__init__(
  File "/home/liuyike/.local/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/home/liuyike/.local/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 61, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/home/liuyike/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 209, in __init__
    self.start_session(capabilities)
  File "/home/liuyike/.local/lib/python3.10/site-packages/undetected_chromedriver/__init__.py", line 724, in start_session
    super(selenium.webdriver.chrome.webdriver.WebDriver, self).start_session(
  File "/home/liuyike/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 293, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
  File "/home/liuyike/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 348, in execute
    self.error_handler.check_response(response)
  File "/home/liuyike/.local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 193, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: 

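I did not dig into the exact cause, but this error appears when a system proxy is set, and removing the system proxy is enough to avoid it. Example code: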

import os

# Save the current proxy settings (None if a variable is not set).
curHttpProxy = os.environ.get('http_proxy')
curHttpsProxy = os.environ.get('https_proxy')

# Remove the proxy settings.
os.environ.pop('http_proxy', None)
os.environ.pop('https_proxy', None)

# Code that uses undetected_chromedriver goes here.

# Restore the previous proxy settings; assigning None to os.environ would raise TypeError.
if curHttpProxy is not None:
    os.environ['http_proxy'] = curHttpProxy
if curHttpsProxy is not None:
    os.environ['https_proxy'] = curHttpsProxy
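The save, clear, and restore steps above can also be wrapped in a context manager, so that the proxy variables come back even if driver creation raises. A minimal sketch: without_proxy_env is a hypothetical helper name, and handling the uppercase HTTP_PROXY / HTTPS_PROXY variants is an assumption beyond what the original code does.

```python
import contextlib
import os

# Lowercase names come from the example above; uppercase variants are an assumption.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


@contextlib.contextmanager
def without_proxy_env(names=PROXY_VARS):
    """Temporarily remove the given proxy variables from os.environ."""
    # Remember only the variables that are actually set, removing them as we go.
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        # Put back exactly what was removed; unset variables stay unset.
        os.environ.update(saved)
```

The undetected_chromedriver code would then go inside a `with without_proxy_env():` block.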