Troubleshooting a Docker startup failure, step by step

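After restarting a Docker installation that has been running for a long time, or restarting it because a configuration file changed, Docker may suddenly refuse to run.

Checking its state with "systemctl status docker" shows: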

   
× docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2025-01-27 16:00:26 CST; 3s ago
TriggeredBy: × docker.socket
       Docs: https://docs.docker.com
    Process: 6503 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
   Main PID: 6503 (code=exited, status=1/FAILURE)
        CPU: 140ms

1月 27 16:00:24 raptor systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
1月 27 16:00:24 raptor systemd[1]: docker.service: Failed with result 'exit-code'.
1月 27 16:00:24 raptor systemd[1]: Failed to start Docker Application Container Engine.
1月 27 16:00:26 raptor systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
1月 27 16:00:26 raptor systemd[1]: Stopped Docker Application Container Engine.
1月 27 16:00:26 raptor systemd[1]: docker.service: Start request repeated too quickly.
1月 27 16:00:26 raptor systemd[1]: docker.service: Failed with result 'exit-code'.
1月 27 16:00:26 raptor systemd[1]: Failed to start Docker Application Container Engine.

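This output says very little: it does not tell us what actually kept Docker from starting. Pasting it into a search engine mostly turns up fixes for other causes of the same symptom, so they will not necessarily solve your problem. I started by following steps I found online, and none of them worked. Some addressed permission problems, others configuration-file problems, and so on, but I was sure none of those applied to my machine. So I kept digging.

It turns out that journalctl -xeu docker.service shows Docker's own log (-u selects the unit, -e jumps to the end of the journal, -x adds explanatory text). The specific reason for the startup failure is in there, and only with that in hand can the right fix be applied.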

1月 27 16:02:15 raptor dockerd[7181]: time="2025-01-27T16:02:15.839596173+08:00" level=info msg="Firewalld: interface br-bd5144aee2da already part of docker zone, returning"
1月 27 16:02:15 raptor dockerd[7181]: time="2025-01-27T16:02:15.850894407+08:00" level=info msg="Firewalld: interface br-bd5144aee2da already part of docker zone, returning"
1月 27 16:02:15 raptor dockerd[7181]: time="2025-01-27T16:02:15.928025084+08:00" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
1月 27 16:02:15 raptor dockerd[7181]: time="2025-01-27T16:02:15.937776180+08:00" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
1月 27 16:02:15 raptor dockerd[7181]: time="2025-01-27T16:02:15.990738491+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
1月 27 16:02:15 raptor dockerd[7181]: time="2025-01-27T16:02:15.991767393+08:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
1月 27 16:02:15 raptor dockerd[7181]: failed to start daemon: Error initializing network controller: error creating default "bridge" network: cannot create network f497a50d985b4c37d944c240f3b4326de73790938d10c1df8c9756d948a66745 (docker0): conflicts with network eaea8a2c931306b96b0452e5314b8>
1月 27 16:02:15 raptor systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ An ExecStart= process belonging to unit docker.service has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
1月 27 16:02:15 raptor systemd[1]: docker.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ The unit docker.service has entered the 'failed' state with result 'exit-code'.
1月 27 16:02:15 raptor systemd[1]: Failed to start Docker Application Container Engine.
░░ Subject: docker.service 单元已失败
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ docker.service 单元已失败。
░░ 
░░ 结果为“failed”。
1月 27 16:02:18 raptor systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ Automatic restarting of the unit docker.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
1月 27 16:02:18 raptor systemd[1]: Stopped Docker Application Container Engine.
░░ Subject: docker.service 单元已结束停止操作
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ docker.service 单元已结束停止操作。
1月 27 16:02:18 raptor systemd[1]: docker.service: Start request repeated too quickly.
1月 27 16:02:18 raptor systemd[1]: docker.service: Failed with result 'exit-code'.
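
The output above is long, and the one line that matters is cut off: the trailing > marks where the pager truncated it. As a minimal sketch (the function name daemon_error is made up here), the decisive line can be pulled out on its own. Dropping -x leaves out the ░░ explanation blocks, and --no-pager prints each line in full.

```shell
# daemon_error: keep only the line(s) where dockerd states why it gave up.
# Reads journal text on stdin, so it also works on a saved log file.
daemon_error() {
    grep -F 'failed to start daemon'
}

# On the affected host (-b limits output to the current boot):
#   journalctl -u docker.service -b --no-pager | daemon_error
```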

The log shows that Docker failed to start because of a conflict during network initialization. The specific error message is:

failed to start daemon: Error initializing network controller: error creating default "bridge" network: cannot create network f497a50d985b4c37d944c240f3b4326de73790938d10c1df8c9756d948a66745 (docker0): conflicts with network eaea8a2c931306b96b0452e5314b8...

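This means that when Docker tried to create the default bridge network (docker0), it found a conflict with a network configuration that already existed. The steps below resolve it: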


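1. Clean up Docker's network configuration

Docker's network configuration may contain stale or conflicting settings left over from earlier runs, and these need to be cleaned up.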
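
Before deleting anything, it can be useful to see which bridge interfaces on the host currently hold which subnets, since the log mentions both docker0 and br-bd5144aee2da. The following is a small sketch and not part of the original procedure: bridge_addrs is a made-up helper name, and it assumes the one-line-per-address output format of ip -4 -o addr show from iproute2.

```shell
# bridge_addrs: from `ip -4 -o addr show` output on stdin, print the name
# and IPv4 address/prefix of docker0 and of Docker-created br-* bridges.
bridge_addrs() {
    awk '$2 == "docker0" || $2 ~ /^br-/ { print $2, $4 }'
}

# On the affected host:
#   ip -4 -o addr show | bridge_addrs
```

Two bridges reporting overlapping ranges would be a hint about which network is involved in the conflict.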

  • Delete Docker's network configuration files:

    sudo rm -rf /var/lib/docker/network
    

    If you know which subnet is in conflict, you can work on this one file directly instead: sudo rm -rf /var/lib/docker/network/files/local-kv.db

  • Start the Docker service again:

    sudo systemctl start docker
    

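Once the error reported in the log is fixed, Docker should start normally at this point. If it still does not, keep reading the log and resolve whatever it reports next.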
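
Deleting the directory outright is irreversible, and it discards Docker's stored network definitions, so user-defined networks will probably have to be recreated afterwards. A more cautious variant, sketched here under the assumption that Docker is stopped first, moves the directory aside instead of removing it. set_aside is a hypothetical helper, not a Docker or systemd command.

```shell
# set_aside DIR: rename DIR to DIR.bak.<timestamp> and print the new name,
# so the old state can be inspected or restored later.
set_aside() {
    dest="$1.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$1" "$dest" && printf '%s\n' "$dest"
}

# On the affected host, in a root shell:
#   systemctl stop docker.service docker.socket
#   set_aside /var/lib/docker/network
#   systemctl reset-failed docker.service   # clears the "repeated too quickly" start counter
#   systemctl start docker.service
```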

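2. Configure Docker's networks manually to prevent a recurrence

If the default bridge network configuration conflicts with something else, you can try configuring Docker's networks manually.

  • Edit Docker's configuration file, /etc/docker/daemon.json: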

    sudo nano /etc/docker/daemon.json
    
  • Add or modify the following content to specify a custom network configuration:

    {
      "bip": "192.168.1.1/24",
      "default-address-pools": [
        {
          "base": "192.168.2.0/24",
          "size": 24
        }
      ]
    }
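
One thing worth checking in a configuration like this is how many networks the address pool can actually supply. Docker splits each base range into subnets of prefix length size, so the count is 2^(size - base prefix). The sketch below works that out; pool_count is a made-up helper, and the /16 case is a hypothetical comparison, not a recommendation from the original post.

```shell
# pool_count BASE_PREFIX SIZE: number of subnets of prefix length SIZE
# that fit into a base range of prefix length BASE_PREFIX.
pool_count() {
    echo $(( 1 << ($2 - $1) ))
}

pool_count 24 24   # the pool from the example above: prints 1
pool_count 16 24   # a hypothetical /16 base with the same size: prints 256

# After editing the file on the Docker host, a syntax check before
# restarting can save another failed start (assumes python3 is installed):
#   python3 -m json.tool /etc/docker/daemon.json && sudo systemctl restart docker
```

With a /24 base and size 24, the pool holds a single subnet, so only one user-defined network can draw from it. If several are needed, a wider base is probably the simpler choice, provided it does not overlap with the bip range or with networks the host already uses.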