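The server has 2 vCPUs and 2 GB of RAM and runs the 1Panel panel, with only a few apps installed plus one NestJS front-end/back-end service.
The server used to freeze occasionally, and a reboot always fixed it. Today I clicked "upgrade" in the panel and it froze again; I rebooted several times and each time it froze again within two minutes.
Step 1: Keep Docker from starting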
[root@iZt4n2b717v9iw1h98ug53Z ~]# sudo systemctl mask docker docker.socket containerd
Created symlink /etc/systemd/system/docker.service → /dev/null.
Created symlink /etc/systemd/system/docker.socket → /dev/null.
Created symlink /etc/systemd/system/containerd.service → /dev/null.
[root@iZt4n2b717v9iw1h98ug53Z ~]# sudo reboot
Step 2: Check the state
[root@iZt4n2b717v9iw1h98ug53Z ~]# free -h
               total        used        free      shared  buff/cache   available
Mem:           1.7Gi       248Mi       690Mi       1.0Mi       796Mi       1.3Gi
Swap:             0B          0B          0B
[root@iZt4n2b717v9iw1h98ug53Z ~]# swapon --show
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl status docker
● docker.service
Loaded: masked (Reason: Unit docker.service is masked.)
Active: inactive (dead)
[root@iZt4n2b717v9iw1h98ug53Z ~]#
Step 3: Set up swap
[root@iZt4n2b717v9iw1h98ug53Z ~]# sudo -i
[root@iZt4n2b717v9iw1h98ug53Z ~]# fallocate -l 4G /swapfile
[root@iZt4n2b717v9iw1h98ug53Z ~]# chmod 600 /swapfile
[root@iZt4n2b717v9iw1h98ug53Z ~]# mkswap /swapfile
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=f876fc98-503d-4329-8e3d-04aa5ea13c81
[root@iZt4n2b717v9iw1h98ug53Z ~]# swapon /swapfile
[root@iZt4n2b717v9iw1h98ug53Z ~]# echo '/swapfile none swap sw 0 0' >> /etc/fstab
[root@iZt4n2b717v9iw1h98ug53Z ~]# echo 'vm.swappiness=10' > /etc/sysctl.d/99-swap.conf
[root@iZt4n2b717v9iw1h98ug53Z ~]# sysctl --system
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
kernel.yama.ptrace_scope = 0
* Applying /usr/lib/sysctl.d/50-coredump.conf ...
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
kernel.core_pipe_limit = 16
* Applying /usr/lib/sysctl.d/50-default.conf ...
kernel.sysrq = 16
kernel.core_uses_pid = 1
kernel.kptr_restrict = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.promote_secondaries = 1
net.core.default_qdisc = fq_codel
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /usr/lib/sysctl.d/50-libkcapi-optmem_max.conf ...
net.core.optmem_max = 81920
* Applying /usr/lib/sysctl.d/50-pid-max.conf ...
kernel.pid_max = 4194304
* Applying /etc/sysctl.d/99-swap.conf ...
vm.swappiness = 10
* Applying /etc/sysctl.d/99-sysctl.conf ...
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_slow_start_after_idle = 0
* Applying /etc/sysctl.conf ...
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_slow_start_after_idle = 0
[root@iZt4n2b717v9iw1h98ug53Z ~]# free -h
               total        used        free      shared  buff/cache   available
Mem:           1.7Gi       256Mi       670Mi       1.0Mi       807Mi       1.3Gi
Swap:          4.0Gi          0B       4.0Gi
[root@iZt4n2b717v9iw1h98ug53Z ~]# docker ps -a
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[root@iZt4n2b717v9iw1h98ug53Z ~]#
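One detail in the `sysctl --system` output above is easy to miss: `/etc/sysctl.d/99-swap.conf` sets `vm.swappiness = 10`, but `/etc/sysctl.d/99-sysctl.conf` and `/etc/sysctl.conf` are applied after it and set the value back to 0, so the 10 never takes effect. Running `sysctl -w vm.swappiness=10` by hand fixes it only until the next reboot. Below is a minimal sketch of one way to make the setting persistent, assuming `/etc/sysctl.conf` is yours to edit. `set_swappiness` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: set vm.swappiness in one sysctl config file.
# Rewrites an existing vm.swappiness line, or appends one if none exists.
set_swappiness() {
    conf_file="$1"
    value="$2"
    if grep -Eq '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf_file"; then
        # Requires GNU sed (-i edits the file in place, -E enables extended regex).
        sed -i -E "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness = ${value}/" "$conf_file"
    else
        printf 'vm.swappiness = %s\n' "$value" >> "$conf_file"
    fi
}

# Usage (as root), then re-apply and confirm the effective value:
#   set_swappiness /etc/sysctl.conf 10
#   sysctl --system >/dev/null
#   cat /proc/sys/vm/swappiness
```

The two later files print identical settings, which suggests `99-sysctl.conf` may simply be a symlink to `/etc/sysctl.conf`; `ls -l /etc/sysctl.d/` will confirm this. If it is, editing `/etc/sysctl.conf` directly covers both.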
Step 4: Back up the database
[root@iZt4n2b717v9iw1h98ug53Z ~]# sysctl -w vm.swappiness=10
vm.swappiness = 10
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl unmask docker docker.socket containerd
Removed /etc/systemd/system/docker.service.
Removed /etc/systemd/system/docker.socket.
Removed /etc/systemd/system/containerd.service.
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl start containerd
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl start docker
[root@iZt4n2b717v9iw1h98ug53Z ~]# docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"
NAMES                             IMAGE                                    STATUS
admin-test                        1panel/node:25.8.0                       Up 7 seconds
1Panel-gitea-D7BV                 commitgo/gitea-ee:25.4.3                 Up 7 seconds
1Panel-redis-1zR8                 redis:8.6.1                              Up 7 seconds
1Panel-mysql-6lo2                 mysql:8.4.8                              Up 7 seconds
1Panel-nginx-proxy-manager-5bZu   jc21/nginx-proxy-manager:2.14.0          Up 7 seconds
1Panel-it-tools-klcx              corentinth/it-tools:2024.10.22-7ca5933   Up 7 seconds
1Panel-phpmyadmin-Ihb5            phpmyadmin:5.2.3                         Up 7 seconds
1Panel-openresty-CGZy             1panel/openresty:1.27.1.2-5-1-focal      Up 7 seconds
[root@iZt4n2b717v9iw1h98ug53Z ~]# docker exec -i 1Panel-mysql-6lo2 mysqldump -utest -p --no-tablespaces --single-transaction --quick --skip-lock-tables test > /root/test.sql
Enter password: mima
[root@iZt4n2b717v9iw1h98ug53Z ~]# ls -lh /root/test.sql
-rw-r--r-- 1 root root 168K Mar 28 19:32 /root/test.sql
[root@iZt4n2b717v9iw1h98ug53Z ~]# gzip /root/test.sql
[root@iZt4n2b717v9iw1h98ug53Z ~]# ls -lh /root/test.sql.gz
-rw-r--r-- 1 root root 19K Mar 28 19:32 /root/test.sql.gz
[root@iZt4n2b717v9iw1h98ug53Z ~]#
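At this point docker ps showed that every service had started, the panel opened normally, and nothing froze. My database client could also connect to the database, so I exported another copy from there as well.
Alibaba Cloud imposes a lot of restrictions: apart from the console's password-free connection, I could not log in to the server over SSH with either a password or a key.
Check whether Docker starts on boot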
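A dump this small makes a truncated export easy to overlook. By default mysqldump ends its output with a `-- Dump completed` comment line, so checking the last line of the compressed file can catch an interrupted dump. Here is a minimal sketch; `verify_dump` is a hypothetical helper name, and it assumes the dump was made with comments enabled (that is, without `--skip-comments` or `--compact`).

```shell
#!/bin/sh
# Hypothetical helper: check that a gzip-compressed mysqldump file is intact
# and ends with the "-- Dump completed" marker that mysqldump writes by default.
verify_dump() {
    dump_gz="$1"
    # gzip -t verifies the integrity of the compressed stream itself.
    gzip -t "$dump_gz" 2>/dev/null || { echo "corrupt gzip: $dump_gz"; return 1; }
    # A complete dump carries the marker on its last line.
    if gzip -dc "$dump_gz" | tail -n 1 | grep -q 'Dump completed'; then
        echo "ok: $dump_gz"
    else
        echo "incomplete dump: $dump_gz"
        return 1
    fi
}

# Usage:
#   verify_dump /root/test.sql.gz
```

Copying the verified file off the server afterwards is worth considering, since a backup that lives only on the machine that keeps freezing offers limited protection.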
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl is-enabled docker
enabled
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl is-enabled docker.socket
disabled
[root@iZt4n2b717v9iw1h98ug53Z ~]# docker inspect admin-test --format '{{.HostConfig.RestartPolicy.Name}}'
on-failure
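So the conclusions are clear:
docker is enabled
This means Docker starts automatically after the server reboots.
docker.socket is disabled
This matters little; the key point is that docker.service already starts on boot by itself.
The restart policy of admin-test is on-failure
This means it is not started unconditionally on boot.
However, if Docker judges that the container exited abnormally when it restores it, or the container's state otherwise triggers failure recovery, it may still come back up.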
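Only admin-test was inspected above, but the same check can be run across every container. Here is a small sketch, assuming the `docker inspect` fields `.Name`, `.HostConfig.RestartPolicy.Name`, and `.RestartCount`. `flag_restart_loops` and its default threshold of 5 are illustrative choices, not anything Docker defines.

```shell
#!/bin/sh
# Print "name policy restart_count" for every container, running or not.
# Note: .Name is printed with a leading slash, e.g. /admin-test.
list_restart_policies() {
    docker ps -aq | xargs -r docker inspect \
        --format '{{.Name}} {{.HostConfig.RestartPolicy.Name}} {{.RestartCount}}'
}

# Hypothetical filter: read those lines on stdin and print the containers
# whose restart count exceeds a threshold (default 5).
flag_restart_loops() {
    awk -v max="${1:-5}" '$3 + 0 > max { print $1 " restarted " $3 " times (policy: " $2 ")" }'
}

# Usage:
#   list_restart_policies | flag_restart_loops 5
```

A container that shows a high restart count shortly after Docker starts is a likely candidate for a crash loop.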
Disable start on boot
[root@iZt4n2b717v9iw1h98ug53Z ~]# sudo systemctl disable docker
Removed /etc/systemd/system/multi-user.target.wants/docker.service.
[root@iZt4n2b717v9iw1h98ug53Z ~]# sudo systemctl disable containerd
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl is-enabled docker
disabled
[root@iZt4n2b717v9iw1h98ug53Z ~]# systemctl is-enabled containerd
disabled
[root@iZt4n2b717v9iw1h98ug53Z ~]#
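Cause
I looked at the logs of this node service and found some errors. The last time I updated the back-end project I did not stop the service first: I deleted the project files directly and then uploaded the new ones.
According to an AI analysis of the logs, after that update the environment configuration the container actually uses was gone, so the container crashed on startup. Because its restart policy was on-failure, it then restarted over and over, and that is what drove the server's resource usage up.
Fixing the restart loop
1Panel's runtime-environment page offers no setting for this, so I had to change it from the command line. (I later found that the setting does exist, but under the container's settings rather than under the runtime environment.)
Change the container's restart policy to no: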
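One way to avoid repeating this is to stop the container before replacing files and to confirm the required files are present before starting it again. The sketch below is illustrative only: `preflight` is a hypothetical helper, and the project path and the file names `.env` and `package.json` are placeholders, since the exact configuration file that went missing is not recorded here.

```shell
#!/bin/sh
# Hypothetical pre-start check: confirm every required file exists in the
# project directory before the container is started again.
# Usage: preflight DIR FILE...
preflight() {
    dir="$1"
    shift
    missing=0
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example update sequence (the path and file names are placeholders):
#   docker stop admin-test
#   ...replace the project files...
#   preflight /path/to/project .env package.json && docker start admin-test
```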
docker update --restart=no admin-test
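With this policy, a container that fails to start stops after the first failure instead of being brought back up again and again.
If you want to restore the old behavior later: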
docker update --restart=on-failure admin-test
Or:
docker update --restart=unless-stopped admin-test
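A middle ground between no and plain on-failure is to cap the number of retries: Docker accepts `on-failure:N`, which gives up after N failed restarts. Here is a minimal sketch; `cap_restarts` is a hypothetical wrapper, and the `DOCKER` variable exists only so the command can be dry-run with `echo`.

```shell
#!/bin/sh
# Hypothetical wrapper: keep on-failure restarts but give up after MAX attempts.
# Usage: cap_restarts CONTAINER MAX
cap_restarts() {
    "${DOCKER:-docker}" update --restart="on-failure:$2" "$1"
}

# Usage:
#   cap_restarts admin-test 3
# Dry run (prints the arguments instead of calling docker):
#   DOCKER=echo cap_restarts admin-test 3
```

With a cap in place, a broken deployment still gets a few automatic recovery attempts, but it cannot loop indefinitely on a 2 GB machine.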