Troubleshooting GitLab 502 errors and 100% CPU usage when starting GitLab


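The error “connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?” is not really a cause of GitLab 502/503 responses. Once the other errors are fixed, this file is created automatically, so this error is not fatal!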

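I. Check server performance

Use the top command to check CPU and memory usage.

1. If memory is running out, the cause is probably the one described in section II.

2. If CPU is at 100%, GitLab probably did not start properly; run the checks in section III and act on the logs they print.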
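
top is interactive, so for a quick scripted check of the memory side, a minimal sketch like the following may help. It assumes a Linux kernel that exposes MemAvailable in /proc/meminfo; the function name mem_available_pct and the 10% threshold in the example are my own illustrative choices, not something GitLab provides.

```shell
# Print the percentage of RAM still available, read from a meminfo-style file.
# Defaults to /proc/meminfo; the optional path argument only makes testing easier.
mem_available_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", avail * 100 / total
            else exit 1
        }
    ' "$meminfo"
}

# Example: warn when less than 10% of memory is available (arbitrary threshold).
# pct=$(mem_available_pct) && [ "$pct" -lt 10 ] && echo "low memory: ${pct}% available"
```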

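II. Server specification too low

From what I have observed and experienced, GitLab on Linux needs at least 2 cores and 4 GB of RAM. With less, memory fills up very easily.

Solutions:

1. Upgrade the server.

2. (Generally not recommended) Edit the configuration file. Make a backup copy of the file before changing it.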

[root@aliqyz004 data]# vim /etc/gitlab/gitlab.rb
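
The backup mentioned above can be as simple as a timestamped copy next to the original. This is only a sketch: backup_file is a hypothetical helper name, and the ".bak.<timestamp>" suffix is my own convention.

```shell
# Copy a file to "<file>.bak.<timestamp>" beside the original and print the backup path.
backup_file() {
    local src="$1"
    local dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Example (path taken from the command above):
# backup_file /etc/gitlab/gitlab.rb
```

If an edit goes wrong, copying the printed backup path back over gitlab.rb and running gitlab-ctl reconfigure again should return you to the previous settings.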

I found the settings to change by trying tuning guides from the web, such as:

Fixing high GitLab memory or CPU usage (unicorn, swap partition, Prometheus) - Zhihu (zhihu.com)

***

After editing, re-apply the configuration:

[root@aliqyz004 data]# gitlab-ctl reconfigure
Starting Chef Client, version 13.6.4
resolving cookbooks for run list: ["gitlab"]
Synchronizing Cookbooks:
  - gitlab (0.0.1)
  - package (0.1.0)
  - postgresql (0.1.0)
  - registry (0.1.0)
  - mattermost (0.1.0)
  - consul (0.0.0)
  - gitaly (0.1.0)
  - letsencrypt (0.1.0)
  - nginx (0.1.0)
  - runit (0.14.2)
  - acme (3.1.0)
  - crond (0.1.0)
  - compat_resource (12.19.0)

Then restart. Before restarting, you can first run the check from section III:

[root@aliqyz004 data]# gitlab-rake gitlab:check sanitize=true --trace
[root@aliqyz004 data]# gitlab-ctl restart

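**Recommendation:** Keep GitLab's data disk separate from the system disk. That way, if GitLab breaks, fixes such as editing configuration files will not affect the repository data.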

III. Checks

[root@aliqyz004 data]# gitlab-rake gitlab:check sanitize=true --trace

1. If GitLab starts without errors but CPU is at 100%, use the checks below.

[root@aliqyz004 ~]# gitlab-ctl start
ok: run: alertmanager: (pid 5620) 0s
ok: run: gitaly: (pid 5634) 1s
ok: run: gitlab-monitor: (pid 5648) 0s
ok: run: gitlab-workhorse: (pid 5661) 0s
ok: run: logrotate: (pid 5673) 1s
ok: run: nginx: (pid 5679) 0s
ok: run: node-exporter: (pid 5687) 1s
ok: run: postgres-exporter: (pid 5768) 0s
ok: run: postgresql: (pid 5775) 1s
ok: run: prometheus: (pid 5783) 0s
ok: run: redis: (pid 5797) 0s
ok: run: redis-exporter: (pid 5803) 1s
ok: run: sidekiq: (pid 5810) 0s
ok: run: unicorn: (pid 5825) 1s

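If something is wrong, starting GitLab drives CPU usage up to 100%. Fix the underlying problems according to the messages you get.

Here is one error I ran into:

“connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"”

Check the data directory and its permissions (important, especially if directory permissions were changed by accident). While GitLab is stopped, “postmaster.pid” should not exist. If it does, rename it to “postmaster1.pid” or simply delete it.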

[root@aliqyz004 data]# pwd
/var/opt/gitlab/postgresql/data
[root@aliqyz004 data]# ll
total 128
drwx------ 6 gitlab-psql root         4096 Jul 30  2019 base
drwx------ 2 gitlab-psql root         4096 Jul  6 13:31 global
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_clog
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_commit_ts
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_dynshmem
-rw-r--r-- 1 gitlab-psql root         3198 Jul 30  2019 pg_hba.conf
-rw-r--r-- 1 gitlab-psql root         1789 Jul 30  2019 pg_ident.conf
drwx------ 4 gitlab-psql root         4096 Jul 30  2019 pg_logical
drwx------ 4 gitlab-psql root         4096 Jul 30  2019 pg_multixact
drwx------ 2 gitlab-psql root         4096 Jul  6 13:31 pg_notify
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_replslot
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_serial
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_snapshots
drwx------ 2 gitlab-psql root         4096 Jul  6 13:31 pg_stat
drwx------ 2 gitlab-psql root         4096 Jul  6 14:01 pg_stat_tmp
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_subtrans
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_tblspc
drwx------ 2 gitlab-psql root         4096 Jul 30  2019 pg_twophase
-rw------- 1 gitlab-psql root            4 Jul 30  2019 PG_VERSION
drwx------ 3 gitlab-psql root         4096 Jun 27 08:16 pg_xlog
-rw------- 1 gitlab-psql root           88 Jul 30  2019 postgresql.auto.conf
-rw-r--r-- 1 gitlab-psql root        16596 Jun 29 17:29 postgresql.conf
-rw------- 1 gitlab-psql gitlab-psql    90 Jul  6 13:31 postmaster.opts
-rw------- 1 gitlab-psql gitlab-psql   101 Jul  6 13:31 postmaster.pid
-rw-r--r-- 1 gitlab-psql root         4544 Jun 29 17:59 runtime.conf
-r-------- 1 gitlab-psql gitlab-psql  1805 Jul 30  2019 server.crt
-r-------- 1 gitlab-psql gitlab-psql  3243 Jul 30  2019 server.key
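
Before renaming or deleting postmaster.pid, it may be worth confirming that it really is left over. PostgreSQL writes the server's PID on the first line of that file, so a rough check could look like the sketch below. is_stale_pid_file is a hypothetical helper name, and the check assumes you are root, as in the transcripts here.

```shell
# Succeed (exit 0) only if the PID file exists and the PID on its first line
# is not a running process. Meant to be run as root, because "kill -0" also
# fails for live processes that belong to another user.
is_stale_pid_file() {
    local pid_file="$1" pid
    [ -f "$pid_file" ] || return 1
    pid=$(head -n 1 "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # first line is not a plain number: leave the file alone
    esac
    ! kill -0 "$pid" 2>/dev/null
}

# Example: move a stale file aside instead of deleting it.
# f=/var/opt/gitlab/postgresql/data/postmaster.pid
# is_stale_pid_file "$f" && mv "$f" "${f%.pid}1.pid"
```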

After starting GitLab, gitlab-ctl tail also showed an error in the logs:

“ 2022-07-06_05:31:23.45600 time="2022-07-06T13:31:23+08:00" level=error msg="unknown error" error="keywatcher: dial unix /var/opt/gitlab/redis/redis.socket: connect: permission denied" ”

How I handled it:

Look in the directory below. If redis.socket is missing, create it, and also rename dump.rdb to dump1.rdb.

cd /var/opt/gitlab/redis
touch redis.socket
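
Since the log line says "permission denied", it may also help to see what actually sits at that path and who owns it. Note that touch creates a regular file, whereas a listening Redis exposes a Unix socket, so the sketch below distinguishes the two. path_kind is a hypothetical helper name of my own.

```shell
# Print what kind of filesystem object a path is: socket, file, dir, other or missing.
path_kind() {
    if   [ -S "$1" ]; then echo socket
    elif [ -f "$1" ]; then echo file
    elif [ -d "$1" ]; then echo dir
    elif [ -e "$1" ]; then echo other
    else                   echo missing
    fi
}

# Example (paths from the log line above); ls -ld shows owner and mode:
# path_kind /var/opt/gitlab/redis/redis.socket
# ls -ld /var/opt/gitlab/redis /var/opt/gitlab/redis/redis.socket
```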

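Any other errors have to be ruled out first. For example, I used to clear logs by deleting the file and creating it again, which caused further errors; use the method in item 2 below to clear logs instead.

2. If GitLab reports errors as soon as it starts, check whether the disk is full.

This happens after GitLab has failed to start many times, because each failed start writes a large amount of output to the logs.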
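
A quick way to confirm a full disk is to read the usage percentage from df. The following is a small sketch; disk_use_pct is a hypothetical helper name and the 90% threshold is arbitrary.

```shell
# Print the used-space percentage (without the % sign) of the filesystem holding a path.
# "df -P" forces the POSIX output format: one line per filesystem, capacity in column 5.
disk_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (GitLab log directory; 90% is an arbitrary threshold):
# [ "$(disk_use_pct /var/log/gitlab)" -ge 90 ] && echo "disk almost full"
```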

[root@aliqyz004 ~]# gitlab-ctl restart
ok: run: alertmanager: (pid 9554) 1s
ok: run: gitaly: (pid 9571) 0s
ok: run: gitlab-monitor: (pid 9595) 0s
ok: run: gitlab-workhorse: (pid 9601) 1s
ok: run: logrotate: (pid 9612) 0s
ok: run: nginx: (pid 9658) 1s
ok: run: node-exporter: (pid 9667) 0s
ok: run: postgres-exporter: (pid 9682) 0s
timeout: down: postgresql: 0s, normally up, want up
ok: run: prometheus: (pid 10762) 0s
timeout: run: redis: (pid 12144) 1423s, got TERM
ok: run: redis-exporter: (pid 11793) 0s
ok: run: sidekiq: (pid 11809) 0s
ok: run: unicorn: (pid 11817) 0s

[root@aliqyz004 ~]# cd /var/log/gitlab/postgresql/

Solution: empty the log file

[root@aliqyz004 gitlab]# echo "" > current

[root@aliqyz004 gitlab]# cd gitlab-rails/
[root@aliqyz004 gitlab-rails]# du -sh *
69M     api_json.log
0       api_json.log.1
0       api_json.log.31.gz
124K    application.log

***
0       gitlab-rails-db-migrate-2019-07-30-17-02-10.log
0       gitlab-rails-db-migrate-2019-07-30-17-02-10.log.1
0       gitlab-rails-db-migrate-2019-07-30-17-02-10.log.31.gz
0       grpc.log
***
8.0G    production_json.log
0       production_json.log.1
0       production_json.log.31.gz
6.4G    production.log
0       production.log.1
0       production.log.31.gz
652M    sidekiq_exporter.log
0       sidekiq_exporter.log.1
0       sidekiq_exporter.log.31.gz
0       sidekiq.log

Solution: empty the log files

[root@aliqyz004 gitlab]# echo "" > production_json.log
[root@aliqyz004 gitlab]# echo "" > production.log
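
When several logs have grown large, the same in-place emptying can be scripted. This is a sketch under my own assumptions: truncate_big_logs is a hypothetical helper, it only touches files ending in .log, and the size threshold is whatever you pass in. Emptying a file in place keeps its owner, mode and inode, unlike deleting and recreating it.

```shell
# Empty every *.log file under a directory that is larger than a size given in KiB.
# Files are truncated in place (": > file"), so owner, mode and inode stay unchanged.
truncate_big_logs() {
    local dir="$1" min_kib="$2" f
    find "$dir" -type f -name '*.log' -size +"${min_kib}k" -print |
    while IFS= read -r f; do
        : > "$f" && printf 'truncated %s\n' "$f"
    done
}

# Example (directory from the listing above, 1 GiB threshold chosen arbitrarily):
# truncate_big_logs /var/log/gitlab/gitlab-rails 1048576
```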

With all of the above done, re-apply the GitLab configuration and watch for error lines printed in red:

[root@aliqyz004 data]# gitlab-ctl reconfigure
Starting Chef Client, version 13.6.4
resolving cookbooks for run list: ["gitlab"]
Synchronizing Cookbooks:
  - gitlab (0.0.1)
  - package (0.1.0)
  - postgresql (0.1.0)
  - registry (0.1.0)
  - mattermost (0.1.0)
  - consul (0.0.0)
  - gitaly (0.1.0)
  - letsencrypt (0.1.0)
  - nginx (0.1.0)
  - runit (0.14.2)
  - acme (3.1.0)
  - crond (0.1.0)
  - compat_resource (12.19.0)
Installing Cookbook Gems:

If there are none, run the check command. It still reports the “connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"” error, but we ignore that for now.

[root@aliqyz004 data]# gitlab-rake gitlab:check sanitize=true --trace
** Invoke gitlab:check (first_time)
** Invoke gitlab:gitlab_shell:check (first_time)
** Invoke gitlab_environment (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute gitlab_environment
** Execute gitlab:gitlab_shell:check
Checking GitLab Shell ...

GitLab Shell version >= 8.3.3 ? ... OK (8.3.3)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ... rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?

Run the start command:

[root@aliqyz004 data]# gitlab-ctl start
ok: run: alertmanager: (pid 5426) 0s
ok: run: gitaly: (pid 5440) 1s
ok: run: gitlab-monitor: (pid 5453) 0s
ok: run: gitlab-workhorse: (pid 5470) 1s
ok: run: logrotate: (pid 5481) 0s
ok: run: nginx: (pid 5487) 0s
ok: run: node-exporter: (pid 5495) 1s
ok: run: postgres-exporter: (pid 5576) 0s
ok: run: postgresql: (pid 5583) 1s
ok: run: prometheus: (pid 5591) 0s
ok: run: redis: (pid 5611) 1s
ok: run: redis-exporter: (pid 5615) 0s
ok: run: sidekiq: (pid 5624) 0s
ok: run: unicorn: (pid 5636) 1s

Every line should say ok. If one does not, investigate that service specifically.
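
To pick out the services that did not come up, the start or restart output can be filtered. The sketch below relies only on the line format seen in the transcripts in this post ("ok: run: name: ..." and "timeout: down: name: ..."); failed_services is a hypothetical helper name, and other output formats are not handled.

```shell
# Read "gitlab-ctl start" / "gitlab-ctl restart" style output on stdin and print
# the name of every service whose line does not begin with "ok".
failed_services() {
    awk -F': ' 'NF >= 3 && $1 != "ok" { print $3 }'
}

# Example:
# gitlab-ctl restart | failed_services
```

Fed the restart transcript from item 2, it prints postgresql and redis.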

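Wait about a minute after starting, then look at the two directories:

postmaster.pid and the .s.PGSQL.5432 files have been created automatically.

So the file named in this GitLab error is one that GitLab creates by itself: when every service starts normally, it appears automatically. Concentrate on fixing the errors raised by the other services first.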
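
The one-minute wait after startup can also be scripted by polling for the socket file. This is a minimal sketch: wait_for_path is a hypothetical helper name, and the 60-second default simply mirrors the wait described above.

```shell
# Poll once per second until a path exists; give up after the given number of seconds.
wait_for_path() {
    local target="$1" timeout="${2:-60}" waited=0
    while [ ! -e "$target" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (socket path from the error message):
# wait_for_path /var/opt/gitlab/postgresql/.s.PGSQL.5432 60 && echo "PostgreSQL socket is present"
```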