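The error "connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?" is not really a cause of GitLab 502/503 errors. Once the other errors are fixed, this file is generated automatically, so the error itself is not fatal.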
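I. Check server performance
Use the top command to check CPU and memory usage.
1. If memory is running out, the cause may be the one described in section II.
2. If CPU is at 100%, GitLab probably did not start properly. Run the checks in section III and act on the logs they print.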
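II. Server is under-provisioned
From my own observation and experience, GitLab on Linux needs at least 2 cores and 4 GB of memory. With less, memory fills up very easily.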
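As a rough way to compare a machine against that 2-core / 4 GB figure, the sketch below reads the core count and total memory on Linux. The function names are hypothetical, and the 3500 MB threshold in the usage note is an assumption, chosen because a "4 GB" machine usually reports somewhat less than 4096 MB in /proc/meminfo.

```shell
# Hypothetical helpers, Linux only (they rely on nproc and /proc/meminfo).

# Print the number of available CPU cores.
cpu_cores() {
    nproc
}

# Print total physical memory in MB, taken from the MemTotal line (reported in kB).
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Succeed when the machine has at least $1 cores and at least $2 MB of memory.
meets_minimum() {
    [ "$(cpu_cores)" -ge "$1" ] && [ "$(mem_total_mb)" -ge "$2" ]
}
```

For example, `meets_minimum 2 3500 || echo "below the suggested 2 cores / 4 GB"` prints a warning on a machine that is too small.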
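Solutions:
1. Upgrade the server.
2. (Generally not recommended) Edit the configuration file. Make a backup copy of the file before changing it.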
[root@aliqyz004 data]# vim /etc/gitlab/gitlab.rb
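The backup recommended before editing could be taken with a small helper like the one below, run before opening the file. This is only a sketch: `backup_file` and the `.bak.<timestamp>` naming are assumptions, not something GitLab provides.

```shell
# Copy a file to "<file>.bak.<timestamp>", preserving mode and timestamps,
# and print the path of the backup on success.
backup_file() {
    src=$1
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Calling `backup_file /etc/gitlab/gitlab.rb` leaves the original untouched, so a bad edit can be reverted by copying the backup back.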
The changes I tried came from optimization guides I found online:
Fixing high GitLab memory or CPU usage (unicorn, swap partition, Prometheus) - Zhihu (zhihu.com)
***
After editing, reconfigure:
[root@aliqyz004 data]# gitlab-ctl reconfigure
Starting Chef Client, version 13.6.4
resolving cookbooks for run list: ["gitlab"]
Synchronizing Cookbooks:
- gitlab (0.0.1)
- package (0.1.0)
- postgresql (0.1.0)
- registry (0.1.0)
- mattermost (0.1.0)
- consul (0.0.0)
- gitaly (0.1.0)
- letsencrypt (0.1.0)
- nginx (0.1.0)
- runit (0.14.2)
- acme (3.1.0)
- crond (0.1.0)
- compat_resource (12.19.0)
Then restart. Before restarting, you can run the check from section III first:
[root@aliqyz004 data]# gitlab-rake gitlab:check sanitize=true --trace
[root@aliqyz004 data]# gitlab-ctl restart
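**Recommendation:** keep GitLab's data disk separate from the system disk, so that if GitLab breaks, editing configuration files and the like will not affect the repository data.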
III. Checks
[root@aliqyz004 data]# gitlab-rake gitlab:check sanitize=true --trace
1. If startup reports no problems but CPU is at 100%, use the checks below
[root@aliqyz004 ~]# gitlab-ctl start
ok: run: alertmanager: (pid 5620) 0s
ok: run: gitaly: (pid 5634) 1s
ok: run: gitlab-monitor: (pid 5648) 0s
ok: run: gitlab-workhorse: (pid 5661) 0s
ok: run: logrotate: (pid 5673) 1s
ok: run: nginx: (pid 5679) 0s
ok: run: node-exporter: (pid 5687) 1s
ok: run: postgres-exporter: (pid 5768) 0s
ok: run: postgresql: (pid 5775) 1s
ok: run: prometheus: (pid 5783) 0s
ok: run: redis: (pid 5797) 0s
ok: run: redis-exporter: (pid 5803) 1s
ok: run: sidekiq: (pid 5810) 0s
ok: run: unicorn: (pid 5825) 1s
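If something is wrong, starting GitLab drives the CPU up to 100%. Fix the problems according to the messages.
I once ran into this problem, as shown in the screenshot above:
"connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432""
Check the directory and its permissions (important, especially if directory permissions were changed by accident). While GitLab is not running, "postmaster.pid" should not exist. If it does, rename it to "postmaster1.pid" or delete it.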
[root@aliqyz004 data]# pwd
/var/opt/gitlab/postgresql/data
[root@aliqyz004 data]# ll
total 128
drwx------ 6 gitlab-psql root 4096 Jul 30 2019 base
drwx------ 2 gitlab-psql root 4096 Jul 6 13:31 global
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_clog
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_commit_ts
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_dynshmem
-rw-r--r-- 1 gitlab-psql root 3198 Jul 30 2019 pg_hba.conf
-rw-r--r-- 1 gitlab-psql root 1789 Jul 30 2019 pg_ident.conf
drwx------ 4 gitlab-psql root 4096 Jul 30 2019 pg_logical
drwx------ 4 gitlab-psql root 4096 Jul 30 2019 pg_multixact
drwx------ 2 gitlab-psql root 4096 Jul 6 13:31 pg_notify
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_replslot
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_serial
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_snapshots
drwx------ 2 gitlab-psql root 4096 Jul 6 13:31 pg_stat
drwx------ 2 gitlab-psql root 4096 Jul 6 14:01 pg_stat_tmp
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_subtrans
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_tblspc
drwx------ 2 gitlab-psql root 4096 Jul 30 2019 pg_twophase
-rw------- 1 gitlab-psql root 4 Jul 30 2019 PG_VERSION
drwx------ 3 gitlab-psql root 4096 Jun 27 08:16 pg_xlog
-rw------- 1 gitlab-psql root 88 Jul 30 2019 postgresql.auto.conf
-rw-r--r-- 1 gitlab-psql root 16596 Jun 29 17:29 postgresql.conf
-rw------- 1 gitlab-psql gitlab-psql 90 Jul 6 13:31 postmaster.opts
-rw------- 1 gitlab-psql gitlab-psql 101 Jul 6 13:31 postmaster.pid
-rw-r--r-- 1 gitlab-psql root 4544 Jun 29 17:59 runtime.conf
-r-------- 1 gitlab-psql gitlab-psql 1805 Jul 30 2019 server.crt
-r-------- 1 gitlab-psql gitlab-psql 3243 Jul 30 2019 server.key
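Before renaming or deleting postmaster.pid, it may be worth confirming that it really is left over. The sketch below relies on the first line of PostgreSQL's postmaster.pid holding the server's process ID. The function name is hypothetical, and it assumes it is run as root, as in the transcripts here, so that a permission error is not mistaken for a dead process.

```shell
# Succeed when the PID file exists but the PID on its first line is not running.
pid_file_is_stale() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1       # no PID file, nothing to clean up
    pid=$(head -n 1 "$pid_file")
    case $pid in
        ''|*[!0-9]*) return 1 ;;         # not a plain number, leave the file alone
    esac
    ! kill -0 "$pid" 2>/dev/null         # signal 0 only tests whether the process exists
}
```

A possible use, with GitLab stopped: `pid_file_is_stale /var/opt/gitlab/postgresql/data/postmaster.pid && mv /var/opt/gitlab/postgresql/data/postmaster.pid /var/opt/gitlab/postgresql/data/postmaster1.pid`.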
After starting GitLab, gitlab-ctl tail also showed an error in the logs:
2022-07-06_05:31:23.45600 time="2022-07-06T13:31:23+08:00" level=error msg="unknown error" error="keywatcher: dial unix /var/opt/gitlab/redis/redis.socket: connect: permission denied"
How I handled it:
Look in the directory below. If redis.socket does not exist, create it, and rename dump.rdb to dump1.rdb.
cd /var/opt/gitlab/redis
touch redis.socket
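Rule out other errors first. For example, I used to clear logs by deleting the files and recreating them, which caused other errors. Logs should be cleared with the method in item 2 below.
2. If startup fails outright, check whether the disk is full
This happens after GitLab has failed to start many times, because each failed start writes a large amount to the logs.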
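A quick way to confirm whether the disk is actually full is to read the usage percentage from `df`. The helper below is a minimal sketch; the function name is hypothetical, and the 90% threshold in the usage note is an arbitrary assumption.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the path given as $1. "df -P" forces one line per filesystem,
# so the figure is always in column 5 of the second line.
disk_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `[ "$(disk_use_percent /var/log/gitlab)" -ge 90 ] && echo "log disk is almost full"`.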
[root@aliqyz004 ~]# gitlab-ctl restart
ok: run: alertmanager: (pid 9554) 1s
ok: run: gitaly: (pid 9571) 0s
ok: run: gitlab-monitor: (pid 9595) 0s
ok: run: gitlab-workhorse: (pid 9601) 1s
ok: run: logrotate: (pid 9612) 0s
ok: run: nginx: (pid 9658) 1s
ok: run: node-exporter: (pid 9667) 0s
ok: run: postgres-exporter: (pid 9682) 0s
timeout: down: postgresql: 0s, normally up, want up
ok: run: prometheus: (pid 10762) 0s
timeout: run: redis: (pid 12144) 1423s, got TERM
ok: run: redis-exporter: (pid 11793) 0s
ok: run: sidekiq: (pid 11809) 0s
ok: run: unicorn: (pid 11817) 0s
[root@aliqyz004 ~]# cd /var/log/gitlab/postgresql/
Solution: empty the log file
[root@aliqyz004 gitlab]# echo "" > current
[root@aliqyz004 gitlab]# cd gitlab-rails/
[root@aliqyz004 gitlab-rails]# du -sh *
69M api_json.log
0 api_json.log.1
0 api_json.log.31.gz
124K application.log
***
0 gitlab-rails-db-migrate-2019-07-30-17-02-10.log
0 gitlab-rails-db-migrate-2019-07-30-17-02-10.log.1
0 gitlab-rails-db-migrate-2019-07-30-17-02-10.log.31.gz
0 grpc.log
***
8.0G production_json.log
0 production_json.log.1
0 production_json.log.31.gz
6.4G production.log
0 production.log.1
0 production.log.31.gz
652M sidekiq_exporter.log
0 sidekiq_exporter.log.1
0 sidekiq_exporter.log.31.gz
0 sidekiq.log
Solution: empty the log files
[root@aliqyz004 gitlab]# echo "" > production_json.log
[root@aliqyz004 gitlab]# echo "" > production.log
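When several logs have grown large, the same emptying can be applied to every oversized file in one pass. The sketch below truncates files in place rather than deleting and recreating them, so the existing path, owner and permissions are kept. The function name and the size threshold are assumptions for illustration.

```shell
# Empty every *.log file directly inside directory $1 that is larger than
# $2 bytes, printing the name of each file as it is truncated.
truncate_big_logs() {
    dir=$1
    min_bytes=$2
    find "$dir" -maxdepth 1 -type f -name '*.log' -size +"${min_bytes}c" |
    while IFS= read -r log; do
        : > "$log"                       # truncate in place; the file itself is kept
        printf 'truncated %s\n' "$log"
    done
}
```

For example, `truncate_big_logs /var/log/gitlab/gitlab-rails 1073741824` would empty only the logs larger than 1 GiB, such as the production_json.log and production.log files in the listing above.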
Once all of the above is done, reconfigure GitLab and watch for any red error output:
[root@aliqyz004 data]# gitlab-ctl reconfigure
Starting Chef Client, version 13.6.4
resolving cookbooks for run list: ["gitlab"]
Synchronizing Cookbooks:
- gitlab (0.0.1)
- package (0.1.0)
- postgresql (0.1.0)
- registry (0.1.0)
- mattermost (0.1.0)
- consul (0.0.0)
- gitaly (0.1.0)
- letsencrypt (0.1.0)
- nginx (0.1.0)
- runit (0.14.2)
- acme (3.1.0)
- crond (0.1.0)
- compat_resource (12.19.0)
Installing Cookbook Gems:
If there is none, run the check command. It still reports "connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"", but we can ignore that for now.
[root@aliqyz004 data]# gitlab-rake gitlab:check sanitize=true --trace
** Invoke gitlab:check (first_time)
** Invoke gitlab:gitlab_shell:check (first_time)
** Invoke gitlab_environment (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute gitlab_environment
** Execute gitlab:gitlab_shell:check
Checking GitLab Shell ...
GitLab Shell version >= 8.3.3 ? ... OK (8.3.3)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ... rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
Run the start command:
[root@aliqyz004 data]# gitlab-ctl start
ok: run: alertmanager: (pid 5426) 0s
ok: run: gitaly: (pid 5440) 1s
ok: run: gitlab-monitor: (pid 5453) 0s
ok: run: gitlab-workhorse: (pid 5470) 1s
ok: run: logrotate: (pid 5481) 0s
ok: run: nginx: (pid 5487) 0s
ok: run: node-exporter: (pid 5495) 1s
ok: run: postgres-exporter: (pid 5576) 0s
ok: run: postgresql: (pid 5583) 1s
ok: run: prometheus: (pid 5591) 0s
ok: run: redis: (pid 5611) 1s
ok: run: redis-exporter: (pid 5615) 0s
ok: run: sidekiq: (pid 5624) 0s
ok: run: unicorn: (pid 5636) 1s
Every service should report ok. If any does not, investigate that service specifically.
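Scanning that list by eye is easy to get wrong, so a small filter can pick out the lines that need attention. The sketch below assumes the "ok: run: ..." line format shown in the start and restart transcripts in this post; the function name is hypothetical.

```shell
# Read gitlab-ctl start/restart output on stdin and print every non-empty
# line that does not begin with "ok: run:". The exit status is 0 only when
# no such line was found.
failed_services() {
    awk 'NF && !/^ok: run:/ { print; bad = 1 } END { exit bad }'
}
```

For example, `gitlab-ctl restart | failed_services` would print only lines such as the postgresql and redis timeouts seen earlier.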
After starting, wait a minute and then look at the two directories:
postmaster.pid and the .s.PGSQL.5432 files have been generated automatically.
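Instead of waiting a fixed minute, the check can poll until the file shows up. This is a minimal sketch with a hypothetical function name; the 60-second limit in the usage note simply mirrors the one-minute wait.

```shell
# Poll once per second until the path $1 exists (any file type, including a
# socket) or until $2 seconds have passed. Succeeds only if the path appeared.
wait_for_path() {
    target=$1
    remaining=$2
    while [ ! -e "$target" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, `wait_for_path /var/opt/gitlab/postgresql/.s.PGSQL.5432 60 && echo "socket is present"`.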
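In other words, the file this GitLab error complains about is generated by GitLab itself. It appears automatically once every service starts normally, so concentrate on fixing the errors raised by the other services.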