Problem description
2021-11-13 15:15:00.281 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:01.090 [] WARN o.a.h.h.i.RpcClientImpl - Couldn't setup connection for xxx@EXAMPLE.COM to hbase/xx-hbase-dn5@EXAMPLE.COM
2021-11-13 15:15:03.094 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:06.210 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:08.491 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:09.068 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:13.039 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:15.077 [] WARN o.a.h.h.i.RpcClientImpl - Couldn't setup connection for xxx@EXAMPLE.COM to hbase/xx-hbase-dn5@EXAMPLE.COM
2021-11-13 15:15:17.097 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:21.171 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:25.508 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
2021-11-13 15:15:26.373 [] WARN o.a.h.s.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
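Root cause analysis
1. Asked the operations team and the HBase experts whether anything had changed in the production HBase cluster; they confirmed that nothing had changed.
2. The application ran normally after Thursday's release. Looking at when the errors started, they first appeared roughly 24 hours after the release.
3. Why would connections start failing after 24 hours? According to the HBase experts, Kerberos tickets expire: a ticket becomes invalid after 24 hours, and the client has to authenticate again.
4. Check the krb5.conf configuration file: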
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
default_realm = EXAMPLE.COM
#default_ccache_name = KEYRING:persistent:%{uid}
As the config shows, ticket_lifetime is 24h, so a ticket is valid for 24 hours.
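To line this up with the timeline above, here is a minimal sketch that turns the ticket_lifetime value into an expiry time. The class name KrbLifetime and the login time in main are hypothetical, and the parser only handles the simple "<number><unit>" form seen in this config (krb5.conf accepts other duration formats too). The KDC may also cap the lifetime, so treat the result as an upper bound rather than the exact expiry.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical helper: estimates when a ticket obtained at loginTime stops being valid. */
final class KrbLifetime {

    /** Parses the simple "<number><unit>" form used in the config above, e.g. "24h" or "7d". */
    static Duration parse(String value) {
        String v = value.trim();
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1)) {
            case 's': return Duration.ofSeconds(n);
            case 'm': return Duration.ofMinutes(n);
            case 'h': return Duration.ofHours(n);
            case 'd': return Duration.ofDays(n);
            default:
                throw new IllegalArgumentException("unsupported unit in: " + value);
        }
    }

    /** Login time plus ticket_lifetime; assumes the KDC grants the full requested lifetime. */
    static LocalDateTime expiresAt(LocalDateTime loginTime, String ticketLifetime) {
        return loginTime.plus(parse(ticketLifetime));
    }

    public static void main(String[] args) {
        // Hypothetical login time, for illustration only.
        LocalDateTime login = LocalDateTime.of(2021, 11, 12, 15, 0);
        System.out.println("ticket expires at " + expiresAt(login, "24h"));
        System.out.println("renewable for " + parse("7d").toDays() + " days");
    }
}
```

If the process logs in once at startup and never re-logs in, failures should begin about one ticket_lifetime after the release, which matches the observation in step 2.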
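5. With the cluster and the network ruled out, the problem had to be in our internal client library (the second-party package). Compared with the previous version of the library, there are two main differences:
1) The connection was obtained only once and was never re-obtained after being closed. We added a check for whether the connection is closed; if it is, a new connection is created.
An HBase connection differs from MySQL database reconnection: once established, an HBase connection caches the cluster's metadata locally, so it does not need to be re-established repeatedly.
The template provided by HBase Spring does not work this way, though (it is unclear whether that was a deliberate choice): it closes the connection after every read or write, and repeatedly creating and closing connections is expensive.
2) After Kerberos authentication, no ticket check or re-login was performed.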
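The two fixes can be sketched together as follows. This is a minimal illustration, not the actual change in our library: ConnectionHolder and its nested Check interface are hypothetical names, and the block uses only the JDK so it compiles without HBase on the classpath. The comments show where the real HBase and Hadoop calls would go.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of HBase): keeps one long-lived connection
 * and replaces it only when it has been closed.
 */
final class ConnectionHolder<C> {

    /** A check that may throw, e.g. a Kerberos re-login attempt. */
    interface Check {
        void run() throws Exception;
    }

    private final Callable<C> factory;   // e.g. () -> ConnectionFactory.createConnection(conf)
    private final Predicate<C> isClosed; // e.g. Connection::isClosed
    private final Check reloginCheck;    // e.g. a call to checkTGTAndReloginFromKeytab()
    private C current;

    ConnectionHolder(Callable<C> factory, Predicate<C> isClosed, Check reloginCheck) {
        this.factory = factory;
        this.isClosed = isClosed;
        this.reloginCheck = reloginCheck;
    }

    /** Returns the cached connection, creating a new one only if none exists or it is closed. */
    synchronized C get() {
        try {
            // Give the Kerberos layer a chance to refresh the ticket before the connection is used.
            reloginCheck.run();
            if (current == null || isClosed.test(current)) {
                current = factory.call();
            }
            return current;
        } catch (Exception e) {
            throw new IllegalStateException("could not obtain a usable connection", e);
        }
    }
}
```

With the HBase client, the three arguments would presumably be `() -> ConnectionFactory.createConnection(conf)`, `Connection::isClosed`, and `() -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()`. The last call only applies when the process logged in from a keytab, and it is intended to re-login only when the ticket has expired or is close to expiring. Reusing one connection also keeps the locally cached cluster metadata, unlike closing the connection after every read or write.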
References
issues.apache.org/jira/browse… issues.apache.org/jira/browse…