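Problem Description
When accessing HDFS on another cluster (with cross-realm trust configured) from an edge node of one cluster, the following error appears: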
# hadoop fs -ls hdfs://node001:8020/
23/09/20 17:24:07 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1695201845114
23/09/20 17:24:11 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1695201845114
23/09/20 17:24:16 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1695201845114
23/09/20 17:24:21 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1695201845114
23/09/20 17:24:25 WARN ipc.Client: Couldn't setup connection for test_user@UAT.HADOOP.COM to node001/172.52.169.23:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
... ... ...
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:772)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 45 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:466)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
... 48 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
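According to the log above, Kerberos authentication failed because the KDC could not find the service principal for the target node in its database, so no service ticket was issued and the node in the destination cluster could not be accessed.
Root Cause Analysis
The message "Server not found in Kerberos database" shows that the principal for the node was not found in the Kerberos database that was queried. Checking krb5kdc.log further reveals the following entry: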
Sep 20 17:24:25 node0001 krb5kdc2013: TGS_REQ (4 etypes {18 17 16 23}) : LOOKING_UP_SERVER: authtime 0, test_user@HADOOP2.COM for hdfs/node001@HADOOP2.COM, Server not found in Kerberos database
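The realm for node001 should be HADOOP.CN, not HADOOP2.COM. This shows that the hostname was resolved to the wrong realm, so the lookup for node001's service principal went to a KDC that does not hold it.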
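To make the realm in such a log entry easier to spot, the requested service principal can be pulled out with a regular expression. This is only a sketch: the pattern is an assumption based on the single log line shown above, and `requested_service` is a hypothetical helper name, not part of any Kerberos tooling.

```python
import re

# Matches the "<client> for <service>, <reason>" tail of a TGS_REQ log entry.
# The pattern is derived from the one sample line in this article (assumption).
TGS_RE = re.compile(
    r"(?P<client>\S+@\S+) for (?P<service>\S+@[^\s,]+), (?P<reason>.+)$"
)

def requested_service(line):
    """Return (client, service, service_realm, reason), or None if no match."""
    m = TGS_RE.search(line)
    if m is None:
        return None
    service = m.group("service")
    realm = service.rsplit("@", 1)[1]  # text after the last '@' is the realm
    return m.group("client"), service, realm, m.group("reason")

line = ("TGS_REQ (4 etypes {18 17 16 23}) : LOOKING_UP_SERVER: authtime 0, "
        "test_user@HADOOP2.COM for hdfs/node001@HADOOP2.COM, "
        "Server not found in Kerberos database")
print(requested_service(line))
```

If the realm reported for the service differs from the realm the host really belongs to, the hostname-to-realm mapping on the client is the first thing to check.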
The relevant parts of krb5.conf are as follows (unrelated content omitted):
[libdefaults]
default_realm = HADOOP2.COM
[realms]
HADOOP2.COM = {
xxx
}
HADOOP.COM = {
xxx
}
HADOOP.CN = {
xxx
}
[domain_realm]
HADOOP2.COM = HADOOP2.COM
.HADOOP2.COM = HADOOP2.COM
hadoop.com = HADOOP.COM
.hadoop.com = HADOOP.COM
HADOOP.com = HADOOP.COM
.HADOOP.com = HADOOP.COM
HADOOP2.COM = HADOOP.CN
HADOOP.COM = HADOOP.CN
HADOOP.CN = HADOOP.CN
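In this configuration, node001 is referenced by its short hostname rather than a fully qualified domain name. When the client looks up the realm for this node, no rule in [domain_realm] matches, so it falls back to default_realm (HADOOP2.COM) and queries that realm's Kerberos database.
Solution
Adding the following rules to the [domain_realm] section of krb5.conf resolved the problem: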
.node001 = HADOOP.CN
node001 = HADOOP.CN
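The effect of these rules can be illustrated with a simplified model of the hostname-to-realm lookup: try the exact hostname first, then each parent domain with and without a leading dot, and fall back to the default realm. This is a sketch under that assumption, not the actual Kerberos client code; `map_host_to_realm` and the reduced rule sets are hypothetical.

```python
def map_host_to_realm(host, domain_realm, default_realm):
    """Simplified lookup: exact host, then parent domains, then the default realm."""
    name = host.lower()
    candidates = [name]
    for i, ch in enumerate(name):
        if ch == "." and 0 < i < len(name) - 1:
            candidates.append(name[i:])      # e.g. ".hadoop.com"
            candidates.append(name[i + 1:])  # e.g. "hadoop.com"
    for key in candidates:
        if key in domain_realm:
            return domain_realm[key]
    return default_realm

# Reduced version of the rules in this article (assumption: lowercase keys only).
before = {"hadoop.com": "HADOOP.COM", ".hadoop.com": "HADOOP.COM"}
after = dict(before, **{"node001": "HADOOP.CN", ".node001": "HADOOP.CN"})

print(map_host_to_realm("node001", before, "HADOOP2.COM"))  # falls back to default
print(map_host_to_realm("node001", after, "HADOOP2.COM"))   # matches the new rule
```

With the short hostname there is no dot to split on, so only the exact key node001 can ever match. That is why the rule has to name the host itself.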
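Additional Notes
/etc/krb5.conf is the configuration file of the Kerberos authentication system. It specifies configuration parameters for Kerberos clients and servers. Some common settings and what they do are shown below:
[libdefaults]: global default settings.
default_realm: the default Kerberos realm name.
dns_lookup_realm: whether to look up the Kerberos realm through DNS.
dns_lookup_kdc: whether to locate the Kerberos Key Distribution Center (KDC) through DNS.
ticket_lifetime: the lifetime of a ticket.
forwardable: whether tickets may be forwarded.
Example: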
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = true
dns_lookup_kdc = true
ticket_lifetime = 24h
forwardable = true
[realms]: per-realm settings.
EXAMPLE.COM: the name of the Kerberos realm.
kdc: the hostname and port of the KDC server.
admin_server: the hostname and port of the administration server.
Example:
[realms]
EXAMPLE.COM = {
kdc = kdc.example.com:88
admin_server = kdc.example.com:749
}
[domain_realm]: mappings from domain names to Kerberos realms.
example.com = EXAMPLE.COM: maps the domain example.com to the Kerberos realm EXAMPLE.COM.
Example:
[domain_realm]
example.com = EXAMPLE.COM
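[logging]: logging settings.
kdc: the logging destination for the KDC, such as a log file path.
admin_server: the logging destination for the administration server, such as a log file path.
Example: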
[logging]
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmin.log
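The examples above cover only a small part of the settings available in /etc/krb5.conf. A real configuration file may contain other settings, depending on requirements and environment.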
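For a quick look at what a given krb5.conf actually contains, the sections and their settings can be read with a few lines of Python. This is a minimal sketch, assuming only the layout shown in the examples above (bracketed section headers, key = value lines, one level of { } nesting); `parse_sections` is a hypothetical helper and does not cover the full file format.

```python
def parse_sections(text):
    """Group 'key = value' lines by [section]; one level of { } is kept as a dict."""
    sections = {}
    current = None  # settings of the section being read
    nested = None   # settings of an open { ... } block, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            nested = None
        elif line == "}":
            nested = None
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            target = nested if nested is not None else current
            if value == "{":
                nested = {}
                target[key] = nested
            else:
                # Simplification: a repeated key overwrites the earlier value.
                target[key] = value
    return sections

sample = """
[libdefaults]
default_realm = EXAMPLE.COM
[realms]
EXAMPLE.COM = {
kdc = kdc.example.com:88
admin_server = kdc.example.com:749
}
[domain_realm]
example.com = EXAMPLE.COM
"""
conf = parse_sections(sample)
print(conf["libdefaults"]["default_realm"])
print(conf["realms"]["EXAMPLE.COM"]["kdc"])
print(conf["domain_realm"])
```

Printing the [domain_realm] mapping this way makes it easy to see whether a given hostname or domain has a rule at all.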