Technical Analysis Report: DNS Resolution Issue in the Xixi Wetland Kubernetes Cluster

I. Cluster Basics

Kubernetes version: v1.22.12
Node count: 1 master + 4 workers
DNS components:
- CoreDNS (ClusterIP: 10.233.0.3)
- NodeLocal DNSCache (169.254.25.10)

II. Current DNS Resolution Path

1. Default resolution path inside a Pod

Pod
→ /etc/resolv.conf
→ nameserver 169.254.25.10
→ NodeLocal DNSCache

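2. NodeLocal DNSCache forwarding logic

With the current configuration:

cluster.local / in-addr.arpa / ip6.arpa
→ forward → 10.233.0.3 (CoreDNS)

All other domains (including public domains)
→ forward → /etc/resolv.conf (the node's upstream DNS)

Resulting paths:

Cluster domains (*.cluster.local)
Pod → NodeLocal → CoreDNS → response

External domains (e.g. hzxh.gov.cn)
Pod → NodeLocal → upstream DNS → response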
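
The forwarding rules above can be modelled in a few lines of Python. This is only an illustrative sketch of the decision NodeLocal DNSCache makes, not its implementation; the function names `in_zone` and `forward_target` are hypothetical, and the addresses are the ones listed in this report.

```python
# Zones that the current NodeLocal DNSCache configuration forwards to CoreDNS.
CLUSTER_ZONES = ("cluster.local", "in-addr.arpa", "ip6.arpa")

COREDNS = "10.233.0.3"              # CoreDNS ClusterIP from this report
HOST_UPSTREAM = "/etc/resolv.conf"  # stands for the node's upstream DNS servers


def in_zone(qname: str, zone: str) -> bool:
    """Return True if qname equals the zone or is a subdomain of it."""
    name = qname.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def forward_target(qname: str) -> str:
    """Pick the forward target for a query, mirroring the rules above."""
    if any(in_zone(qname, zone) for zone in CLUSTER_ZONES):
        return COREDNS
    # Everything else, including names with CoreDNS hosts overrides,
    # goes straight to the node's upstream DNS.
    return HOST_UPSTREAM
```

Under this model, `forward_target("xixiwetland.hzxh.gov.cn")` yields the upstream target, which is why a hosts entry that exists only in CoreDNS is never consulted for that name.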

III. Core Problem

Symptom

A hosts entry is configured in CoreDNS:

xixiwetland.hzxh.gov.cn → 192.168.30.250

Querying CoreDNS directly:

nslookup xxx 10.233.0.3 → correct

Default query from a workload Pod:

nslookup xxx → 10.81.143.209 (wrong)

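Root cause

NodeLocal DNSCache does not send queries for external domains to CoreDNS; it forwards them directly to the upstream DNS.

The relevant configuration:

.:53 {
    forward . /etc/resolv.conf
}

As a result:

The CoreDNS hosts entries have no effect on the default resolution path of workload Pods.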

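Misleading factor

The upstream DNS returns a private IP (10.81.x.x),
and the address configured in CoreDNS is also a private IP (192.168.x.x).

Result:

It is easy to misdiagnose the problem as "CoreDNS is in effect but returns the wrong result" rather than "the query takes the wrong resolution path".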

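IV. Nature of the Problem

The DNS control plane and data plane are inconsistent.

Component    Role
CoreDNS      Defines the resolution policy (hosts)
NodeLocal    Decides the actual query path

Current state:

The policy lives in CoreDNS,
but the traffic bypasses CoreDNS.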

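V. Risk Assessment

Risks of the current architecture

Uncontrollable resolution results
- External domains resolve to whatever the upstream DNS returns
- CoreDNS overrides have no effect

Harder troubleshooting
- The same domain resolves differently depending on the path
- Easy to misdiagnose

Tight coupling to the environment
- Strong dependency on each node's host DNS
- Different nodes may resolve the same name differently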

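VI. Recommended Solution

Have NodeLocal DNSCache forward all queries, including those for external domains, to CoreDNS.

Resolution path after the change:

Pod
→ NodeLocal DNSCache
→ CoreDNS
→ hosts (return on match)
→ forward → upstream DNS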
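
To make the difference between the current and the proposed path concrete, here is a small Python simulation. It is a sketch under stated assumptions: the lookup tables stand in for real DNS servers, all function names are hypothetical, and it assumes that the name queried in the symptom section is xixiwetland.hzxh.gov.cn.

```python
# Answers taken from this report; the tables stand in for real DNS servers.
COREDNS_HOSTS = {"xixiwetland.hzxh.gov.cn": "192.168.30.250"}
UPSTREAM_ANSWERS = {"xixiwetland.hzxh.gov.cn": "10.81.143.209"}


def upstream_lookup(qname):
    """Stand-in for the upstream DNS; returns None for unknown names."""
    return UPSTREAM_ANSWERS.get(qname)


def coredns_lookup(qname):
    """CoreDNS with hosts first: return on a match, otherwise forward upstream."""
    if qname in COREDNS_HOSTS:
        return COREDNS_HOSTS[qname]
    return upstream_lookup(qname)


def resolve_current(qname):
    """Current path for external names: NodeLocal forwards straight upstream."""
    return upstream_lookup(qname)


def resolve_proposed(qname):
    """Proposed path: NodeLocal forwards every query to CoreDNS."""
    return coredns_lookup(qname)
```

In this model the current path returns 10.81.143.209 and the proposed path returns 192.168.30.250 for the same name. In a real Corefile, the hosts plugin typically needs `fallthrough` so that names without a hosts entry still reach `forward`.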

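Advantages

- A single, unified resolution path
- CoreDNS policies take effect cluster-wide
- Simpler troubleshooting (single entry point)
- Predictable behavior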

VII. Verification

1. Check the Pod's DNS configuration

cat /etc/resolv.conf

Expected:

nameserver 169.254.25.10

2. Verify each path separately

NodeLocal

nslookup domain 169.254.25.10

CoreDNS

nslookup domain 10.233.0.3

3. Expected result

Both queries return the same answer
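
The two-path comparison can also be scripted. The following Python sketch shells out to `nslookup` and compares the answers from both servers. It assumes `nslookup` is available in the Pod image and that its output lists answers as "Address" lines after a "Name:" line; the helper names `parse_answers`, `query`, and `paths_agree` are hypothetical.

```python
import subprocess

NODELOCAL = "169.254.25.10"  # NodeLocal DNSCache address from this report
COREDNS = "10.233.0.3"       # CoreDNS ClusterIP from this report


def parse_answers(output: str) -> set:
    """Collect the addresses listed after the first 'Name:' line."""
    answers = set()
    seen_name = False
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            seen_name = True
        elif seen_name and line.startswith("Address"):
            # Handles "Address: 1.2.3.4" and busybox-style "Address 1: 1.2.3.4 host".
            value = line.split(":", 1)[1].split()
            if value:
                answers.add(value[0])
    return answers


def query(name: str, server: str) -> set:
    """Run nslookup against one server and return the set of answers."""
    result = subprocess.run(
        ["nslookup", name, server],
        capture_output=True, text=True, timeout=10, check=False,
    )
    return parse_answers(result.stdout)


def paths_agree(name: str, servers=(NODELOCAL, COREDNS), lookup=query) -> bool:
    """Return True only if every server gives the same non-empty answer set."""
    answers = [lookup(name, server) for server in servers]
    return bool(answers[0]) and all(a == answers[0] for a in answers)
```

Run from inside a workload Pod, `paths_agree("xixiwetland.hzxh.gov.cn")` should return False with the current configuration and True once NodeLocal DNSCache forwards external domains to CoreDNS. The `lookup` parameter exists so the comparison logic can be exercised without network access.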

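VIII. One-Sentence Summary

The essence of the problem:
NodeLocal DNSCache resolves external domains without going through CoreDNS, so the CoreDNS configuration does not take effect.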
