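Background
Linkis version: 0.9.4
Linkis is a microservice architecture built on Spring Cloud. To make it easier to scale out engines, I set out to containerize the engine service. Container hostnames cannot reach one another, so I switched service registration to IP addresses. Registration succeeded, but RPC calls kept failing. The cause turned out to be the JDK's URI parsing: when the host consists of a non-numeric label followed by numeric labels (as in http://a127.0.0.1), URI.getHost() returns null.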
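Problem analysis
How the gateway parses a request
Every engine call goes through the gateway. The gateway filter (GatewayAuthorizationFilter.filter -> gatewayDeal -> getRealRoute ->) shows that Linkis rebuilds the URI from the serviceInstance. If the engine registered its service instance by IP, the rebuilt URI has a host of the form {non-numeric label}.{numeric labels}, which makes the Ribbon call fail.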
private Route getRealRoute(Route route, ServiceInstance serviceInstance) {
    ……
    if (StringUtils.isNotBlank(serviceInstance.getInstance())) {
        uri = scheme + SpringCloudGatewayConfiguration.mergeServiceInstance(serviceInstance);
    }
    return Route.async().id(route.getId()).filters(route.getFilters()).order(route.getOrder())
            .uri(uri).asyncPredicate(route.getPredicate()).build();
}
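Routing failure
The analysis shows that a request passes through the gateway filters in the order GatewayAuthorizationFilter -> RouteToRequestUrlFilter -> LoadBalancerClientFilter. Only GatewayAuthorizationFilter is defined by Linkis; the others are loaded by the gateway by default. GatewayAutoConfiguration shows which default filters are loaded, and they run in ascending order of their order value. LoadBalancerClientFilter needs to read the host from the URI. As mentioned above, when the host has the form {non-numeric label}.{numeric labels} (as in http://a127.0.0.1), the JDK cannot extract a host, so url.getHost() returns null and routing fails.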
public class LoadBalancerClientFilter implements GlobalFilter, Ordered {
    ……
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        URI url = (URI)exchange.getAttribute(ServerWebExchangeUtils.GATEWAY_REQUEST_URL_ATTR);
        String schemePrefix = (String)exchange.getAttribute(ServerWebExchangeUtils.GATEWAY_SCHEME_PREFIX_ATTR);
        if (url == null || !"lb".equals(url.getScheme()) && !"lb".equals(schemePrefix)) {
            return chain.filter(exchange);
        } else {
            ServerWebExchangeUtils.addOriginalRequestUrl(exchange, url);
            log.trace("LoadBalancerClientFilter url before: " + url);
            ServiceInstance instance = this.loadBalancer.choose(url.getHost());
            ……
        }
    }
    ……
}
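The null host can be reproduced outside the gateway with nothing but java.net.URI. The following is a minimal sketch, not Linkis or Spring Cloud Gateway code: the class name UriHostDemo, the helper hostOf, and the sample URIs are made up for illustration. It prints what url.getHost() would hand to loadBalancer.choose(...) for three kinds of host.

```java
import java.net.URI;

class UriHostDemo {
    // Returns the value LoadBalancerClientFilter would read via url.getHost().
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // Returns the raw authority component, which is kept even when no host is parsed.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        String[] samples = {
            "lb://127.0.0.1",   // plain IPv4 address
            "lb://a127.0.0.1",  // non-numeric label followed by numeric labels
            "lb://a127-0-0-1"   // single label, no dots
        };
        for (String s : samples) {
            System.out.println(s + " -> host=" + hostOf(s) + ", authority=" + authorityOf(s));
        }
    }
}
```

Only the second sample yields host=null. The URI is still accepted, because the JDK falls back to treating the authority as an opaque registry-based name, which is why no exception points at the real cause.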
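Why can't the JDK parse it?
First, Python parses the same URL without trouble (see the session after the excerpt). Second, the documentation of the JDK's URI class refers to www.ietf.org/rfc/rfc2396…, which says: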
“Hostnames take the form described in Section 3 of [RFC1034] and Section 2.1 of [RFC1123]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost domain label of a fully qualified domain name will never start with a digit, thus syntactically distinguishing domain names from IPv4 addresses”
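In other words, a hostname is a sequence of domain labels joined by ".". A label may contain letters, digits, and "-", but the rightmost label must not start with a digit (so in particular it cannot be purely numeric). The JDK is therefore implementing the specification strictly. I also skimmed the later specifications and did not notice a change to this hostname definition; in any case, java.net.URI documents RFC 2396 (as amended by RFC 2732) as the grammar it implements. A programmer who has never read the RFC may well be puzzled by this behavior.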
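The rule in the RFC excerpt can be written down as a small predicate and compared with what java.net.URI actually does. This is a simplified sketch under stated assumptions: the class TopLabelRule and its methods are hypothetical names, the predicate only checks the rightmost label of a multi-label name, and it does not validate the rest of the hostname grammar.

```java
import java.net.URI;

class TopLabelRule {
    // Simplified reading of the RFC 2396 excerpt: in a name with more than one
    // label, the rightmost label must start with a letter.
    static boolean rightmostLabelStartsWithLetter(String host) {
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0) {
            return true;  // single label: this sketch does not apply the rule
        }
        if (lastDot == host.length() - 1) {
            return false; // trailing dot is not handled by this sketch
        }
        char first = host.charAt(lastDot + 1);
        return (first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z');
    }

    // What the JDK reports as the host for http://<host>.
    static String uriHost(String host) {
        return URI.create("http://" + host).getHost();
    }

    public static void main(String[] args) {
        for (String host : new String[] {"a127.0.0.1", "engine.example.com", "host.1b"}) {
            System.out.println(host
                    + ": rule satisfied=" + rightmostLabelStartsWithLetter(host)
                    + ", URI.getHost()=" + uriHost(host));
        }
    }
}
```

For these samples the predicate and the JDK agree: the two names whose rightmost label starts with a digit come back with a null host, including host.1b, whose last label is not purely numeric.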
>>> from urllib.parse import urlparse
>>> url_str = "http://a127.0.0.1"
>>> url = urlparse(url_str)
>>> print('hostname:', url.hostname)
hostname: a127.0.0.1
Solution
The Linkis main branch has already fixed this bug: when encoding, "." is replaced with "--", and when decoding it is replaced back.
def getServiceInstance(serviceId: String): ServiceInstance = {
  var serviceInstanceString = serviceId.substring(MERGE_MODULE_INSTANCE_HEADER.length)
  serviceInstanceString match {
    case regex(num) =>
      serviceInstanceString = serviceInstanceString.substring(num.length)
      ServiceInstance(serviceInstanceString.substring(0, num.toInt),
        serviceInstanceString.substring(num.toInt).replaceAll("---", ":")
          // app register with ip
          .replaceAll("--", "."))
  }
}

def mergeServiceInstance(serviceInstance: ServiceInstance): String = MERGE_MODULE_INSTANCE_HEADER + serviceInstance.getApplicationName.length +
  serviceInstance.getApplicationName + serviceInstance.getInstance.replaceAll(":", "---")
  // app register with ip
  .replaceAll("\\.", "--")
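To see why this escaping helps, here is a rough Java sketch of the same round trip. It is not the Linkis implementation: the class InstanceEscaping, the "svc-" prefix, and the sample instance 10.0.0.5:9101 are assumptions made for illustration, and the real merged string also carries a header and the application name, which are left out here.

```java
import java.net.URI;

class InstanceEscaping {
    // Mirrors the replacements in mergeServiceInstance: ":" -> "---", "." -> "--".
    static String encode(String instance) {
        return instance.replace(":", "---").replace(".", "--");
    }

    // Mirrors getServiceInstance: undo "---" first, then "--".
    static String decode(String encoded) {
        return encoded.replace("---", ":").replace("--", ".");
    }

    // Host the JDK parses from lb://<name>; null when the name is not a valid hostname.
    static String lbHost(String name) {
        return URI.create("lb://" + name).getHost();
    }

    public static void main(String[] args) {
        String instance = "10.0.0.5:9101";           // hypothetical IP-registered engine
        String prefix = "svc-";                      // hypothetical stand-in for header + app name

        String onlyColonEscaped = prefix + instance.replace(":", "---");
        String fullyEscaped = prefix + encode(instance);

        System.out.println(onlyColonEscaped + " -> host=" + lbHost(onlyColonEscaped));
        System.out.println(fullyEscaped + " -> host=" + lbHost(fullyEscaped));
        System.out.println("decoded: " + decode(encode(instance)));
    }
}
```

With the dots escaped, the whole name becomes a single label made of letters, digits, and "-", so the JDK accepts it as a hostname. The decode order matters: "---" has to be restored before "--", otherwise the colon marker would be consumed as a dot marker plus a stray "-".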
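Closing remarks
I ran into this problem several months ago. At the time I only knew what was going wrong (the JDK's parsing left the host null, and I assumed it was a JDK bug). Reviewing it again now, together with the gateway's filter lifecycle and a close reading of the RFC, has given me a new understanding.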