💡 I had wanted to try Linkis 1.0 when it was first released, but my work shifted to other areas and, thanks to procrastination, I never got around to it. With my year-end project finally finished, I carved out some time to try the new version, and to look into upgrading the old version my company runs.
Background
At the end of 2019, our company set out to build a one-stop big data platform. We were short-handed, so we looked for good open-source products to build on. As it happened, WeBank had open-sourced Linkis, a big data middleware project, along with its front end Scriptis. After evaluating them, we found they fit our needs very well, so we built our internal one-stop big data platform on top of them, covering the full development, query, analysis, and scheduling workflow. WeBank was very friendly and kept in touch with us, and we contributed some PRs back. However, due to staff turnover and shifting company priorities, internal development ground to a halt and the platform fell into pure maintenance mode for a long time.
As traffic and headcount grew, the platform started showing cracks: occasional timeouts and failed task submissions. Constant firefighting was not sustainable, so an upgrade went onto the roadmap. But from our original 0.9.0 to today's 1.0.2, Linkis has changed substantially and the project has been almost completely refactored, so honestly I had little confidence a seamless upgrade was possible.
Still, putting it off wasn't going to help. First try the new version and get it running!
Let's go
1. Environment Preparation
Big data environment
- CDH 5.8.3
- Hadoop 2.6.0
- Hive 1.1.0
- Spark 2.4.3
Since we plan to use this in production later, I prepared a big data environment identical to production.
Server
- One physical machine: 188 GB RAM, 32 cores
In earlier versions of Linkis, a distributed deployment didn't differ much from a single-node one, so for testing convenience I went with a single-node deployment here.
Other middleware
- MySQL 5.7
Deploy user
A new deploy user needs to be created. I used a user named codeweaver; note that it must have sudo privileges.
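If the user doesn't exist yet, it can be created along these lines (a sketch for CentOS-style systems; the username and the passwordless-sudo setup are just what I used, adjust to your own policy):

```shell
# Create the deploy user (the name is our choice, not required by Linkis)
sudo useradd codeweaver
sudo passwd codeweaver
# Grant sudo via a drop-in file so /etc/sudoers itself stays untouched
echo 'codeweaver ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/codeweaver
sudo chmod 440 /etc/sudoers.d/codeweaver
```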
2. Compiling the Code
Because our component versions differ from the defaults, the dependency versions in the pom files need to be modified.
Following the build documentation, first change the versions of a few dependencies.
The Hadoop version is set in the top-level pom.xml:
<properties>
<hadoop.version>2.6.0</hadoop.version> <!-- change the Hadoop version here -->
<scala.version>2.11.8</scala.version>
<jdk.compile.version>1.8</jdk.compile.version>
</properties>
The Hive version goes in the pom.xml under linkis-engineconn-plugins/engineconn-plugins/hive:
<properties>
<hive.version>1.1.0</hive.version> <!-- change the Hive version here -->
</properties>
The Spark version goes in the pom.xml under linkis-engineconn-plugins/engineconn-plugins/spark:
<properties>
<spark.version>2.4.3</spark.version>
</properties>
💡 Note: for Hadoop 3, follow the manual for the required changes.
Once the changes are done, compile and package:
mvn -N install
mvn clean install
... the build takes quite a while, so please be patient.
When the build finishes, you will find wedatasphere-linkis-1.0.2-combined-package-dist.tar.gz under assembly-combined-package/target.
Now we can start deploying.
3. Deployment
Upload
First, upload the tarball to the server and extract it, which yields:
drwxrwxr-x 2 codeweaver codeweaver 4096 Dec 28 12:29 bin
drwxrwxr-x 2 codeweaver codeweaver 4096 Dec 28 12:29 config
-rwxrwxr-x 1 codeweaver codeweaver 482433664 Dec 28 12:15 wedatasphere-linkis-1.0.2-combined-dist.tar.gz
Adjust the configuration
config/db.sh
MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_DB=linkis
MYSQL_USER=root
MYSQL_PASSWORD=123456
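Before moving on, it doesn't hurt to confirm MySQL is reachable with these exact credentials and that the target database exists (the values below mirror db.sh; the install script fills the tables, but the database itself may need creating):

```shell
# Connectivity check with the credentials from config/db.sh
mysql -h 127.0.0.1 -P 3306 -u root -p123456 -e "SELECT VERSION();"
# Create the linkis database if it is not there yet
mysql -h 127.0.0.1 -P 3306 -u root -p123456 \
      -e "CREATE DATABASE IF NOT EXISTS linkis DEFAULT CHARACTER SET utf8mb4;"
```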
config/linkis-env.sh
#
# @name: linkis-env
#
# Modified for Linkis 1.0.0
# SSH_PORT=22
### deploy user
deployUser=codeweaver
##Linkis_SERVER_VERSION
LINKIS_SERVER_VERSION=v1
### Specifies the user workspace, which is used to store the user's script files and log files.
### Generally local directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/codeweaver/linkis_dev/ ##file:// required
### User's root hdfs path
HDFS_USER_ROOT_PATH=hdfs:///user/codeweaver/linkis_dev ##hdfs:// required
### Path to store started engines and engine logs, must be local
ENGINECONN_ROOT_PATH=/tmp/codeweaver/linkis_dev
#ENTRANCE_CONFIG_LOG_PATH=hdfs:///tmp/linkis/
### Path to store job ResultSet:file or hdfs path
#RESULT_SET_ROOT_PATH=hdfs:///tmp/linkis ##hdfs:// required
### Provide the DB information of Hive metadata database.
### Attention! If there are special characters like "&", they need to be enclosed in quotation marks.
HIVE_META_URL="jdbc:mysql://127.0.0.1:3306/hive"
HIVE_META_USER="root"
HIVE_META_PASSWORD="123456"
##YARN REST URL spark engine required
YARN_RESTFUL_URL=http://127.0.0.1:8088
###HADOOP CONF DIR
HADOOP_CONF_DIR=/etc/hadoop/conf
###HIVE CONF DIR
HIVE_CONF_DIR=/etc/hive/conf
###SPARK CONF DIR
SPARK_CONF_DIR=/etc/spark/conf
## Engine version conf
#SPARK_VERSION
SPARK_VERSION=2.4.3
##HIVE_VERSION
HIVE_VERSION=1.1.0
PYTHON_VERSION=python3
################### The install Configuration of all Micro-Services #####################
#
# NOTICE:
# 1. If you just wanna try, the following micro-service configuration can be set without any settings.
# These services will be installed by default on this machine.
# 2. In order to get the most complete enterprise-level features, we strongly recommend that you install
# Linkis in a distributed manner and set the following microservice parameters
#
### EUREKA install information
### You can access it in your browser at the address below:http://${EUREKA_INSTALL_IP}:${EUREKA_PORT}
#EUREKA_INSTALL_IP=127.0.0.1 # Microservices Service Registration Discovery Center
EUREKA_PORT=20303
export EUREKA_PREFER_IP=false
### Gateway install information
#GATEWAY_INSTALL_IP=127.0.0.1
GATEWAY_PORT=9001
### ApplicationManager
#MANAGER_INSTALL_IP=127.0.0.1
MANAGER_PORT=9101
### EngineManager
#ENGINECONNMANAGER_INSTALL_IP=127.0.0.1
ENGINECONNMANAGER_PORT=9102
### EnginePluginServer
#ENGINECONN_PLUGIN_SERVER_INSTALL_IP=127.0.0.1
ENGINECONN_PLUGIN_SERVER_PORT=9103
### LinkisEntrance
#ENTRANCE_INSTALL_IP=127.0.0.1
ENTRANCE_PORT=9104
### publicservice
#PUBLICSERVICE_INSTALL_IP=127.0.0.1
PUBLICSERVICE_PORT=9105
### cs
#CS_INSTALL_IP=127.0.0.1
CS_PORT=9108
########################################################################################
## LDAP is for enterprise authorization, if you just want to have a try, ignore it.
#LDAP_URL=ldap://localhost:1389/
#LDAP_BASEDN=dc=webank,dc=com
#LDAP_USER_NAME_FORMAT=cn=%s@xxx.com,OU=xxx,DC=xxx,DC=com
## java application default jvm memory
export SERVER_HEAP_SIZE="512M"
##The decompression directory and the installation directory need to be inconsistent
LINKIS_HOME=/home/codeweaver/linkis/
LINKIS_VERSION=1.0.2
# for install
LINKIS_PUBLIC_MODULE=lib/linkis-commons/public-module
With the configuration done, move on to the next step.
install
Go into the bin directory and first check the dependencies required for installation: sh checkEnv.sh
It lists the packages that need to be installed via yum; you can also install them manually:
- yum
- java
- mysql
- telnet
- tar
- sed
- dos2unix
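On a yum-based system they can be pulled in with one command (the package names are my best guess for CentOS; verify against your distribution):

```shell
sudo yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel \
    mysql telnet tar sed dos2unix
```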
Once that's done, run sh install.sh
Congratulations! You have installed Linkis 1.0.2 successfully, please use sbin/linkis-start-all.sh to start it!
When this message appears, the HDFS directories, MySQL tables, and file extraction are all done, and the services can be started.
💡 To be safe, go into the conf directory and confirm that the configuration files have all been substituted correctly.
💡 I did find one problem here: the Hive version in the linkis_cg_manager_label table in MySQL was not substituted. Looking into it, the install script rewrites hive-1.2.1, while linkis_dml.sql already contains hive-2.3.3, so the substitution never matched.
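If you hit the same mismatch, the stale label can be patched by hand. The statement below is only a sketch: the table and column names reflect the 1.0.2 schema as I understand it, so double-check them against your own linkis_dml.sql before running.

```shell
# Rewrite the unreplaced hive version label to match our cluster's Hive
mysql -h 127.0.0.1 -u root -p123456 linkis -e "
UPDATE linkis_cg_manager_label
SET label_value = REPLACE(label_value, 'hive-2.3.3', 'hive-1.1.0')
WHERE label_value LIKE '%hive-2.3.3%';"
```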
Starting the services
Go into the sbin directory: sh linkis-start-all.sh
Linkis started successfully
Startup went smoothly. Let's check the Eureka dashboard.
All 8 services started normally. Next, let's go through the logs and verify that each service is actually healthy.
💡 Pitfall alert
I ran into a problem here. While checking linkis-ps-publicservice, I found this error:
2021-12-28 14:53:44.330 [ERROR] [qtp555754759-84 ] c.w.w.l.b.c.HdfsResourceHelper (91) [upload] - codeweaver write to hdfs:///apps-data/codeweaver/bml/20211228/0b5ed154-4f42-460a-b5a2-c584a07eb4e4 failed, reason is, IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=codeweaver, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:242)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:169)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3560)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3543)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:3525)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6588)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4354)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4327)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:868)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:322)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:613)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1835)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
Huh? That HDFS path isn't the one I configured. Why is it writing there? Flipping through the other services' logs, it turns out the engineplugin service writes files to HDFS at startup:
2021-12-28 14:53:42.577 [INFO ] [Linkis-Default-Scheduler-Thread-1 ] c.w.w.l.e.s.s.DefaultEngineConnResourceService (40) [info] - Try to initialize hiveEngineConn-v1.1.0.
2021-12-28 14:53:43.237 [INFO ] [Linkis-Default-Scheduler-Thread-1 ] c.w.w.l.e.s.s.DefaultEngineConnResourceService (40) [info] - Ready to upload a new bmlResource for hiveEngineConn-v1.1.0. path: conf.zip
2021-12-28 14:53:44.686 [INFO ] [Linkis-Default-Scheduler-Thread-1 ] o.r.Reflections (229) [scan] - Reflections took 179 ms to scan 23 urls, producing 341 keys and 3825 values
2021-12-28 14:53:44.723 [ERROR] [Linkis-Default-Scheduler-Thread-1 ] c.w.w.l.e.s.s.DefaultEngineConnResourceService (99) [apply] - error code(错误码): 10905, Error message(错误信息): URL /api/rest_j/v1/bml/upload request failed! ResponseBody is {"method":null,"status":1,"message":"error code(错误码): 60050, error message(错误信息): The first upload of the resource failed(首次上传资源失败).","data":{"errorMsg":{"serviceKind":"linkis-ps-publicservice","level":2,"port":9105,"errCode":50073,"ip":"bd15-21-32-217","desc":"The commit upload resource task failed(提交上传资源任务失败):errCode: 60050 ,desc: The first upload of the resource failed(首次上传资源失败) ,ip: bd15-21-32-217 ,port: 9105 ,serviceKind: linkis-ps-publicservice"}}}.. com.webank.wedatasphere.linkis.httpclient.exception.HttpClientResultException: errCode: 10905 ,desc: URL /api/rest_j/v1/bml/upload request failed! ResponseBody is {"method":null,"status":1,"message":"error code(错误码): 60050, error message(错误信息): The first upload of the resource failed(首次上传资源失败).","data":{"errorMsg":{"serviceKind":"linkis-ps-publicservice","level":2,"port":9105,"errCode":50073,"ip":"bd15-21-32-217","desc":"The commit upload resource task failed(提交上传资源任务失败):errCode: 60050 ,desc: The first upload of the resource failed(首次上传资源失败) ,ip: bd15-21-32-217 ,port: 9105 ,serviceKind: linkis-ps-publicservice"}}}. ,ip: bd15-21-32-217 ,port: 9103 ,serviceKind: linkis-cg-engineplugin
OK, since it's not in the configuration, let's dig into the code to find which parameter controls it. I found this in BmlServerConfiguration:
val BML_HDFS_PREFIX = CommonVars("wds.linkis.bml.hdfs.prefix", "/apps-data")
And where it is used:
if (StringUtils.isNotEmpty(resourceHeader)) {
    return getSchema() + BmlServerConfiguration.BML_HDFS_PREFIX().getValue()
        + "/" + user + "/bml" + "/" + dateStr + "/" + resourceHeader + "/" + fileName;
} else {
    return getSchema() + BmlServerConfiguration.BML_HDFS_PREFIX().getValue()
        + "/" + user + "/bml" + "/" + dateStr + "/" + fileName;
}
OK, that's the one.
Add the following to conf/linkis-ps-publicservice.properties:
wds.linkis.bml.hdfs.prefix=/user/codeweaver/linkis_dev
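It's also worth confirming that the deploy user can actually write under that prefix (the path mirrors the property above):

```shell
# Make sure the prefix exists and is writable by the deploy user
hdfs dfs -mkdir -p /user/codeweaver/linkis_dev
hdfs dfs -touchz /user/codeweaver/linkis_dev/.write_test
hdfs dfs -rm /user/codeweaver/linkis_dev/.write_test
```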
Restart the services and check the logs again:
2021-12-28 15:10:03.660 [INFO ] [qtp1617265545-84 ] c.w.w.l.b.s.i.TaskServiceImpl (80) [createUploadTask] - Upload resource successfully. Update task(上传资源成功.更新任务) taskId:12-resourceId:0cbfb738-491d-48e4-8e4f-c2c65b8b79b1 status is success .
2021-12-28 15:10:03.678 [INFO ] [qtp1617265545-84 ] c.w.w.l.b.r.BmlRestfulApi (464) [uploadResource] - User codeweaver submitted upload resource task successfully(用户 codeweaver 提交上传资源任务成功, resourceId is 0cbfb738-491d-48e4-8e4f-c2c65b8b79b1)
Problem solved.
4. Verification
Since I didn't install the front end this time, Linkis provides a CLI to verify that each engine starts correctly. The scripts are all in the bin directory.
Hive engine
💡 Pitfall alert
[codeweaver@bd15-21-32-217 bin]$ ./bin/linkis-cli-hive -code "SELECT * from test.test_table_1;" -submitUser codeweaver -proxyUser codeweaver
No JDK 8 found. linkis-client requires Java 1.8
Fine, let's fix the Java path inside the script.
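In my case, pointing the environment at a JDK 8 before invoking the CLI was enough (the JDK path below is specific to my machine, and exactly where the version check lives may vary between Linkis versions):

```shell
# Put a JDK 8 first on the PATH for the CLI scripts
export JAVA_HOME=/usr/java/jdk1.8.0_181   # adjust to your JDK 8 location
export PATH=$JAVA_HOME/bin:$PATH
java -version   # should report 1.8.x
```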
[codeweaver@bd15-21-32-217 linkis]$ ./bin/linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228151637577955060
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:1
TaskId:1
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
[INFO] Job is successfully submitted!
2021-12-28 15:16:41.016 INFO Program is substituting variables for you
2021-12-28 15:16:41.016 INFO Variables substitution ended successfully
2021-12-28 15:16:41.016 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 15:16:41.016 INFO SQL code check has passed
job is scheduled.
2021-12-28 15:16:45.016 INFO Your job is Scheduled. Please wait it to run.
Job with jobId : LINKISCLI_codeweaver_hive_0 and execID : LINKISCLI_codeweaver_hive_0 submitted
Your job is being scheduled by orchestrator.
2021-12-28 15:16:45.016 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 15:16:45.016 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_hive_0 and taskID is 1 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 15:16:45.016 INFO job is running.
2021-12-28 15:16:45.016 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 1 and subJobId : 1 was submitted to Orchestrator.
2021-12-28 15:16:46.016 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 15:16:48.016 ERROR Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_0 Failed to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):EngineConnPluginNotFoundException: errCode: 70063 ,desc: No plugin foundhive-1.2.1please check your configuration ,ip: bd15-21-32-217 ,port: 9103 ,serviceKind: linkis-cg-engineplugin ,ip: bd15-21-32-217 ,port: 9103 ,serviceKind: linkis-cg-engineplugin ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 15:16:48.016 INFO job is completed.
2021-12-28 15:16:48.016 INFO Task creation time(任务创建时间): 2021-12-28 15:16:40, Task scheduling time(任务调度时间): 2021-12-28 15:16:45, Task start time(任务开始时间): 2021-12-28 15:16:46, Mission end time(任务结束时间): 2021-12-28 15:16:48
2021-12-28 15:16:48.016 INFO Your mission(您的任务) 1 The total time spent is(总耗时时间为): 8.3 秒
2021-12-28 15:16:48.016 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.
[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:1
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
User:codeweaver
Current job status:FAILED
extraMsg:
errDesc: 21304, Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_0 Failed to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):EngineConnPluginNotFoundException: errCode: 70063 ,desc: No plugin foundhiv
############Execute Error!!!########
Execution failed? What now... Time to dig through the logs: linkis-cg-engineplugin.log
2021-12-28 15:16:48.114 [ERROR] [message-executor_1 ] c.w.w.l.m.s.DefaultMessageExecutor (131) [lambda$run$5] - method com.webank.wedatasphere.linkis.engineplugin.server.service.DefaultEngineConnResourceFactoryService.createEngineResource call failed java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at com.webank.wedatasphere.linkis.message.scheduler.AbstractMessageExecutor.lambda$run$5(AbstractMessageExecutor.java:126) ~[linkis-message-scheduler-1.0.2.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.util.concurrent.ExecutionException: LinkisException{errCode=70063, desc='No plugin foundhive-1.2.1please check your configuration', ip='bd15-21-32-217', port=9103, serviceKind='linkis-cg-engineplugin'}
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:526) ~[guava-25.1-jre.jar:?]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:487) ~[guava-25.1-jre.jar:?]
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83) ~[guava-25.1-jre.jar:?]
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196) ~[guava-25.1-jre.jar:?]
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2311) ~[guava-25.1-jre.jar:?]
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) ~[guava-25.1-jre.jar:?]
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) ~[guava-25.1-jre.jar:?]
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) ~[guava-25.1-jre.jar:?]
at com.google.common.cache.LocalCache.get(LocalCache.java:3951) ~[guava-25.1-jre.jar:?]
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4870) ~[guava-25.1-jre.jar:?]
at com.webank.wedatasphere.linkis.manager.engineplugin.cache.GuavaEngineConnPluginCache.get(GuavaEngineConnPluginCache.java:110) ~[linkis-engineconn-plugin-cache-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.manager.engineplugin.manager.loaders.CacheablesEngineConnPluginLoader.getEngineConnPlugin(CacheablesEngineConnPluginLoader.java:65) ~[linkis-engineconn-plugin-loader-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineplugin.server.service.DefaultEngineConnResourceFactoryService.getResourceFactoryBy(DefaultEngineConnResourceFactoryService.scala:35) ~[linkis-engineconn-plugin-server-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineplugin.server.service.DefaultEngineConnResourceFactoryService.createEngineResource(DefaultEngineConnResourceFactoryService.scala:46) ~[linkis-engineconn-plugin-server-1.0.2.jar:?]
... 10 more
Then it hit me: I had changed hive-1.2.1 to 1.1.0! So the Hive version in the CLI script needed updating as well. After fixing it, run again:
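A quick way to see which engine versions the installation actually shipped is to list the plugin directory (I'm not certain the layout is identical across versions, so a recursive listing is safest):

```shell
# Each engine plugin sits in its own subdirectory with a version folder inside
find /home/codeweaver/linkis/lib/linkis-engineconn-plugins -maxdepth 3 -type d
```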
[codeweaver@bd15-21-32-217 linkis]$ ./bin/linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228152241983233114
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:2
TaskId:2
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_1
[INFO] Job is successfully submitted!
2021-12-28 15:22:43.022 INFO Program is substituting variables for you
2021-12-28 15:22:43.022 INFO Variables substitution ended successfully
2021-12-28 15:22:43.022 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 15:22:43.022 INFO SQL code check has passed
job is scheduled.
2021-12-28 15:22:44.022 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_hive_1 and execID : LINKISCLI_codeweaver_hive_1 submitted
2021-12-28 15:22:44.022 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 15:22:44.022 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_hive_1 and taskID is 2 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 15:22:44.022 INFO job is running.
2021-12-28 15:22:44.022 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 2 and subJobId : 2 was submitted to Orchestrator.
2021-12-28 15:22:44.022 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 15:22:50.022 ERROR Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_1 Failed to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):ErrorException: errCode: 30000 ,desc: Necessary environment HADOOP_CONF_DIR is not exists!(必须的环境变量 HADOOP_CONF_DIR 不存在!) ,ip: bd15-21-32-217 ,port: 9102 ,serviceKind: linkis-cg-engineconnmanager ,ip: bd15-21-32-217 ,port: 9102 ,serviceKind: linkis-cg-engineconnmanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 15:22:50.022 INFO job is completed.
2021-12-28 15:22:50.022 INFO Task creation time(任务创建时间): 2021-12-28 15:22:43, Task scheduling time(任务调度时间): 2021-12-28 15:22:44, Task start time(任务开始时间): 2021-12-28 15:22:44, Mission end time(任务结束时间): 2021-12-28 15:22:50
2021-12-28 15:22:50.022 INFO Your mission(您的任务) 2 The total time spent is(总耗时时间为): 6.6 秒
2021-12-28 15:22:50.022 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.
[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:2
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_1
User:codeweaver
Current job status:FAILED
extraMsg:
errDesc: 21304, Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_1 Failed to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):ErrorException: errCode: 30000 ,desc: Necessary environment HADOOP_CONF_DIR
############Execute Error!!!########
Another failure, but this time it complains about HADOOP_CONF_DIR. Verifying in the logs, the error shows up in linkis-cg-engineconnmanager.log, which means the request made it past engine routing and the error comes from the engine execution side.
Caused by: com.webank.wedatasphere.linkis.common.exception.ErrorException: errCode: 30000 ,desc: Necessary environment HADOOP_CONF_DIR is not exists!(必须的环境变量 HADOOP_CONF_DIR 不存在!) ,ip: bd15-21-32-217 ,port: 9102 ,serviceKind: linkis-cg-engineconnmanager
I checked linkis-env.sh and it's clearly configured there, so why isn't it being read? A file permission problem? I changed the permissions to 777 and tried again: the earlier error message disappeared, but the job still failed, and the logs still pointed at HADOOP_CONF_DIR. Refusing to give up, I restarted all the services, and the problem remained. Nothing left but to look for the answer in the code.
override def launch(): Unit = {
  request.necessaryEnvironments.foreach { e =>
    val env = CommonVars(e, "")
    if (StringUtils.isEmpty(env.getValue))
      throw new ErrorException(30000, s"Necessary environment $e is not exists!(必须的环境变量 $e 不存在!)") //TODO exception
    else request.environment.put(e, env.getValue)
  }
  prepareCommand()
  val exec = new ProcessEngineConnCommandExec(sudoCommand(request.user, execFile.mkString(" ")), engineConnManagerEnv.engineConnWorkDir)
  exec.execute()
  process = exec.getProcess
}
That TODO made me a bit nervous... Anyway, the error is thrown here, so where do the environment variables get loaded from? Tracing all the way back to JavaProcessEngineConnLaunchBuilder, I found:
if (ifAddHiveConfigPath) {
  addPathToClassPath(environment, variable(HADOOP_CONF_DIR))
  addPathToClassPath(environment, variable(HIVE_CONF_DIR))
}

def addPathToClassPath(env: java.util.Map[String, String], value: String): Unit = {
  val v = if (env.containsKey(Environment.CLASSPATH.toString)) {
    env.get(Environment.CLASSPATH.toString) + CLASS_PATH_SEPARATOR + value
  } else value
  env.put(Environment.CLASSPATH.toString, v)
}
Do they need to go into the profile, then? Let's try it right away. Add the following to /etc/profile:
###HADOOP CONF DIR
export HADOOP_CONF_DIR=/etc/hadoop/conf
###HIVE CONF DIR
export HIVE_CONF_DIR=/etc/hive/conf
###SPARK CONF DIR
export SPARK_CONF_DIR=/opt/mobdata/spark/spark-2.4.3.mob1-bin-2.6.5/conf
Run it again. The previous error is gone, but there's a new problem:
[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228162640335698166
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:10
TaskId:10
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_5
[INFO] Job is successfully submitted!
2021-12-28 16:26:42.026 INFO Program is substituting variables for you
2021-12-28 16:26:42.026 INFO Variables substitution ended successfully
2021-12-28 16:26:42.026 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 16:26:42.026 INFO SQL code check has passed
job is scheduled.
2021-12-28 16:26:42.026 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_hive_5 and execID : LINKISCLI_codeweaver_hive_5 submitted
2021-12-28 16:26:42.026 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 16:26:42.026 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_hive_5 and taskID is 10 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 16:26:42.026 INFO job is running.
2021-12-28 16:26:42.026 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 10 and subJobId : 10 was submitted to Orchestrator.
2021-12-28 16:26:42.026 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:26:43.026 INFO Retry---success to rebuild task node:astJob_5_codeExec_5, ready to execute new retry-task:astJob_5_retry_30, current age is 1
2021-12-28 16:26:53.026 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:26:54.026 INFO Retry---success to rebuild task node:astJob_5_retry_30, ready to execute new retry-task:astJob_5_retry_30, current age is 2
2021-12-28 16:27:04.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:04.027 INFO Retry---success to rebuild task node:astJob_5_retry_31, ready to execute new retry-task:astJob_5_retry_31, current age is 3
2021-12-28 16:27:14.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:15.027 INFO Retry---success to rebuild task node:astJob_5_retry_32, ready to execute new retry-task:astJob_5_retry_32, current age is 4
2021-12-28 16:27:25.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:26.027 INFO Retry---success to rebuild task node:astJob_5_retry_33, ready to execute new retry-task:astJob_5_retry_33, current age is 5
2021-12-28 16:27:36.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:37.027 INFO Retry---success to rebuild task node:astJob_5_retry_34, ready to execute new retry-task:astJob_5_retry_34, current age is 6
2021-12-28 16:27:47.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:47.027 INFO Retry---success to rebuild task node:astJob_5_retry_35, ready to execute new retry-task:astJob_5_retry_35, current age is 7
2021-12-28 16:27:57.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:58.027 INFO Retry---success to rebuild task node:astJob_5_retry_36, ready to execute new retry-task:astJob_5_retry_36, current age is 8
2021-12-28 16:28:08.028 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:28:09.028 INFO Retry---success to rebuild task node:astJob_5_retry_37, ready to execute new retry-task:astJob_5_retry_37, current age is 9
2021-12-28 16:28:19.028 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:28:20.028 INFO Retry---success to rebuild task node:astJob_5_retry_38, ready to execute new retry-task:astJob_5_retry_38, current age is 10
2021-12-28 16:28:30.028 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:28:30.028 ERROR Task is Failed,errorMsg: ask Engine failed + errCode: 12003 ,desc: bd15-21-32-217:9101_89 Failed to async get EngineNodeLinkisRetryException: errCode: 30002 ,desc: 资源不足,请重试: errCode: 11012 ,desc: CPU resources are insufficient, to reduce the number of driver cores(CPU资源不足,建议调小驱动核数) ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 16:28:31.028 INFO job is completed.
2021-12-28 16:28:31.028 INFO Task creation time(任务创建时间): 2021-12-28 16:26:41, Task scheduling time(任务调度时间): 2021-12-28 16:26:42, Task start time(任务开始时间): 2021-12-28 16:26:42, Mission end time(任务结束时间): 2021-12-28 16:28:31
2021-12-28 16:28:31.028 INFO Your mission(您的任务) 10 The total time spent is(总耗时时间为): 1.8 分钟
2021-12-28 16:28:31.028 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.
[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:10
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_5
User:codeweaver
Current job status:FAILED
extraMsg:
errCode: 11012
errDesc: 远程服务器CPU资源不足
############Execute Error!!!########
Looking further through the logs, the statement had in fact been submitted, but the final status was not SUCCEED:
---------------------------------------------------
task 10 status is RUNNING, progress : 0.0
---------------------------------------------------
2021-12-28 16:28:28,720 INFO LinkisJobLogPresenter(89) - Job is still running, status=RUNNING, progress=0.0%
2021-12-28 16:28:30,710 INFO LinkisSubmitExecutor(101) -
---------------------------------------------------
task 10 status is RUNNING, progress : 0.0
---------------------------------------------------
2021-12-28 16:28:32,743 INFO LinkisSubmitExecutor(101) -
---------------------------------------------------
task 10 status is FAILED, progress : 1.0
---------------------------------------------------
2021-12-28 16:28:34,774 WARN SyncSubmission(154) - Exception thrown when trying to query final result. Status will change to FAILED
com.webank.wedatasphere.linkis.cli.core.exception.ExecutorException: EXE0021,Error occured during execution: Get ResultSet Failed: job Status is not "Succeed", .
at com.webank.wedatasphere.linkis.cli.application.driver.UjesClientDriver.queryResultSetPaths(UjesClientDriver.java:428) ~[linkis-cli-application-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.cli.application.interactor.execution.executor.LinkisSubmitExecutor.doGetFinalResult(LinkisSubmitExecutor.java:173) ~[linkis-cli-application-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.cli.core.interactor.execution.SyncSubmission.ExecWithAsyncBackend(SyncSubmission.java:152) [linkis-cli-core-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.cli.core.interactor.execution.SyncSubmission.execute(SyncSubmission.java:76) [linkis-cli-core-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.cli.application.LinkisClientApplication.exec(LinkisClientApplication.java:349) [linkis-cli-application-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.cli.application.LinkisClientApplication.main(LinkisClientApplication.java:381) [linkis-cli-application-1.0.2.jar:?]
2021-12-28 16:28:34,804 INFO LinkisSubmitExecutor(101) -
---------------------------------------------------
task 10 status is FAILED, progress : 1.0
---------------------------------------------------
2021-12-28 16:28:35,285 INFO LinkisJobLogPresenter(89) - Job is still running, status=FAILED, progress=100.0%
2021-12-28 16:28:38,806 INFO LinkisJobResultPresenter(57) - Job status is not success but 'FAILED'. Will not try to retrieve any Result
So how to fix this? I combed carefully through each of the logs and finally found the key clue, in
linkis-cg-engineconnmanager.out
54e1b9c0-d4dc-4be9-a49c-4b5f3597f9c8:sudo: sorry, you must have a tty to run sudo
Could this be a sudo configuration problem?
vi /etc/sudoers (better to use the visudo command)
Comment out the Defaults requiretty line:
#Defaults requiretty
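For reference, the same edit can also be scripted. A minimal sketch, taking the file path as an argument so you can try it on a copy first (on a real machine run it as root, keep the backup, and validate with `visudo -c` before trusting the result):

```shell
# Comment out a "Defaults requiretty" line in a sudoers-style file.
# Sketch only: test against a copy of /etc/sudoers before touching the real one.
disable_requiretty() {  # usage: disable_requiretty /path/to/sudoers
  local f=$1
  cp "$f" "$f.bak"   # keep a backup alongside the original
  # Prefix matching "Defaults requiretty" lines with '#'
  sed -i 's/^[[:space:]]*Defaults[[:space:]]\{1,\}requiretty/#&/' "$f"
}
```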
Save, then run it again:
[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228183116651149061
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:22
TaskId:22
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
[INFO] Job is successfully submitted!
2021-12-28 18:31:19.031 INFO Program is substituting variables for you
2021-12-28 18:31:19.031 INFO Variables substitution ended successfully
2021-12-28 18:31:20.031 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 18:31:20.031 INFO SQL code check has passed
job is scheduled.
2021-12-28 18:31:21.031 INFO Your job is Scheduled. Please wait it to run.
Job with jobId : LINKISCLI_codeweaver_hive_0 and execID : LINKISCLI_codeweaver_hive_0 submitted
Your job is being scheduled by orchestrator.
2021-12-28 18:31:21.031 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 18:31:21.031 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_hive_0 and taskID is 22 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 18:31:21.031 INFO job is running.
2021-12-28 18:31:21.031 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 22 and subJobId : 22 was submitted to Orchestrator.
2021-12-28 18:31:21.031 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:31:43.031 INFO EngineConn local log path: ServiceInstance(linkis-cg-engineconn, bd15-21-32-217:26052) /tmp/codeweaver/linkis_dev/codeweaver/workDir/1c3da121-8e1e-4b3f-bbb9-1e09876ae96c/logs
HiveEngineExecutor_0 >> SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
2021-12-28 18:31:44.383 ERROR [Linkis-Default-Scheduler-Thread-3] com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor 200 com$webank$wedatasphere$linkis$engineplugin$hive$executor$HiveEngineConnExecutor$$executeHQL - query failed, reason : java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
at scala.collection.immutable.Range.foreach(Range.scala:160) [scala-library-2.11.12.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException$NoNodeException
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
... 43 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException$NoNodeException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_181]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_181]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_181]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_181]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
... 43 more
2021-12-28 18:31:44.410 ERROR [Linkis-Default-Scheduler-Thread-3] com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor 57 error - execute code failed! java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
at scala.collection.immutable.Range.foreach(Range.scala:160) [scala-library-2.11.12.jar:?]
at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:54) [linkis-accessible-executor-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:48) [linkis-accessible-executor-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.ensureOp(ComputationExecutor.scala:133) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.execute(ComputationExecutor.scala:236) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl.com$webank$wedatasphere$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask(TaskExecutionServiceImpl.scala:239) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply$mcV$sp(TaskExecutionServiceImpl.scala:172) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:39) [linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:68) [linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1.run(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException$NoNodeException
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
... 43 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException$NoNodeException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_181]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_181]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_181]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_181]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
... 43 more
2021-12-28 18:31:44.428 ERROR [Linkis-Default-Scheduler-Thread-3] com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl 57 error - null java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveDriverProxy.run(HiveEngineConnExecutor.scala:456) ~[linkis-engineplugin-hive-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor.com$webank$wedatasphere$linkis$engineplugin$hive$executor$HiveEngineConnExecutor$$executeHQL(HiveEngineConnExecutor.scala:163) ~[linkis-engineplugin-hive-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor$$anon$1.run(HiveEngineConnExecutor.scala:127) ~[linkis-engineplugin-hive-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor$$anon$1.run(HiveEngineConnExecutor.scala:120) ~[linkis-engineplugin-hive-1.0.2.jar:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ~[hadoop-common-2.6.0.jar:?]
at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor.executeLine(HiveEngineConnExecutor.scala:120) ~[linkis-engineplugin-hive-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10$$anonfun$apply$11.apply(ComputationExecutor.scala:179) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10$$anonfun$apply$11.apply(ComputationExecutor.scala:178) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:39) ~[linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10.apply(ComputationExecutor.scala:180) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10.apply(ComputationExecutor.scala:174) ~[linkis-computation-engineconn-1.0.2.jar:?]
at scala.collection.immutable.Range.foreach(Range.scala:160) ~[scala-library-2.11.12.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:173) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:149) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryFinally(Utils.scala:60) ~[linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.toExecuteTask(ComputationExecutor.scala:222) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$3.apply(ComputationExecutor.scala:237) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$3.apply(ComputationExecutor.scala:237) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryFinally(Utils.scala:60) ~[linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:54) ~[linkis-accessible-executor-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:48) ~[linkis-accessible-executor-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.ensureOp(ComputationExecutor.scala:133) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.execute(ComputationExecutor.scala:236) ~[linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl.com$webank$wedatasphere$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask(TaskExecutionServiceImpl.scala:239) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply$mcV$sp(TaskExecutionServiceImpl.scala:172) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:39) [linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:68) [linkis-common-1.0.2.jar:?]
at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1.run(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException$NoNodeException
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
... 43 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException$NoNodeException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_181]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_181]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_181]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_181]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
... 43 more
2021-12-28 18:31:44.031 ERROR Task is Failed,errorMsg: null
2021-12-28 18:31:44.031 INFO job is completed.
2021-12-28 18:31:44.031 INFO Task creation time(任务创建时间): 2021-12-28 18:31:19, Task scheduling time(任务调度时间): 2021-12-28 18:31:21, Task start time(任务开始时间): 2021-12-28 18:31:21, Mission end time(任务结束时间): 2021-12-28 18:31:44
2021-12-28 18:31:44.031 INFO Your mission(您的任务) 22 The total time spent is(总耗时时间为): 25.6 秒
2021-12-28 18:31:44.031 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.
[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:22
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
User:codeweaver
Current job status:FAILED
extraMsg:
errDesc: 21304, Task is Failed,errorMsg: null
############Execute Error!!!########
??? A new problem? It looks like a dependency conflict; replacing the conflicting jar should sort it out.
But where is that jar being pulled in from? No leads for now, so let's try the next engine first.
Spark engine
💡 Diving straight into the pit
[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-spark-sql -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228174733617974682
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:3
TaskId:3
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_1
[INFO] Job is successfully submitted!
2021-12-28 17:47:35.047 INFO Program is substituting variables for you
2021-12-28 17:47:35.047 INFO Variables substitution ended successfully
2021-12-28 17:47:35.047 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 17:47:35.047 INFO SQL code check has passed
job is scheduled.
2021-12-28 17:47:36.047 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_spark_1 and execID : LINKISCLI_codeweaver_spark_1 submitted
2021-12-28 17:47:36.047 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 17:47:36.047 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_spark_1 and taskID is 3 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 17:47:36.047 INFO job is running.
2021-12-28 17:47:36.047 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 3 and subJobId : 3 was submitted to Orchestrator.
2021-12-28 17:47:36.047 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 17:47:36.047 ERROR Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_2 Failed to async get EngineNode RMErrorException: errCode: 11006 ,desc: Failed to request external resourceRMWarnException: errCode: 11006 ,desc: queue ide is not exists in YARN. ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 17:47:37.047 INFO job is completed.
2021-12-28 17:47:37.047 INFO Task creation time(任务创建时间): 2021-12-28 17:47:35, Task scheduling time(任务调度时间): 2021-12-28 17:47:36, Task start time(任务开始时间): 2021-12-28 17:47:36, Mission end time(任务结束时间): 2021-12-28 17:47:37
2021-12-28 17:47:37.047 INFO Your mission(您的任务) 3 The total time spent is(总耗时时间为): 1.8 秒
2021-12-28 17:47:37.047 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.
[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:3
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_1
User:codeweaver
Current job status:FAILED
extraMsg:
errCode: 10001
errDesc: 会话创建失败,ide队列不存在,请检查队列设置是否正确
############Execute Error!!!########
OK, so it looks like a queue problem: the ide queue is used by default.
2021-12-28 17:47:36.610 [INFO ] [ForkJoinPool-1-worker-7 ] c.w.w.l.m.a.s.e.DefaultEngineAskEngineService (45) [info] - Failed to async(bd15-21-32-217:9101_2) createEngine com.webank.wedatasphere.linkis.resourcemanager.exception.RMErrorException: errCode: 11006 ,desc: Failed to request external resourceRMWarnException: errCode: 11006 ,desc: queue ide is not exists in YARN. ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager
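Passing the queue per job works, but the cleaner fix is to change the default. If I remember the 1.0.x configuration correctly, the default YARN queue comes from the `wds.linkis.rm.yarnqueue` property (key name per my reading of the docs; verify it against your own conf/linkis.properties before relying on it):

```properties
# conf/linkis.properties: default YARN queue for resource requests
# (replaces the out-of-the-box "ide" queue; restart the services after changing it)
wds.linkis.rm.yarnqueue=default
```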
Add the queue and try again:
[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-spark-sql -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver --queue default
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228185636659565504
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:27
TaskId:27
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_0
[INFO] Job is successfully submitted!
2021-12-28 18:56:38.056 INFO Program is substituting variables for you
2021-12-28 18:56:38.056 INFO Variables substitution ended successfully
2021-12-28 18:56:38.056 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 18:56:38.056 INFO SQL code check has passed
job is scheduled.
2021-12-28 18:56:38.056 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_spark_0 and execID : LINKISCLI_codeweaver_spark_0 submitted
2021-12-28 18:56:38.056 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 18:56:38.056 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_spark_0 and taskID is 27 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 18:56:38.056 INFO job is running.
2021-12-28 18:56:38.056 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 27 and subJobId : 27 was submitted to Orchestrator.
2021-12-28 18:56:38.056 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:56:40.056 INFO Retry---success to rebuild task node:astJob_2_codeExec_2, ready to execute new retry-task:astJob_2_retry_0, current age is 1
2021-12-28 18:56:50.056 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:56:50.056 INFO Retry---success to rebuild task node:astJob_2_retry_0, ready to execute new retry-task:astJob_2_retry_0, current age is 2
2021-12-28 18:57:00.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:01.057 INFO Retry---success to rebuild task node:astJob_2_retry_1, ready to execute new retry-task:astJob_2_retry_1, current age is 3
2021-12-28 18:57:11.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:12.057 INFO Retry---success to rebuild task node:astJob_2_retry_2, ready to execute new retry-task:astJob_2_retry_2, current age is 4
2021-12-28 18:57:22.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:22.057 INFO Retry---success to rebuild task node:astJob_2_retry_3, ready to execute new retry-task:astJob_2_retry_3, current age is 5
2021-12-28 18:57:32.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:33.057 INFO Retry---success to rebuild task node:astJob_2_retry_4, ready to execute new retry-task:astJob_2_retry_4, current age is 6
2021-12-28 18:57:43.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:44.057 INFO Retry---success to rebuild task node:astJob_2_retry_5, ready to execute new retry-task:astJob_2_retry_5, current age is 7
2021-12-28 18:57:54.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:55.057 INFO Retry---success to rebuild task node:astJob_2_retry_6, ready to execute new retry-task:astJob_2_retry_6, current age is 8
2021-12-28 18:58:05.058 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:58:05.058 INFO Retry---success to rebuild task node:astJob_2_retry_7, ready to execute new retry-task:astJob_2_retry_7, current age is 9
2021-12-28 18:58:15.058 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:58:16.058 INFO Retry---success to rebuild task node:astJob_2_retry_8, ready to execute new retry-task:astJob_2_retry_8, current age is 10
2021-12-28 18:58:26.058 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:58:27.058 ERROR Task is Failed,errorMsg: ask Engine failed + errCode: 12003 ,desc: bd15-21-32-217:9101_23 Failed to async get EngineNodeLinkisRetryException: errCode: 30002 ,desc: 资源不足,请重试: errCode: 11014 ,desc: Queue CPU resources are insufficient, reduce the number of executors.(队列CPU资源不足,建议调小执行器个数) ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 18:58:27.058 INFO job is completed.
2021-12-28 18:58:27.058 INFO Task creation time(任务创建时间): 2021-12-28 18:56:38, Task scheduling time(任务调度时间): 2021-12-28 18:56:38, Task start time(任务开始时间): 2021-12-28 18:56:38, Mission end time(任务结束时间): 2021-12-28 18:58:27
2021-12-28 18:58:27.058 INFO Your mission(您的任务) 27 The total time spent is(总耗时时间为): 1.8 分钟
2021-12-28 18:58:27.058 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.
[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:27
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_0
User:codeweaver
Current job status:FAILED
extraMsg:
errCode: 11014
errDesc: 队列CPU资源不足
############Execute Error!!!########
Check the logs:
f2511155-4f0f-4dcd-818b-df3a0cb3632a:WARNING: User-defined SPARK_HOME (/opt/mobdata/spark/spark-2.4.3.mob1-bin-2.6.5) overrides detected (/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/lib/spark).
f2511155-4f0f-4dcd-818b-df3a0cb3632a:WARNING: Running spark-class from user-defined location.
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=250m; support was removed in 8.0
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Class path contains multiple SLF4J bindings.
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Found binding in [jar:file:/tmp/codeweaver/linkis_dev/engineConnPublickDir/3e349615-708d-44e0-899b-6b6b590d219e/v000002/lib/log4j-slf4j-impl-2.13.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Found binding in [jar:file:/opt/mobdata/spark/spark-2.4.3.mob1-bin-2.6.5/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.common.conf.CommonVars.<init>(CommonVars.scala:22)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.common.conf.CommonVars$.apply(CommonVars.scala:35)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.common.conf.CommonVars.apply(CommonVars.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.manager.label.conf.LabelCommonConfig.<clinit>(LabelCommonConfig.java:23)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.manager.label.builder.factory.LabelBuilderFactoryContext.getLabelBuilderFactory(LabelBuilderFactoryContext.java:45)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.engineconn.launch.EngineConnServer$.<init>(EngineConnServer.scala:30)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.engineconn.launch.EngineConnServer$.<clinit>(EngineConnServer.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at com.webank.wedatasphere.linkis.engineconn.launch.EngineConnServer.main(EngineConnServer.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at java.lang.reflect.Method.invoke(Method.java:497)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Caused by: java.lang.ClassNotFoundException: scala.Product$class
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
f2511155-4f0f-4dcd-818b-df3a0cb3632a: ... 20 more
Hmm... could this be a Scala version mismatch? I checked the Scala versions of both Spark and Linkis, and sure enough: our Spark 2.4.3 build actually ships with Scala 2.12, while Linkis is built against Scala 2.11. (`NoClassDefFoundError: scala/Product$class` is a classic symptom: `Product$class` exists in Scala 2.11 but was removed in 2.12, so 2.11-compiled classes blow up on a 2.12 runtime.)
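A quick way to confirm which Scala version a Spark distribution was built with is to look at the jar names under `$SPARK_HOME/jars` (the Scala binary version is embedded in the artifact name), or to run `spark-submit --version`, which prints a "Using Scala version ..." line. A minimal sketch of extracting the version from a jar name (the jar name below is a hypothetical example; in a real install you would list `$SPARK_HOME/jars/spark-core_*.jar`):

```shell
# Hypothetical jar name -- in a real install, run: ls $SPARK_HOME/jars/spark-core_*.jar
jar_name="spark-core_2.12-2.4.3.jar"
# The Scala binary version sits between the underscore and the first dash
scala_ver="${jar_name#spark-core_}"   # strip the prefix  -> "2.12-2.4.3.jar"
scala_ver="${scala_ver%%-*}"          # keep text before the first dash -> "2.12"
echo "Spark was built with Scala $scala_ver"
```

If that value does not match the `scala.version` in Linkis's top-level pom.xml, the engine will fail exactly as in the stack trace above.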
Well... nothing for it but to recompile, so that the two sides agree on a Scala version.
Next up:
Shell engine
This one went unexpectedly smoothly:
[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli -engineType shell-1 -codeType shell -code "echo 123;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli/linkis-client.codeweaver.log.20211228165047156877579
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:15
TaskId:15
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_shell_0
[INFO] Job is successfully submitted!
2021-12-28 16:50:48.050 INFO Program is substituting variables for you
2021-12-28 16:50:48.050 INFO Variables substitution ended successfully
job is scheduled.
2021-12-28 16:50:49.050 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_shell_0 and execID : LINKISCLI_codeweaver_shell_0 submitted
2021-12-28 16:50:49.050 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
echo 123
************************************SCRIPT CODE************************************
2021-12-28 16:50:49.050 INFO Your job is accepted, jobID is LINKISCLI_codeweaver_shell_0 and taskID is 15 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 16:50:49.050 INFO job is running.
2021-12-28 16:50:49.050 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 15 and subJobId : 15 was submitted to Orchestrator.
2021-12-28 16:50:49.050 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:51:08.051 INFO EngineConn local log path: ServiceInstance(linkis-cg-engineconn, bd15-21-32-217:33772) /tmp/codeweaver/linkis_dev/codeweaver/workDir/98dab8a4-389c-43d4-82bf-aa1d1b3eb595/logs
bd15-21-32-217:33772_0 >> echo 123
Your subjob : 15 execue with state succeed, has 1 resultsets.
Congratuaions! Your job : LINKISCLI_codeweaver_shell_0 executed with status succeed and 0 results.
2021-12-28 16:51:10.051 INFO job is completed.
2021-12-28 16:51:10.051 INFO Task creation time(任务创建时间): 2021-12-28 16:50:48, Task scheduling time(任务调度时间): 2021-12-28 16:50:49, Task start time(任务开始时间): 2021-12-28 16:50:49, Mission end time(任务结束时间): 2021-12-28 16:51:10
2021-12-28 16:51:10.051 INFO Your mission(您的任务) 15 The total time spent is(总耗时时间为): 21.7 秒
2021-12-28 16:51:10.051 INFO Congratulations. Your job completed with status Success.
[INFO] Job execute successfully! Will try get execute result
============Result:================
TaskId:15
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_shell_0
User:codeweaver
Current job status:SUCCEED
extraMsg:
result:
============RESULT SET============
123
============END OF RESULT SET============
############Execute Success!!!########
四、Interim summary
Of the three engines tried today, only shell made it through debugging; hive and spark still have plenty of pitfalls left to step through...
See you in the next installment!