A Step-by-Step Linkis 1.0.2 Installation Guide & Pitfall Log (Part 1)


💡 I had wanted to try Linkis 1.0 ever since it was first released, but my work shifted elsewhere and, with a healthy dose of procrastination, it never happened. Now that my year-end project is finally done, I'm carving out some time to try the new version, and to look into upgrading the old version we run at my company.

Background

At the end of 2019, our company set out to build a one-stop big data platform. Short on engineers, we looked for solid open-source products to build on instead. As luck would have it, WeBank had just open-sourced Linkis, a big data middleware project, along with its front end, Scriptis. After evaluating them, we found they fit our needs very well, so we built our internal one-stop big data platform on top of them, covering the full workflow of development, querying, analysis, and scheduling. WeBank was very good about staying in touch, and we contributed some PRs back. But with staff turnover and shifting company priorities, internal development ground to a halt, and the platform sat in pure maintenance mode for a long time.

As business volume and headcount grew, the platform started showing cracks: intermittent timeouts and failed job submissions. Constant firefighting was no way to live, so an upgrade went on the roadmap. But from our original 0.9.0 to today's 1.0.2, Linkis has changed enormously; the project was practically rewritten, and honestly I had little confidence a seamless upgrade was possible.

Either way, putting it off was not an option. First step: try the new version and get it running!

Let's go

1. Environment Preparation

Big data environment

  • CDH 5.8.3
  • Hadoop 2.6.0
  • Hive 1.1.0
  • Spark 2.4.3

Since we plan to use this in production later, I prepared a big data environment identical to our production setup.

Servers

  • Single physical machine: 188 GB RAM, 32 cores

In earlier versions of Linkis, distributed and single-node deployments differed very little, so for convenience this test uses a single-node deployment.

Other middleware

  • mysql 5.7

Deployment user

Create a dedicated deployment user. Here I use the codeweaver user; note that it needs sudo privileges.

2. Compiling the Code

Because our component versions differ from the defaults, the dependency versions in the pom files need to be changed.

Following the compilation docs, first update the versions of a few dependencies.

The Hadoop version is set in the top-level pom.xml:

    <properties>

        <hadoop.version>2.6.0</hadoop.version> <!-- change the Hadoop version number here -->

        <scala.version>2.11.8</scala.version>
        <jdk.compile.version>1.8</jdk.compile.version>

    </properties>

The Hive version is in the pom.xml under linkis-engineconn-plugins/engineconn-plugins/hive:

    <properties>
        <hive.version>1.1.0</hive.version> <!-- change the Hive version number here -->
    </properties>

The Spark version is in the pom.xml under linkis-engineconn-plugins/engineconn-plugins/spark:

    <properties>
        <spark.version>2.4.3</spark.version>
    </properties>

💡 Note: for Hadoop 3, follow the manual for the required changes.

With the versions updated, compile and package:

mvn -N install
mvn clean install

Compilation takes quite a while; please be patient.


When the build finishes, you will find wedatasphere-linkis-1.0.2-combined-package-dist.tar.gz under assembly-combined-package/target.

Now we can move on to deployment.

3. Deployment

Upload

First, upload the tarball to the server and extract it, which yields:

drwxrwxr-x 2 codeweaver codeweaver      4096 Dec 28 12:29 bin
drwxrwxr-x 2 codeweaver codeweaver      4096 Dec 28 12:29 config
-rwxrwxr-x 1 codeweaver codeweaver 482433664 Dec 28 12:15 wedatasphere-linkis-1.0.2-combined-dist.tar.gz

Modify the configuration

config/db.sh

MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_DB=linkis
MYSQL_USER=root
MYSQL_PASSWORD=123456
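The installer imports Linkis's tables into this database, so make sure the database itself exists first. Below is a hedged sketch using the values from db.sh above; the mysql invocation is left commented out, so run it manually against your own server:

```shell
# Build the statement that creates the linkis database if it is missing.
sql="CREATE DATABASE IF NOT EXISTS linkis DEFAULT CHARACTER SET utf8;"
echo "$sql"
# Actual invocation (credentials from db.sh; adjust to your environment):
# mysql -h 127.0.0.1 -P 3306 -u root -p123456 -e "$sql"
```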

config/linkis-env.sh

#
# @name:        linkis-env
#
# Modified for Linkis 1.0.0

# SSH_PORT=22

### deploy user
deployUser=codeweaver

##Linkis_SERVER_VERSION
LINKIS_SERVER_VERSION=v1

### Specifies the user workspace, which is used to store the user's script files and log files.
### Generally local directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/codeweaver/linkis_dev/ ##file:// required
### User's root hdfs path
HDFS_USER_ROOT_PATH=hdfs:///user/codeweaver/linkis_dev ##hdfs:// required



### Path to store started engines and engine logs, must be local
ENGINECONN_ROOT_PATH=/tmp/codeweaver/linkis_dev

#ENTRANCE_CONFIG_LOG_PATH=hdfs:///tmp/linkis/

### Path to store job ResultSet:file or hdfs path
#RESULT_SET_ROOT_PATH=hdfs:///tmp/linkis ##hdfs:// required

### Provide the DB information of Hive metadata database.
### Attention! If there are special characters like "&", they need to be enclosed in quotation marks.
HIVE_META_URL="jdbc:mysql://127.0.0.1:3306/hive"
HIVE_META_USER="root"
HIVE_META_PASSWORD="123456"

##YARN REST URL  spark engine required
YARN_RESTFUL_URL=http://127.0.0.1:8088

###HADOOP CONF DIR
HADOOP_CONF_DIR=/etc/hadoop/conf

###HIVE CONF DIR
HIVE_CONF_DIR=/etc/hive/conf

###SPARK CONF DIR
SPARK_CONF_DIR=/etc/spark/conf

## Engine version conf
#SPARK_VERSION
SPARK_VERSION=2.4.3
##HIVE_VERSION
HIVE_VERSION=1.1.0
PYTHON_VERSION=python3

################### The install Configuration of all Micro-Services #####################
#
#    NOTICE:
#       1. If you just wanna try, the following micro-service configuration can be set without any settings.
#            These services will be installed by default on this machine.
#       2. In order to get the most complete enterprise-level features, we strongly recommend that you install
#            Linkis in a distributed manner and set the following microservice parameters
#

###  EUREKA install information
###  You can access it in your browser at the address below:http://${EUREKA_INSTALL_IP}:${EUREKA_PORT}
#EUREKA_INSTALL_IP=127.0.0.1         # Microservices Service Registration Discovery Center
EUREKA_PORT=20303
export EUREKA_PREFER_IP=false

###  Gateway install information
#GATEWAY_INSTALL_IP=127.0.0.1
GATEWAY_PORT=9001

### ApplicationManager
#MANAGER_INSTALL_IP=127.0.0.1
MANAGER_PORT=9101

### EngineManager
#ENGINECONNMANAGER_INSTALL_IP=127.0.0.1
ENGINECONNMANAGER_PORT=9102



### EnginePluginServer
#ENGINECONN_PLUGIN_SERVER_INSTALL_IP=127.0.0.1
ENGINECONN_PLUGIN_SERVER_PORT=9103

### LinkisEntrance
#ENTRANCE_INSTALL_IP=127.0.0.1
ENTRANCE_PORT=9104

###  publicservice
#PUBLICSERVICE_INSTALL_IP=127.0.0.1
PUBLICSERVICE_PORT=9105

### cs
#CS_INSTALL_IP=127.0.0.1
CS_PORT=9108

########################################################################################

## LDAP is for enterprise authorization, if you just want to have a try, ignore it.
#LDAP_URL=ldap://localhost:1389/
#LDAP_BASEDN=dc=webank,dc=com
#LDAP_USER_NAME_FORMAT=cn=%s@xxx.com,OU=xxx,DC=xxx,DC=com

## java application default jvm memory
export SERVER_HEAP_SIZE="512M"

## The extraction directory and the installation directory must not be the same
LINKIS_HOME=/home/codeweaver/linkis/

LINKIS_VERSION=1.0.2

# for install
LINKIS_PUBLIC_MODULE=lib/linkis-commons/public-module

With the configuration done, on to the next step.

Install

Go into the bin directory and first verify the installation dependencies: sh checkEnv.sh

It lists the packages that need to be installed via yum; you can also install them manually:

  • yum
  • java
  • mysql
  • telnet
  • tar
  • sed
  • dos2unix
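checkEnv.sh essentially verifies that each of these tools resolves on the PATH; the script's exact checks may differ, but a rough manual equivalent is:

```shell
# Report, for each tool the installer expects, whether it resolves on the PATH.
missing=""
for cmd in java mysql telnet tar sed dos2unix; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: ok"
  else
    echo "$cmd: MISSING"
    missing="$missing $cmd"
  fi
done
```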

Once that's done, run sh install.sh

Congratulations! You have installed Linkis 1.0.2 successfully, please use sbin/linkis-start-all.sh to start it!

When you see this message, the HDFS directories, MySQL tables, and file extraction are all done, and you can start the services.

💡 To be safe, check the conf directory and confirm that all the configuration files were substituted correctly.

💡 One issue surfaced here: the Hive version in the linkis_cg_manager_label table in MySQL was not substituted. It turns out the install script replaces hive-1.2.1, while linkis_dml.sql already uses hive-2.3.3, so the substitution never matched.
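If you hit the same mismatch, one hedged way to patch it is to rewrite the stale label value directly. The column name label_value and the version strings here are taken from the observation above; verify both against your own table before running anything:

```shell
# Construct the UPDATE; run it manually against the linkis database after
# double-checking the current label_value contents.
sql="UPDATE linkis_cg_manager_label SET label_value = REPLACE(label_value, 'hive-2.3.3', 'hive-1.1.0') WHERE label_value LIKE '%hive-2.3.3%';"
echo "$sql"
# mysql -u root -p linkis -e "$sql"
```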

Start the services

Go into the sbin directory and run sh linkis-start-all.sh

Linkis started successfully

Startup went smoothly; let's check the Eureka dashboard.


All 8 services started normally. Next, check the logs to make sure each service is actually healthy.

💡 Pitfall alert

I ran into a problem here: while checking linkis-ps-publicservice, I found this error:

2021-12-28 14:53:44.330 [ERROR] [qtp555754759-84                         ] c.w.w.l.b.c.HdfsResourceHelper (91) [upload] - codeweaver write to hdfs:///apps-data/codeweaver/bml/20211228/0b5ed154-4f42-460a-b5a2-c584a07eb4e4 failed, reason is, IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=codeweaver, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:242)
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:169)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3560)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3543)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:3525)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6588)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4384)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4354)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4327)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:868)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:322)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:613)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1835)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)

Huh? That HDFS path is not anything I configured, so why is it writing there? Digging through the other services' logs, it turns out engineplugin writes files to HDFS on startup:

2021-12-28 14:53:42.577 [INFO ] [Linkis-Default-Scheduler-Thread-1       ] c.w.w.l.e.s.s.DefaultEngineConnResourceService (40) [info] - Try to initialize hiveEngineConn-v1.1.0.
2021-12-28 14:53:43.237 [INFO ] [Linkis-Default-Scheduler-Thread-1       ] c.w.w.l.e.s.s.DefaultEngineConnResourceService (40) [info] - Ready to upload a new bmlResource for hiveEngineConn-v1.1.0. path: conf.zip
2021-12-28 14:53:44.686 [INFO ] [Linkis-Default-Scheduler-Thread-1       ] o.r.Reflections (229) [scan] - Reflections took 179 ms to scan 23 urls, producing 341 keys and 3825 values
2021-12-28 14:53:44.723 [ERROR] [Linkis-Default-Scheduler-Thread-1       ] c.w.w.l.e.s.s.DefaultEngineConnResourceService (99) [apply] - error code(错误码): 10905, Error message(错误信息): URL /api/rest_j/v1/bml/upload request failed! ResponseBody is {"method":null,"status":1,"message":"error code(错误码): 60050, error message(错误信息): The first upload of the resource failed(首次上传资源失败).","data":{"errorMsg":{"serviceKind":"linkis-ps-publicservice","level":2,"port":9105,"errCode":50073,"ip":"bd15-21-32-217","desc":"The commit upload resource task failed(提交上传资源任务失败):errCode: 60050 ,desc: The first upload of the resource failed(首次上传资源失败) ,ip: bd15-21-32-217 ,port: 9105 ,serviceKind: linkis-ps-publicservice"}}}.. com.webank.wedatasphere.linkis.httpclient.exception.HttpClientResultException: errCode: 10905 ,desc: URL /api/rest_j/v1/bml/upload request failed! ResponseBody is {"method":null,"status":1,"message":"error code(错误码): 60050, error message(错误信息): The first upload of the resource failed(首次上传资源失败).","data":{"errorMsg":{"serviceKind":"linkis-ps-publicservice","level":2,"port":9105,"errCode":50073,"ip":"bd15-21-32-217","desc":"The commit upload resource task failed(提交上传资源任务失败):errCode: 60050 ,desc: The first upload of the resource failed(首次上传资源失败) ,ip: bd15-21-32-217 ,port: 9105 ,serviceKind: linkis-ps-publicservice"}}}. ,ip: bd15-21-32-217 ,port: 9103 ,serviceKind: linkis-cg-engineplugin

OK, since it's not in the config, let's find the controlling parameter in the code. It turned out to be this one in BmlServerConfiguration:

val BML_HDFS_PREFIX = CommonVars("wds.linkis.bml.hdfs.prefix", "/apps-data")

And where it is used:

if (StringUtils.isNotEmpty(resourceHeader)) {
    return getSchema() + BmlServerConfiguration.BML_HDFS_PREFIX().getValue()
            + "/" + user + "/bml" + "/" + dateStr + "/" + resourceHeader + "/" + fileName;
} else {
    return getSchema() + BmlServerConfiguration.BML_HDFS_PREFIX().getValue() + "/" + user + "/bml" + "/" + dateStr + "/" + fileName;
}
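To see where that prefix actually lands, the same path assembly can be sketched in shell (user, date, and file name are made-up sample values; this mirrors the branch without resourceHeader):

```shell
# schema + prefix + /user/bml/date/fileName, as in the Java code above
schema="hdfs://"
prefix="/apps-data"                      # default wds.linkis.bml.hdfs.prefix
user="codeweaver"; date_str="20211228"; file_name="conf.zip"
path="${schema}${prefix}/${user}/bml/${date_str}/${file_name}"
echo "$path"

# With the prefix overridden to a directory the deploy user can write:
prefix="/user/codeweaver/linkis_dev"
path2="${schema}${prefix}/${user}/bml/${date_str}/${file_name}"
echo "$path2"
```

The first echo reproduces the hdfs:///apps-data/... path from the permission-denied stack trace, which confirms the prefix is the culprit.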

That's the one.

Add the following to conf/linkis-ps-publicservice.properties:

wds.linkis.bml.hdfs.prefix=/user/codeweaver/linkis_dev

Restart the services and check the logs again:

2021-12-28 15:10:03.660 [INFO ] [qtp1617265545-84                        ] c.w.w.l.b.s.i.TaskServiceImpl (80) [createUploadTask] - Upload resource successfully. Update task(上传资源成功.更新任务) taskId:12-resourceId:0cbfb738-491d-48e4-8e4f-c2c65b8b79b1 status is   success .
2021-12-28 15:10:03.678 [INFO ] [qtp1617265545-84                        ] c.w.w.l.b.r.BmlRestfulApi (464) [uploadResource] - User codeweaver submitted upload resource task successfully(用户 codeweaver 提交上传资源任务成功, resourceId is 0cbfb738-491d-48e4-8e4f-c2c65b8b79b1)

Problem solved.

4. Verification

Since I didn't install the web front end this time, I used the CLI that Linkis provides to verify that each engine starts correctly. The scripts are all in the bin directory.

Hive engine

💡 Pitfall alert

[codeweaver@bd15-21-32-217 bin]$ ./bin/linkis-cli-hive -code "SELECT * from test.test_table_1;" -submitUser codeweaver -proxyUser codeweaver
No JDK 8 found. linkis-client requires Java 1.8

Fine, let's fix the Java path inside the script.

[codeweaver@bd15-21-32-217 linkis]$ ./bin/linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228151637577955060
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:1
TaskId:1
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
[INFO] Job is successfully submitted!

2021-12-28 15:16:41.016 INFO Program is substituting variables for you
2021-12-28 15:16:41.016 INFO Variables substitution ended successfully
2021-12-28 15:16:41.016 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 15:16:41.016 INFO SQL code check has passed
job is scheduled.
2021-12-28 15:16:45.016 INFO Your job is Scheduled. Please wait it to run.
Job with jobId : LINKISCLI_codeweaver_hive_0 and execID : LINKISCLI_codeweaver_hive_0 submitted
Your job is being scheduled by orchestrator.
2021-12-28 15:16:45.016 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 15:16:45.016 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_hive_0 and taskID is 1 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 15:16:45.016 INFO job is running.
2021-12-28 15:16:45.016 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 1 and subJobId : 1 was submitted to Orchestrator.
2021-12-28 15:16:46.016 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 15:16:48.016 ERROR Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_0 Failed  to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):EngineConnPluginNotFoundException: errCode: 70063 ,desc: No plugin foundhive-1.2.1please check your configuration ,ip: bd15-21-32-217 ,port: 9103 ,serviceKind: linkis-cg-engineplugin ,ip: bd15-21-32-217 ,port: 9103 ,serviceKind: linkis-cg-engineplugin ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 15:16:48.016 INFO job is completed.
2021-12-28 15:16:48.016 INFO Task creation time(任务创建时间): 2021-12-28 15:16:40, Task scheduling time(任务调度时间): 2021-12-28 15:16:45, Task start time(任务开始时间): 2021-12-28 15:16:46, Mission end time(任务结束时间): 2021-12-28 15:16:48
2021-12-28 15:16:48.016 INFO Your mission(您的任务) 1 The total time spent is(总耗时时间为): 8.32021-12-28 15:16:48.016 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.

[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:1
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
User:codeweaver
Current job status:FAILED
extraMsg:
errDesc: 21304, Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_0 Failed  to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):EngineConnPluginNotFoundException: errCode: 70063 ,desc: No plugin foundhiv

############Execute Error!!!########

Execution failed? What now... Time to dig through the logs: linkis-cg-engineplugin.log

2021-12-28 15:16:48.114 [ERROR] [message-executor_1                      ] c.w.w.l.m.s.DefaultMessageExecutor (131) [lambda$run$5] - method com.webank.wedatasphere.linkis.engineplugin.server.service.DefaultEngineConnResourceFactoryService.createEngineResource call failed java.lang.reflect.InvocationTargetException: null
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
        at com.webank.wedatasphere.linkis.message.scheduler.AbstractMessageExecutor.lambda$run$5(AbstractMessageExecutor.java:126) ~[linkis-message-scheduler-1.0.2.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.util.concurrent.ExecutionException: LinkisException{errCode=70063, desc='No plugin foundhive-1.2.1please check your configuration', ip='bd15-21-32-217', port=9103, serviceKind='linkis-cg-engineplugin'}
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:526) ~[guava-25.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:487) ~[guava-25.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83) ~[guava-25.1-jre.jar:?]
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196) ~[guava-25.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2311) ~[guava-25.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) ~[guava-25.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) ~[guava-25.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) ~[guava-25.1-jre.jar:?]
        at com.google.common.cache.LocalCache.get(LocalCache.java:3951) ~[guava-25.1-jre.jar:?]
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4870) ~[guava-25.1-jre.jar:?]
        at com.webank.wedatasphere.linkis.manager.engineplugin.cache.GuavaEngineConnPluginCache.get(GuavaEngineConnPluginCache.java:110) ~[linkis-engineconn-plugin-cache-1.0.2.jar:?]
        at com.webank.wedatasphere.linkis.manager.engineplugin.manager.loaders.CacheablesEngineConnPluginLoader.getEngineConnPlugin(CacheablesEngineConnPluginLoader.java:65) ~[linkis-engineconn-plugin-loader-1.0.2.jar:?]
        at com.webank.wedatasphere.linkis.engineplugin.server.service.DefaultEngineConnResourceFactoryService.getResourceFactoryBy(DefaultEngineConnResourceFactoryService.scala:35) ~[linkis-engineconn-plugin-server-1.0.2.jar:?]
        at com.webank.wedatasphere.linkis.engineplugin.server.service.DefaultEngineConnResourceFactoryService.createEngineResource(DefaultEngineConnResourceFactoryService.scala:46) ~[linkis-engineconn-plugin-server-1.0.2.jar:?]
        ... 10 more

Then it hit me: I had changed hive-1.2.1 to 1.1.0! So I updated the Hive version in the CLI script and ran it again.

[codeweaver@bd15-21-32-217 linkis]$ ./bin/linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228152241983233114
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:2
TaskId:2
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_1
[INFO] Job is successfully submitted!

2021-12-28 15:22:43.022 INFO Program is substituting variables for you
2021-12-28 15:22:43.022 INFO Variables substitution ended successfully
2021-12-28 15:22:43.022 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 15:22:43.022 INFO SQL code check has passed
job is scheduled.
2021-12-28 15:22:44.022 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_hive_1 and execID : LINKISCLI_codeweaver_hive_1 submitted
2021-12-28 15:22:44.022 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 15:22:44.022 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_hive_1 and taskID is 2 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 15:22:44.022 INFO job is running.
2021-12-28 15:22:44.022 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 2 and subJobId : 2 was submitted to Orchestrator.
2021-12-28 15:22:44.022 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 15:22:50.022 ERROR Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_1 Failed  to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):ErrorException: errCode: 30000 ,desc: Necessary environment HADOOP_CONF_DIR is not exists!(必须的环境变量 HADOOP_CONF_DIR 不存在!) ,ip: bd15-21-32-217 ,port: 9102 ,serviceKind: linkis-cg-engineconnmanager ,ip: bd15-21-32-217 ,port: 9102 ,serviceKind: linkis-cg-engineconnmanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 15:22:50.022 INFO job is completed.
2021-12-28 15:22:50.022 INFO Task creation time(任务创建时间): 2021-12-28 15:22:43, Task scheduling time(任务调度时间): 2021-12-28 15:22:44, Task start time(任务开始时间): 2021-12-28 15:22:44, Mission end time(任务结束时间): 2021-12-28 15:22:50
2021-12-28 15:22:50.022 INFO Your mission(您的任务) 2 The total time spent is(总耗时时间为): 6.62021-12-28 15:22:50.022 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.

[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:2
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_1
User:codeweaver
Current job status:FAILED
extraMsg:
errDesc: 21304, Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_1 Failed  to async get EngineNode ErrorException: errCode: 0 ,desc: operation failed(操作失败)s!the reason(原因):ErrorException: errCode: 30000 ,desc: Necessary environment HADOOP_CONF_DIR

############Execute Error!!!########

Another failure, this time complaining about HADOOP_CONF_DIR. Checking the logs confirms it: the error appears in linkis-cg-engineconnmanager.log, which means the request made it past engine routing and the failure is coming from the engine execution side.

Caused by: com.webank.wedatasphere.linkis.common.exception.ErrorException: errCode: 30000 ,desc: Necessary environment HADOOP_CONF_DIR is not exists!(必须的环境变量 HADOOP_CONF_DIR 不存在!) ,ip: bd15-21-32-217 ,port: 9102 ,serviceKind: linkis-cg-engineconnmanager

I checked linkis-env.sh: the variable is clearly configured, so why isn't it being picked up? File permissions? I chmod'ed the file to 777 and tried again; the earlier error message disappeared, but execution still failed, and the logs showed the same HADOOP_CONF_DIR error. Refusing to give up, I restarted every service, and it was still broken. Nothing left to do but go digging in the code.

override def launch(): Unit = {
  request.necessaryEnvironments.foreach{e =>
    val env = CommonVars(e, "")
    if(StringUtils.isEmpty(env.getValue))
      throw new ErrorException(30000, s"Necessary environment $e is not exists!(必须的环境变量 $e 不存在!)") //TODO exception
    else request.environment.put(e, env.getValue)
  }
  prepareCommand()
  val exec = new ProcessEngineConnCommandExec(sudoCommand(request.user, execFile.mkString(" ")), engineConnManagerEnv.engineConnWorkDir)
  exec.execute()
  process = exec.getProcess
}
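The check above boils down to: for each variable the engine declares as necessary, read it from the process environment and fail if it is empty, which likely explains why setting it only in linkis-env.sh did not reach the engine launch. A minimal shell analogue, with demo values only (this is an illustration, not Linkis code):

```shell
# Fail-fast check over required environment variables, mirroring launch() above.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # demo values; in real life these come
export HIVE_CONF_DIR=/etc/hive/conf       # from the service process's environment
status=ok
for e in HADOOP_CONF_DIR HIVE_CONF_DIR; do
  if [ -z "$(printenv "$e")" ]; then
    echo "Necessary environment $e is not exists!" >&2
    status=failed
  fi
done
echo "$status"
```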

That TODO made me a little nervous... The error is thrown here, so where do the environment variables get loaded from? Tracing back to JavaProcessEngineConnLaunchBuilder, I found:

if(ifAddHiveConfigPath) {
  addPathToClassPath(environment, variable(HADOOP_CONF_DIR))
  addPathToClassPath(environment, variable(HIVE_CONF_DIR))
}
def addPathToClassPath(env: java.util.Map[String, String], value: String): Unit = {
  val v = if(env.containsKey(Environment.CLASSPATH.toString)) {
    env.get(Environment.CLASSPATH.toString) + CLASS_PATH_SEPARATOR + value
  } else value
  env.put(Environment.CLASSPATH.toString, v)
}
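addPathToClassPath simply appends each directory to the CLASSPATH entry, adding a separator only when something is already there. A shell sketch of the same idea (assuming ':' as the separator):

```shell
# Append a path to CLASSPATH, inserting the separator only when it is non-empty.
add_path_to_classpath() {
  if [ -n "$CLASSPATH" ]; then
    CLASSPATH="$CLASSPATH:$1"
  else
    CLASSPATH="$1"
  fi
}
CLASSPATH=""
add_path_to_classpath /etc/hadoop/conf
add_path_to_classpath /etc/hive/conf
echo "$CLASSPATH"
```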

Could it be that they have to live in the profile? Let's try it: add this to /etc/profile:

###HADOOP CONF DIR
export HADOOP_CONF_DIR=/etc/hadoop/conf

###HIVE CONF DIR
export HIVE_CONF_DIR=/etc/hive/conf

###SPARK CONF DIR
export SPARK_CONF_DIR=/opt/mobdata/spark/spark-2.4.3.mob1-bin-2.6.5/conf

One more try. The previous error is gone, but now there's a new problem:

[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228162640335698166
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:10
TaskId:10
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_5
[INFO] Job is successfully submitted!

2021-12-28 16:26:42.026 INFO Program is substituting variables for you
2021-12-28 16:26:42.026 INFO Variables substitution ended successfully
2021-12-28 16:26:42.026 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 16:26:42.026 INFO SQL code check has passed
job is scheduled.
2021-12-28 16:26:42.026 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_hive_5 and execID : LINKISCLI_codeweaver_hive_5 submitted
2021-12-28 16:26:42.026 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 16:26:42.026 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_hive_5 and taskID is 10 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 16:26:42.026 INFO job is running.
2021-12-28 16:26:42.026 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 10 and subJobId : 10 was submitted to Orchestrator.
2021-12-28 16:26:42.026 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:26:43.026 INFO Retry---success to rebuild task node:astJob_5_codeExec_5, ready to execute new retry-task:astJob_5_retry_30, current age is 1
2021-12-28 16:26:53.026 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:26:54.026 INFO Retry---success to rebuild task node:astJob_5_retry_30, ready to execute new retry-task:astJob_5_retry_30, current age is 2
2021-12-28 16:27:04.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:04.027 INFO Retry---success to rebuild task node:astJob_5_retry_31, ready to execute new retry-task:astJob_5_retry_31, current age is 3
2021-12-28 16:27:14.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:15.027 INFO Retry---success to rebuild task node:astJob_5_retry_32, ready to execute new retry-task:astJob_5_retry_32, current age is 4
2021-12-28 16:27:25.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:26.027 INFO Retry---success to rebuild task node:astJob_5_retry_33, ready to execute new retry-task:astJob_5_retry_33, current age is 5
2021-12-28 16:27:36.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:37.027 INFO Retry---success to rebuild task node:astJob_5_retry_34, ready to execute new retry-task:astJob_5_retry_34, current age is 6
2021-12-28 16:27:47.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:47.027 INFO Retry---success to rebuild task node:astJob_5_retry_35, ready to execute new retry-task:astJob_5_retry_35, current age is 7
2021-12-28 16:27:57.027 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:27:58.027 INFO Retry---success to rebuild task node:astJob_5_retry_36, ready to execute new retry-task:astJob_5_retry_36, current age is 8
2021-12-28 16:28:08.028 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:28:09.028 INFO Retry---success to rebuild task node:astJob_5_retry_37, ready to execute new retry-task:astJob_5_retry_37, current age is 9
2021-12-28 16:28:19.028 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:28:20.028 INFO Retry---success to rebuild task node:astJob_5_retry_38, ready to execute new retry-task:astJob_5_retry_38, current age is 10
2021-12-28 16:28:30.028 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:28:30.028 ERROR Task is Failed,errorMsg: ask Engine failed + errCode: 12003 ,desc: bd15-21-32-217:9101_89 Failed  to async get EngineNodeLinkisRetryException: errCode: 30002 ,desc: 资源不足,请重试: errCode: 11012 ,desc: CPU resources are insufficient, to reduce the number of driver cores(CPU资源不足,建议调小驱动核数) ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 16:28:31.028 INFO job is completed.
2021-12-28 16:28:31.028 INFO Task creation time(任务创建时间): 2021-12-28 16:26:41, Task scheduling time(任务调度时间): 2021-12-28 16:26:42, Task start time(任务开始时间): 2021-12-28 16:26:42, Mission end time(任务结束时间): 2021-12-28 16:28:31
2021-12-28 16:28:31.028 INFO Your mission(您的任务) 10 The total time spent is(总耗时时间为): 1.8 分钟
2021-12-28 16:28:31.028 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.

[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:10
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_5
User:codeweaver
Current job status:FAILED
extraMsg:
errCode: 11012
errDesc: 远程服务器CPU资源不足

############Execute Error!!!########

Digging further into the logs, the statement had indeed been submitted, but the final status was not SUCCEED:

---------------------------------------------------
	task 10 status is RUNNING, progress : 0.0
---------------------------------------------------
2021-12-28 16:28:28,720 INFO LinkisJobLogPresenter(89) - Job is still running, status=RUNNING, progress=0.0%
2021-12-28 16:28:30,710 INFO LinkisSubmitExecutor(101) -
---------------------------------------------------
	task 10 status is RUNNING, progress : 0.0
---------------------------------------------------
2021-12-28 16:28:32,743 INFO LinkisSubmitExecutor(101) -
---------------------------------------------------
	task 10 status is FAILED, progress : 1.0
---------------------------------------------------
2021-12-28 16:28:34,774 WARN SyncSubmission(154) - Exception thrown when trying to query final result. Status will change to FAILED
com.webank.wedatasphere.linkis.cli.core.exception.ExecutorException: EXE0021,Error occured during execution: Get ResultSet Failed: job Status is not "Succeed", .
	at com.webank.wedatasphere.linkis.cli.application.driver.UjesClientDriver.queryResultSetPaths(UjesClientDriver.java:428) ~[linkis-cli-application-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.cli.application.interactor.execution.executor.LinkisSubmitExecutor.doGetFinalResult(LinkisSubmitExecutor.java:173) ~[linkis-cli-application-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.cli.core.interactor.execution.SyncSubmission.ExecWithAsyncBackend(SyncSubmission.java:152) [linkis-cli-core-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.cli.core.interactor.execution.SyncSubmission.execute(SyncSubmission.java:76) [linkis-cli-core-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.cli.application.LinkisClientApplication.exec(LinkisClientApplication.java:349) [linkis-cli-application-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.cli.application.LinkisClientApplication.main(LinkisClientApplication.java:381) [linkis-cli-application-1.0.2.jar:?]
2021-12-28 16:28:34,804 INFO LinkisSubmitExecutor(101) -
---------------------------------------------------
	task 10 status is FAILED, progress : 1.0
---------------------------------------------------
2021-12-28 16:28:35,285 INFO LinkisJobLogPresenter(89) - Job is still running, status=FAILED, progress=100.0%
2021-12-28 16:28:38,806 INFO LinkisJobResultPresenter(57) - Job status is not success but 'FAILED'. Will not try to retrieve any Result

So how to fix it... I combed carefully through each of the logs and finally spotted the key clue, in

linkis-cg-engineconnmanager.out

54e1b9c0-d4dc-4be9-a49c-4b5f3597f9c8:sudo: sorry, you must have a tty to run sudo

Could this be a sudo configuration problem?

vi /etc/sudoers (better to use the visudo command)

Comment out the Defaults requiretty line:

#Defaults requiretty

Save, then run the job again.
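Before re-running, a quick sanity check can confirm the directive really is gone. This is just a sketch: the paths are the common defaults for sudoers files, so adjust them for your distro, and run it as root so the files are actually readable.

```shell
# Scan sudoers for an active "Defaults requiretty" directive.
# -s suppresses errors for unreadable or missing files, so the check
# degrades gracefully when not run as root.
if grep -sqE '^[[:space:]]*Defaults[[:space:]]+requiretty' /etc/sudoers /etc/sudoers.d/* 2>/dev/null; then
  echo "requiretty is still active -- comment it out with visudo"
else
  echo "requiretty looks disabled (or the files are not readable)"
fi
```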

[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-hive -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228183116651149061
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:22
TaskId:22
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
[INFO] Job is successfully submitted!

2021-12-28 18:31:19.031 INFO Program is substituting variables for you
2021-12-28 18:31:19.031 INFO Variables substitution ended successfully
2021-12-28 18:31:20.031 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 18:31:20.031 INFO SQL code check has passed
job is scheduled.
2021-12-28 18:31:21.031 INFO Your job is Scheduled. Please wait it to run.
Job with jobId : LINKISCLI_codeweaver_hive_0 and execID : LINKISCLI_codeweaver_hive_0 submitted
Your job is being scheduled by orchestrator.
2021-12-28 18:31:21.031 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 18:31:21.031 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_hive_0 and taskID is 22 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 18:31:21.031 INFO job is running.
2021-12-28 18:31:21.031 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 22 and subJobId : 22 was submitted to Orchestrator.
2021-12-28 18:31:21.031 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:31:43.031 INFO EngineConn local log path: ServiceInstance(linkis-cg-engineconn, bd15-21-32-217:26052) /tmp/codeweaver/linkis_dev/codeweaver/workDir/1c3da121-8e1e-4b3f-bbb9-1e09876ae96c/logs
HiveEngineExecutor_0 >> SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
2021-12-28 18:31:44.383 ERROR [Linkis-Default-Scheduler-Thread-3] com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor 200 com$webank$wedatasphere$linkis$engineplugin$hive$executor$HiveEngineConnExecutor$$executeHQL - query failed, reason : java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
	at scala.collection.immutable.Range.foreach(Range.scala:160) [scala-library-2.11.12.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException$NoNodeException
	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
	at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
	... 43 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException$NoNodeException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_181]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_181]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_181]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_181]
	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
	at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
	... 43 more
2021-12-28 18:31:44.410 ERROR [Linkis-Default-Scheduler-Thread-3] com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor 57 error - execute code failed! java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
	at scala.collection.immutable.Range.foreach(Range.scala:160) [scala-library-2.11.12.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:54) [linkis-accessible-executor-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:48) [linkis-accessible-executor-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.ensureOp(ComputationExecutor.scala:133) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.execute(ComputationExecutor.scala:236) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl.com$webank$wedatasphere$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask(TaskExecutionServiceImpl.scala:239) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply$mcV$sp(TaskExecutionServiceImpl.scala:172) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:39) [linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:68) [linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1.run(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException$NoNodeException
	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
	at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
	... 43 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException$NoNodeException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_181]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_181]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_181]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_181]
	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
	at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
	... 43 more
2021-12-28 18:31:44.428 ERROR [Linkis-Default-Scheduler-Thread-3] com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl 57 error - null java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveDriverProxy.run(HiveEngineConnExecutor.scala:456) ~[linkis-engineplugin-hive-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor.com$webank$wedatasphere$linkis$engineplugin$hive$executor$HiveEngineConnExecutor$$executeHQL(HiveEngineConnExecutor.scala:163) ~[linkis-engineplugin-hive-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor$$anon$1.run(HiveEngineConnExecutor.scala:127) ~[linkis-engineplugin-hive-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor$$anon$1.run(HiveEngineConnExecutor.scala:120) ~[linkis-engineplugin-hive-1.0.2.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ~[hadoop-common-2.6.0.jar:?]
	at com.webank.wedatasphere.linkis.engineplugin.hive.executor.HiveEngineConnExecutor.executeLine(HiveEngineConnExecutor.scala:120) ~[linkis-engineplugin-hive-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10$$anonfun$apply$11.apply(ComputationExecutor.scala:179) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10$$anonfun$apply$11.apply(ComputationExecutor.scala:178) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:39) ~[linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10.apply(ComputationExecutor.scala:180) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$10.apply(ComputationExecutor.scala:174) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at scala.collection.immutable.Range.foreach(Range.scala:160) ~[scala-library-2.11.12.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:173) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:149) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryFinally(Utils.scala:60) ~[linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.toExecuteTask(ComputationExecutor.scala:222) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$3.apply(ComputationExecutor.scala:237) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$3.apply(ComputationExecutor.scala:237) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryFinally(Utils.scala:60) ~[linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:54) ~[linkis-accessible-executor-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:48) ~[linkis-accessible-executor-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.ensureOp(ComputationExecutor.scala:133) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.execute.ComputationExecutor.execute(ComputationExecutor.scala:236) ~[linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl.com$webank$wedatasphere$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask(TaskExecutionServiceImpl.scala:239) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply$mcV$sp(TaskExecutionServiceImpl.scala:172) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1$$anonfun$run$1.apply(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryCatch(Utils.scala:39) [linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:68) [linkis-common-1.0.2.jar:?]
	at com.webank.wedatasphere.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$1.run(TaskExecutionServiceImpl.scala:170) [linkis-computation-engineconn-1.0.2.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException$NoNodeException
	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
	at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
	... 43 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException$NoNodeException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_181]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_181]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_181]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_181]
	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_181]
	at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_181]
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) ~[hadoop-common-2.6.0.jar:?]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:70) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:101) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:984) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) ~[hive-exec-1.1.0.jar:1.1.0]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) ~[hive-exec-1.1.0.jar:1.1.0]
	... 43 more
2021-12-28 18:31:44.031 ERROR Task is Failed,errorMsg: null
2021-12-28 18:31:44.031 INFO job is completed.
2021-12-28 18:31:44.031 INFO Task creation time(任务创建时间): 2021-12-28 18:31:19, Task scheduling time(任务调度时间): 2021-12-28 18:31:21, Task start time(任务开始时间): 2021-12-28 18:31:21, Mission end time(任务结束时间): 2021-12-28 18:31:44
2021-12-28 18:31:44.031 INFO Your mission(您的任务) 22 The total time spent is(总耗时时间为): 25.6
2021-12-28 18:31:44.031 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.

[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:22
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_hive_0
User:codeweaver
Current job status:FAILED
extraMsg:
errDesc: 21304, Task is Failed,errorMsg: null

############Execute Error!!!########

??? A new problem? This looks like a dependency conflict; replacing the conflicting jar should sort it out.

But where is that jar being pulled in from? No clue for now, so let's try the next engine first.
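For anyone who wants to test the dependency-conflict hypothesis before switching engines, one plausible sketch is to drop the ZooKeeper client jar next to the Hive engine plugin's other dependencies and restart the engineplugin service. All paths here are assumptions (a CDH parcel layout and a default Linkis install dir), and the daemon script name is from my reading of the Linkis 1.0.x layout, so verify both against your environment.

```shell
# Assumed locations -- adjust LINKIS_HOME and the CDH parcel path to yours.
LINKIS_HOME=/home/codeweaver/linkis
ZK_JAR=$(ls /opt/cloudera/parcels/CDH/lib/zookeeper/zookeeper-*.jar 2>/dev/null | head -n 1)

if [ -n "$ZK_JAR" ]; then
  # Copy the jar into the Hive engine plugin's lib directory (path assumed)...
  cp "$ZK_JAR" "$LINKIS_HOME/lib/linkis-engineconn-plugins/hive/dist/v1.1.0/lib/"
  # ...then restart the engineplugin service so the cached engine material
  # is rebuilt with the new jar on the classpath.
  sh "$LINKIS_HOME/sbin/linkis-daemon.sh" restart cg-engineplugin
else
  echo "zookeeper jar not found -- adjust the path for your cluster"
fi
```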

Spark engine

💡 Straight into another pit

[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-spark-sql -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;"  -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228174733617974682
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:3
TaskId:3
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_1
[INFO] Job is successfully submitted!

2021-12-28 17:47:35.047 INFO Program is substituting variables for you
2021-12-28 17:47:35.047 INFO Variables substitution ended successfully
2021-12-28 17:47:35.047 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 17:47:35.047 INFO SQL code check has passed
job is scheduled.
2021-12-28 17:47:36.047 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_spark_1 and execID : LINKISCLI_codeweaver_spark_1 submitted
2021-12-28 17:47:36.047 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 17:47:36.047 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_spark_1 and taskID is 3 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 17:47:36.047 INFO job is running.
2021-12-28 17:47:36.047 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 3 and subJobId : 3 was submitted to Orchestrator.
2021-12-28 17:47:36.047 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 17:47:36.047 ERROR Task is Failed,errorMsg: errCode: 12003 ,desc: bd15-21-32-217:9101_2 Failed  to async get EngineNode RMErrorException: errCode: 11006 ,desc: Failed to request external resourceRMWarnException: errCode: 11006 ,desc: queue ide is not exists in YARN. ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 17:47:37.047 INFO job is completed.
2021-12-28 17:47:37.047 INFO Task creation time(任务创建时间): 2021-12-28 17:47:35, Task scheduling time(任务调度时间): 2021-12-28 17:47:36, Task start time(任务开始时间): 2021-12-28 17:47:36, Mission end time(任务结束时间): 2021-12-28 17:47:37
2021-12-28 17:47:37.047 INFO Your mission(您的任务) 3 The total time spent is(总耗时时间为): 1.8
2021-12-28 17:47:37.047 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.

[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:3
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_1
User:codeweaver
Current job status:FAILED
extraMsg:
errCode: 10001
errDesc: 会话创建失败,ide队列不存在,请检查队列设置是否正确

############Execute Error!!!########

OK, this looks like a queue problem: the ide queue is being requested by default.

2021-12-28 17:47:36.610 [INFO ] [ForkJoinPool-1-worker-7                 ] c.w.w.l.m.a.s.e.DefaultEngineAskEngineService (45) [info] - Failed  to async(bd15-21-32-217:9101_2) createEngine com.webank.wedatasphere.linkis.resourcemanager.exception.RMErrorException: errCode: 11006 ,desc: Failed to request external resourceRMWarnException: errCode: 11006 ,desc: queue ide is not exists in YARN. ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager
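Before retrying, it helps to confirm which queues YARN actually exposes, and optionally to repoint Linkis's default queue so jobs stop falling back to ide. The property key below is taken from my reading of the Linkis docs; double-check it for your build.

```shell
# List the queues the cluster actually has (Hadoop 2.x CLI).
mapred queue -list 2>/dev/null || echo "mapred CLI not on PATH here"

# Optionally set Linkis's default YARN queue to an existing one instead of
# "ide". The key is from the Linkis docs -- verify it for your version.
LINKIS_CONF="${LINKIS_HOME:-.}/conf/linkis.properties"
if [ -f "$LINKIS_CONF" ] && ! grep -q '^wds.linkis.rm.yarnqueue=' "$LINKIS_CONF"; then
  echo 'wds.linkis.rm.yarnqueue=default' >> "$LINKIS_CONF"
fi
```

Per-job, the --queue flag used below achieves the same thing without touching the config.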

Let's add a queue and try again.

[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli-spark-sql -code "SELECT * from mob_bg_devops.servers_exps_weekly_with_wh;" -submitUser codeweaver -proxyUser codeweaver --queue default
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli//linkis-client.codeweaver.log.20211228185636659565504
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:27
TaskId:27
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_0
[INFO] Job is successfully submitted!

2021-12-28 18:56:38.056 INFO Program is substituting variables for you
2021-12-28 18:56:38.056 INFO Variables substitution ended successfully
2021-12-28 18:56:38.056 WARN You submitted a sql without limit, DSS will add limit 5000 to your sql
2021-12-28 18:56:38.056 INFO SQL code check has passed
job is scheduled.
2021-12-28 18:56:38.056 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_spark_0 and execID : LINKISCLI_codeweaver_spark_0 submitted
2021-12-28 18:56:38.056 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
SELECT * from mob_bg_devops.servers_exps_weekly_with_wh limit 5000
************************************SCRIPT CODE************************************
2021-12-28 18:56:38.056 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_spark_0 and taskID is 27 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 18:56:38.056 INFO job is running.
2021-12-28 18:56:38.056 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 27 and subJobId : 27 was submitted to Orchestrator.
2021-12-28 18:56:38.056 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:56:40.056 INFO Retry---success to rebuild task node:astJob_2_codeExec_2, ready to execute new retry-task:astJob_2_retry_0, current age is 1
2021-12-28 18:56:50.056 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:56:50.056 INFO Retry---success to rebuild task node:astJob_2_retry_0, ready to execute new retry-task:astJob_2_retry_0, current age is 2
2021-12-28 18:57:00.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:01.057 INFO Retry---success to rebuild task node:astJob_2_retry_1, ready to execute new retry-task:astJob_2_retry_1, current age is 3
2021-12-28 18:57:11.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:12.057 INFO Retry---success to rebuild task node:astJob_2_retry_2, ready to execute new retry-task:astJob_2_retry_2, current age is 4
2021-12-28 18:57:22.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:22.057 INFO Retry---success to rebuild task node:astJob_2_retry_3, ready to execute new retry-task:astJob_2_retry_3, current age is 5
2021-12-28 18:57:32.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:33.057 INFO Retry---success to rebuild task node:astJob_2_retry_4, ready to execute new retry-task:astJob_2_retry_4, current age is 6
2021-12-28 18:57:43.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:44.057 INFO Retry---success to rebuild task node:astJob_2_retry_5, ready to execute new retry-task:astJob_2_retry_5, current age is 7
2021-12-28 18:57:54.057 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:57:55.057 INFO Retry---success to rebuild task node:astJob_2_retry_6, ready to execute new retry-task:astJob_2_retry_6, current age is 8
2021-12-28 18:58:05.058 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:58:05.058 INFO Retry---success to rebuild task node:astJob_2_retry_7, ready to execute new retry-task:astJob_2_retry_7, current age is 9
2021-12-28 18:58:15.058 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:58:16.058 INFO Retry---success to rebuild task node:astJob_2_retry_8, ready to execute new retry-task:astJob_2_retry_8, current age is 10
2021-12-28 18:58:26.058 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 18:58:27.058 ERROR Task is Failed,errorMsg: ask Engine failed + errCode: 12003 ,desc: bd15-21-32-217:9101_23 Failed  to async get EngineNodeLinkisRetryException: errCode: 30002 ,desc: 资源不足,请重试: errCode: 11014 ,desc: Queue CPU resources are insufficient, reduce the number of executors.(队列CPU资源不足,建议调小执行器个数) ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: bd15-21-32-217 ,port: 9104 ,serviceKind: linkis-cg-entrance
2021-12-28 18:58:27.058 INFO job is completed.
2021-12-28 18:58:27.058 INFO Task creation time(任务创建时间): 2021-12-28 18:56:38, Task scheduling time(任务调度时间): 2021-12-28 18:56:38, Task start time(任务开始时间): 2021-12-28 18:56:38, Mission end time(任务结束时间): 2021-12-28 18:58:27
2021-12-28 18:58:27.058 INFO Your mission(您的任务) 27 The total time spent is(总耗时时间为): 1.8 分钟
2021-12-28 18:58:27.058 INFO Sorry. Your job completed with a status Failed. You can view logs for the reason.

[INFO] Job failed! Will not try get execute result.
============Result:================
TaskId:27
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_spark_0
User:codeweaver
Current job status:FAILED
extraMsg:
errCode: 11014
errDesc: 队列CPU资源不足

############Execute Error!!!########

Checking the logs:

f2511155-4f0f-4dcd-818b-df3a0cb3632a:WARNING: User-defined SPARK_HOME (/opt/mobdata/spark/spark-2.4.3.mob1-bin-2.6.5) overrides detected (/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/lib/spark).
f2511155-4f0f-4dcd-818b-df3a0cb3632a:WARNING: Running spark-class from user-defined location.
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=250m; support was removed in 8.0
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Class path contains multiple SLF4J bindings.
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Found binding in [jar:file:/tmp/codeweaver/linkis_dev/engineConnPublickDir/3e349615-708d-44e0-899b-6b6b590d219e/v000002/lib/log4j-slf4j-impl-2.13.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Found binding in [jar:file:/opt/mobdata/spark/spark-2.4.3.mob1-bin-2.6.5/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
f2511155-4f0f-4dcd-818b-df3a0cb3632a:SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.common.conf.CommonVars.<init>(CommonVars.scala:22)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.common.conf.CommonVars$.apply(CommonVars.scala:35)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.common.conf.CommonVars.apply(CommonVars.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.manager.label.conf.LabelCommonConfig.<clinit>(LabelCommonConfig.java:23)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.manager.label.builder.factory.LabelBuilderFactoryContext.getLabelBuilderFactory(LabelBuilderFactoryContext.java:45)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.engineconn.launch.EngineConnServer$.<init>(EngineConnServer.scala:30)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.engineconn.launch.EngineConnServer$.<clinit>(EngineConnServer.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at com.webank.wedatasphere.linkis.engineconn.launch.EngineConnServer.main(EngineConnServer.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at java.lang.reflect.Method.invoke(Method.java:497)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:Caused by: java.lang.ClassNotFoundException: scala.Product$class
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
f2511155-4f0f-4dcd-818b-df3a0cb3632a:	... 20 more

Hmm... could this be a Scala version mismatch? I checked the Scala versions of both Spark and Linkis and found that, surprisingly, our Spark 2.4.3 was built against Scala 2.12, while Linkis uses Scala 2.11.
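A quick way to confirm the mismatch: the Scala binary version a Spark distribution was built with can be read off the `scala-library` jar shipped under `$SPARK_HOME/jars`. The snippet below parses a sample jar name (`2.12.10` is illustrative, not necessarily what your build ships):

```shell
# In practice: ls $SPARK_HOME/jars | grep scala-library
# Here we parse a sample jar name to show the extraction.
jar_name="scala-library-2.12.10.jar"

# Extract the Scala binary version (major.minor) from the jar name
scala_binary=$(echo "$jar_name" | sed -E 's/^scala-library-([0-9]+\.[0-9]+)\..*$/\1/')
echo "Spark was built with Scala $scala_binary"
```

If this prints a different major.minor than the `scala.version` in the Linkis root pom.xml (2.11.8 by default), engine startup will fail exactly as in the stack trace above.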

Well... nothing for it but to recompile.
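Recompiling here means pointing the `scala.version` property in the root pom.xml at the version Spark was built with, the same way the Hadoop version was changed earlier. The `2.12.10` below is an example value; whether Linkis 1.0.2 compiles cleanly against Scala 2.12 still has to be verified.

```xml
    <properties>
        <scala.version>2.12.10</scala.version> <!-- set to the Scala version your Spark build uses -->
    </properties>
```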

Next up.

Shell engine

This one went unexpectedly smoothly.

[codeweaver@bd15-21-32-217 bin]$ ./linkis-cli -engineType shell-1 -codeType shell -code "echo 123;"  -submitUser codeweaver -proxyUser codeweaver
[INFO] LogFile path: /home/codeweaver/linkis/logs/linkis-cli/linkis-client.codeweaver.log.20211228165047156877579
[INFO] User does not provide usr-configuration file. Will use default config
[INFO] connecting to linkis gateway:http://127.0.0.1:9001
JobId:15
TaskId:15
ExecId:exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_shell_0
[INFO] Job is successfully submitted!

2021-12-28 16:50:48.050 INFO Program is substituting variables for you
2021-12-28 16:50:48.050 INFO Variables substitution ended successfully
job is scheduled.
2021-12-28 16:50:49.050 INFO Your job is Scheduled. Please wait it to run.
Your job is being scheduled by orchestrator.
Job with jobId : LINKISCLI_codeweaver_shell_0 and execID : LINKISCLI_codeweaver_shell_0 submitted
2021-12-28 16:50:49.050 INFO You have submitted a new job, script code (after variable substitution) is
************************************SCRIPT CODE************************************
echo 123
************************************SCRIPT CODE************************************
2021-12-28 16:50:49.050 INFO Your job is accepted,  jobID is LINKISCLI_codeweaver_shell_0 and taskID is 15 in ServiceInstance(linkis-cg-entrance, bd15-21-32-217:9104). Please wait it to be scheduled
2021-12-28 16:50:49.050 INFO job is running.
2021-12-28 16:50:49.050 INFO Your job is Running now. Please wait it to complete.
Job with jobGroupId : 15 and subJobId : 15 was submitted to Orchestrator.
2021-12-28 16:50:49.050 INFO Background is starting a new engine for you, it may take several seconds, please wait
2021-12-28 16:51:08.051 INFO EngineConn local log path: ServiceInstance(linkis-cg-engineconn, bd15-21-32-217:33772) /tmp/codeweaver/linkis_dev/codeweaver/workDir/98dab8a4-389c-43d4-82bf-aa1d1b3eb595/logs
bd15-21-32-217:33772_0 >> echo 123
Your subjob : 15 execue with state succeed, has 1 resultsets.
Congratuaions! Your job : LINKISCLI_codeweaver_shell_0 executed with status succeed and 0 results.
2021-12-28 16:51:10.051 INFO job is completed.
2021-12-28 16:51:10.051 INFO Task creation time(任务创建时间): 2021-12-28 16:50:48, Task scheduling time(任务调度时间): 2021-12-28 16:50:49, Task start time(任务开始时间): 2021-12-28 16:50:49, Mission end time(任务结束时间): 2021-12-28 16:51:10
2021-12-28 16:51:10.051 INFO Your mission(您的任务) 15 The total time spent is(总耗时时间为): 21.7 秒
2021-12-28 16:51:10.051 INFO Congratulations. Your job completed with status Success.

[INFO] Job execute successfully! Will try get execute result
============Result:================
TaskId:15
ExecId: exec_id018019linkis-cg-entrancebd15-21-32-217:9104LINKISCLI_codeweaver_shell_0
User:codeweaver
Current job status:SUCCEED
extraMsg:
result:


============RESULT SET============
123
============END OF RESULT SET============

############Execute Success!!!########

四、Interim summary

Of the three engines, only Shell was successfully debugged today; Hive and Spark still have plenty of pitfalls to step through...

See you in the next installment!