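Background
When Kylin serves as the OLAP query engine, cube builds should run automatically: whenever new incremental data arrives, the scheduling system triggers the Kylin build job, so that data warehouse computation and Kylin cube builds run as one chained pipeline.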
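API overview
A Kylin build job is triggered by calling Kylin's REST API. The API can be called from Java or Python code, or directly from a shell script.
Three endpoints are needed here:
- User authentication endpoint: verifies the Kylin user's credentials.
- Cube build endpoint: triggers a build job for a Kylin cube.
- Job status endpoint (queried by jobId): monitors the progress of a cube build job.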
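As a rough illustration of the three calls, the sketch below wraps each one in a small function. The endpoint paths and the your-kylin-host placeholder are the ones used in the script that follows; the function names (basic_auth_header, rebuild_payload, parse_job_status, kylin_auth, kylin_build, kylin_job_status) are hypothetical helpers introduced only for this example, and parse_job_status assumes the job JSON contains a "job_status" string field and that GNU grep with -P is available.

```shell
#!/bin/bash
# Placeholder base URL, taken from the script's your-kylin-host example.
KYLIN_API="http://your-kylin-host:7070/kylin/api"

# Build the HTTP Basic auth header for a user name ($1) and password ($2).
basic_auth_header() {
    printf 'Authorization: Basic %s' "$(printf '%s' "$1:$2" | base64 | tr -d '\n')"
}

# Build the JSON body for the rebuild call; $1 and $2 are epoch milliseconds.
rebuild_payload() {
    printf '{"startTime":%s,"endTime":%s,"buildType":"BUILD"}' "$1" "$2"
}

# Pull the job_status value out of a job JSON document passed as $1.
parse_job_status() {
    printf '%s' "$1" | grep -oP '"job_status":"\K[^"]+'
}

# 1. Authentication: kylin_auth <user> <password>
kylin_auth() {
    curl -s -X POST -H "$(basic_auth_header "$1" "$2")" "$KYLIN_API/user/authentication"
}

# 2. Cube build: kylin_build <user> <password> <cube> <startMillis> <endMillis>
kylin_build() {
    curl -s -X PUT -H "$(basic_auth_header "$1" "$2")" \
        -H "Content-Type: application/json;charset=UTF-8" \
        -d "$(rebuild_payload "$4" "$5")" "$KYLIN_API/cubes/$3/rebuild"
}

# 3. Job status: kylin_job_status <user> <password> <jobId>
kylin_job_status() {
    parse_job_status "$(curl -s -X GET -H "$(basic_auth_header "$1" "$2")" "$KYLIN_API/jobs/$3")"
}
```

Nothing is sent until one of the kylin_* functions is called. For the sample credentials ADMIN and KYLIN, basic_auth_header ADMIN KYLIN prints Authorization: Basic QURNSU46S1lMSU4=.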
Script implementation
Below is the Kylin cube build script I wrote.
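#!/bin/bash
##******************************************************************************
## ** Description: build a Kylin cube
## **
## ** Arguments: 2 to 4
## **   1. (required) Build date: a day (yyyyMMdd), a month (yyyyMM), or the string null.
## **      A day builds a day-type cube, a month builds a month-type cube, and null builds
## **      a cube without a date partition. One run builds a single day of a day model,
## **      a single month of a month model, or one unpartitioned cube.
## **      Examples: 20200101, 202001, or the string null
## **   2. (required) Cube name.
## **   3. (optional) Hive table name, used to count the rows to be built.
## **   4. (optional) Name of the Hive table's date partition column, used when counting rows.
## **      Not needed for a full build or when no Hive table name is given. Defaults to dt.
## *****************************************************************************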
source /etc/profile
source ~/.bashrc
username="your Kylin user name"
password="your Kylin password"
cubeName=$2
echo "【cube name is $2】"
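# Lower-case the date parameter so that NULL and null are treated alike
dateParam=$(echo "$1" | tr 'A-Z' 'a-z')
startTime="123"
endTime="123"
if [[ ${#dateParam} -eq 6 ]]; then
    # Month cube: endTime is the first day of the given month and startTime is the
    # day before it. Adjust this range if your month partitions follow another convention.
    endTime=${dateParam}"01"
    startTime=$(date -d "$endTime -1 days" +%Y%m%d)
    echo "【build month cube】"
elif [[ ${#dateParam} -eq 8 ]]; then
    # Day cube: the segment covers [day, day + 1)
    startTime=${dateParam}
    endTime=$(date -d "$startTime +1 days" +%Y%m%d)
    echo "【build day cube】"
elif [[ "$dateParam" == "null" ]]; then
    # Cube without a date partition: no time range
    startTime=
    endTime=
    echo "【build whole quantity cube】"
else
    echo "【dateParam input error】"
    exit 1
fi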
tableName=$3
dateField=$4
countSql=''
if [[ ${#tableName} -ge 1 ]]; then
    echo "【hive table name is ${tableName}】"
    if [[ ${#dateField} -ge 1 ]]; then
        echo "【date field is ${dateField}】"
    else
        echo "【date field is null,use 'dt'】"
        dateField='dt'
    fi
    if [[ "$dateParam" == "null" ]]; then
        countSql="set hive.cli.print.header=false; select count(1) from ${tableName}"
    else
        countSql="set hive.cli.print.header=false; select count(1) from ${tableName} where ${dateField} >='${startTime}' and ${dateField} < '${endTime}'"
    fi
else
    echo "【hive table name is null】"
fi
# Count the rows first and skip the build when there is nothing to build
if [[ ${#countSql} -ge 5 ]]; then
    echo "================================================================="
    echo "【count sql is: ${countSql} 】"
    rowNum=$(hive -e "${countSql}")
    echo "================================================================="
    echo "【the amount of data to build is ${rowNum}】"
    if [[ ${rowNum} -le 0 ]]; then
        echo "================================================================="
        echo "【don't need to build】"
        exit 0
    fi
fi
# Verify the user name and password
function auth() {
    username=$1
    password=$2
    # printf '%s' keeps characters such as % or \ in the credentials from being read as format directives
    base64Encryption=$(printf '%s' "$username:$password" | base64)
    authentication=$(curl -X POST -H "Authorization: Basic $base64Encryption" -H "Content-Type: application/json;charset=UTF-8" http://your-kylin-host:7070/kylin/api/user/authentication)
    if [[ $authentication =~ "Unauthorized" ]]; then
        echo "Authentication failure: user name or password wrong"
        exit 1
    fi
}

# Build the cube
function build() {
    cubeName=$1
    startTime=$2
    endTime=$3
    startTimeTimestamp=$(date -d "$startTime 00:00:00" +%s)
    endTimeTimestamp=$(date -d "$endTime 00:00:00" +%s)
    # Shift the local-midnight timestamps by 8 hours (in milliseconds);
    # this assumes the machine's time zone is GMT+8
    GMT8=$((8 * 60 * 60 * 1000))
    kylinStartTime=$((startTimeTimestamp * 1000 + GMT8))
    kylinEndTime=$((endTimeTimestamp * 1000 + GMT8))
    buildInfo=$(curl -X PUT -H "Authorization: Basic $base64Encryption" -H "Content-Type: application/json;charset=UTF-8" -d '{"startTime":'$kylinStartTime', "endTime":'$kylinEndTime', "buildType":"BUILD"}' http://your-kylin-host:7070/kylin/api/cubes/${cubeName}/rebuild)
    # The job id is the uuid field of the response
    uuid=$(echo "$buildInfo" | grep -oP '(?<={"uuid":").*(?=","last_modified")')
    echo "$uuid"
}

# Query the job status by jobId
function getJobInfo() {
    uuid=$1
    jobInfo=$(curl -X GET -H "Authorization: Basic $base64Encryption" -H "Content-Type: application/json;charset=UTF-8" http://your-kylin-host:7070/kylin/api/jobs/$uuid)
    jobStatus=$(echo "$jobInfo" | grep -oP '(?<="job_status":").*(?=","progress")')
    progress=$(echo "$jobInfo" | grep -oP '(?<="progress":).*(?=})')
    echo "jobStatus:$jobStatus progress:$progress"
}

# Main flow. Bash only knows a function after its definition has been executed,
# so the calls below must come after the definitions above.
now=$(date "+%Y-%m-%d %H:%M:%S")
auth "$username" "$password"
jobid=$(build "$cubeName" "$startTime" "$endTime")
echo "================================================================="
echo "【$now build $startTime - $endTime jobId is $jobid】"
echo "================================================================="
if [[ ${#jobid} -lt 5 ]]; then
    echo "【jobid is not as expected,maybe the segment has been merged,the merged segment needs to be built manually】"
    exit 1
fi
# Poll the job status once a minute; FLAG counts failed status fetches
FLAG=0
while (("$FLAG" != "-1")); do
    jobInfo=$(getJobInfo "$jobid")
    echo "================================================================="
    now_time=$(date "+%Y-%m-%d %H:%M:%S")
    if [[ $jobInfo =~ "FINISHED" ]]; then
        echo "$now_time 【build succeeded】"
        exit 0
    elif [[ $jobInfo =~ "ERROR" ]]; then
        echo "$now_time 【build failed】"
        exit 1
    elif [[ $jobInfo =~ "STOPPED" ]]; then
        echo "$now_time 【build stopped,it may be manually operated】"
        exit 1
    elif [[ $jobInfo =~ "DISCARDED" ]]; then
        echo "$now_time 【build discarded,it may be manually operated】"
        exit 1
    elif [[ $jobInfo =~ "PENDING" ]]; then
        echo "$now_time 【job status is PENDING, please wait while building...】"
        sleep 60
    elif [[ $jobInfo =~ "RUNNING" ]]; then
        echo "$now_time 【job status is RUNNING, please wait while building...】"
        sleep 60
    else
        if [[ "$FLAG" -lt "10" ]]; then
            echo "$now_time 【failed to get job status, the program exits automatically after 10 failed fetches】"
            let "FLAG=FLAG+1"
            sleep 60
        else
            echo "$now_time 【a total of 10 failed to obtain the status of the job, the program quit】"
            exit 1
        fi
    fi
    echo "================================================================="
done
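The build function turns a yyyyMMdd date into epoch milliseconds by taking local midnight and adding 8 hours, which lands on 00:00:00 UTC only when the machine runs in GMT+8. As a possible alternative, the sketch below asks GNU date for the UTC timestamp directly, so the result does not depend on the machine's time zone. The function name to_utc_millis is hypothetical, and the sketch assumes GNU date (the -u and -d options).

```shell
#!/bin/bash
# Convert a yyyyMMdd date ($1) to epoch milliseconds at 00:00:00 UTC.
# Assumes GNU date; returns non-zero if the date cannot be parsed.
to_utc_millis() {
    local seconds
    seconds=$(date -u -d "$1 00:00:00" +%s) || return 1
    echo $((seconds * 1000))
}

to_utc_millis 20200101   # 1577836800000
```

On a GMT+8 machine this gives the same value as the script's local timestamp plus the 8-hour offset.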
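Usage
- Create a shell script file named kylin_build.sh. The name kylin_build is only an example; any name works.
- Copy the script above into kylin_build.sh, replace the user name, password, and your-kylin-host in the script, then save it.
- Run the script to trigger a build job. The script relies on bash features such as [[ ]], so run it with bash rather than sh. Command:
bash kylin_build.sh <date (required)> <cube name (required)> <hive table name (optional)> <hive date partition column (optional)>
- Wait for the build job to finish.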
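The script tells the three forms of the date parameter apart by length only, so a six-character string such as abcdef would be accepted as a month. A stricter pre-check could look like the sketch below. The function name classify_date_param is hypothetical, and the day check assumes GNU date.

```shell
#!/bin/bash
# Classify the date parameter ($1) as day, month, full, or invalid.
# Prints the class; returns non-zero for invalid input.
classify_date_param() {
    local p
    p=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    if printf '%s' "$p" | grep -Eq '^[0-9]{8}$' && date -d "$p" +%Y%m%d >/dev/null 2>&1; then
        echo day        # yyyyMMdd and a real calendar date
    elif printf '%s' "$p" | grep -Eq '^[0-9]{4}(0[1-9]|1[0-2])$'; then
        echo month      # yyyyMM with a month between 01 and 12
    elif [ "$p" = "null" ]; then
        echo full       # cube without a date partition
    else
        echo invalid
        return 1
    fi
}

classify_date_param 20200101   # day
classify_date_param 202001     # month
classify_date_param NULL       # full
```

Calling it at the top of the script and exiting on invalid would reject malformed dates before any Hive query or Kylin call is made.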
Further tip
The script can be scheduled with Apache DolphinScheduler, or with any other scheduling engine.
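When a scheduler runs the build every day, the date argument usually has to be computed rather than typed. The sketch below shows one way a scheduled shell task might derive the previous day's date and assemble the command. The function name previous_day, the cube name my_daily_cube, and the table name dw.my_fact_table are made-up placeholders, and the date arithmetic assumes GNU date.

```shell
#!/bin/bash
# Print the day before a reference date ($1, yyyyMMdd); defaults to today.
previous_day() {
    local ref=${1:-$(date +%Y%m%d)}
    date -d "$ref -1 day" +%Y%m%d
}

# Command a daily scheduled task could run (cube and table names are placeholders).
echo "bash kylin_build.sh $(previous_day) my_daily_cube dw.my_fact_table dt"
```

Because the script exits with 0 on success and 1 on failure, a scheduler that checks the exit code can mark the task as failed and hold back downstream tasks.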