1. Deployment preparation
Kettle is written in Java, so a Java environment (JDK) must be installed and configured first.
vi /etc/profile
export JAVA_HOME=/usr/java/jre1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile ## apply the changes
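Before moving on, it can help to confirm that the directory assigned to JAVA_HOME really contains a Java launcher, since PATH is built from $JAVA_HOME/bin. The following is only a minimal sketch; check_java_home is a hypothetical helper name and is not part of Kettle or the JDK.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given directory holds an
# executable bin/java, i.e. the binary that $JAVA_HOME/bin adds to PATH.
check_java_home() {
  local home="$1"
  if [ -x "$home/bin/java" ]; then
    echo "OK: $home/bin/java is executable"
    return 0
  fi
  echo "ERROR: no executable java under $home/bin" >&2
  return 1
}

# Usage: check the exported value, then print the Java version.
# check_java_home "$JAVA_HOME" && "$JAVA_HOME/bin/java" -version
```

If the check fails, correct the path in /etc/profile and run source /etc/profile again.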
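2. Deploying Kettle
Download the installation package from a domestic mirror or the official site and place it in the target directory on the Linux host.
unzip pdi-ce-7.1.0.0-12.zip ## extract the archive
cd data-integration ## Kettle root directory
chmod +x *.sh ## make the scripts executable
./kitchen.sh ## check whether the installation works
The check prints a warning: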
#######################################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Options:
-rep = Repository name
-user = Repository username
-pass = Repository password
-job = The name of the job to launch
-dir = The directory (dont forget the leading /)
-file = The filename (Job XML) to launch
-level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
-logfile = The logging file to write to
-listdir = List the directories in the repository
-listjobs = List the jobs in the specified directory
-listrep = List the available repositories
-norep = Do not log into the repository
-version = show the version, revision and build date
-param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
-listparam = List information concerning the defined parameters in the specified job.
-export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
-custom = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
-maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
-maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
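According to the official documentation, Kettle needs the libwebkitgtk library; installing it removes the warning.
sudo apt-get install libwebkitgtk-1.0-0
./kitchen.sh ## check again whether Kettle is installed correctly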
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Options:
-rep = Repository name
-user = Repository username
-pass = Repository password
-job = The name of the job to launch
-dir = The directory (dont forget the leading /)
-file = The filename (Job XML) to launch
-level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
-logfile = The logging file to write to
-listdir = List the directories in the repository
-listjobs = List the jobs in the specified directory
-listrep = List the available repositories
-norep = Do not log into the repository
-version = show the version, revision and build date
-param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
-listparam = List information concerning the defined parameters in the specified job.
-export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
-custom = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
-maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
-maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
3. Setting up the script directories
mkdir -p /data/kettle/kettle_file/job ## job files
mkdir /data/kettle/kettle_file/transition ## transformation files
mkdir /data/kettle/kettle_sh ## execution scripts
mkdir /data/kettle/kettle_log ## log files produced by Kettle runs
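The same layout can also be created in one repeatable step. A minimal sketch, in which make_kettle_dirs is a hypothetical helper name and the base path is passed as an argument; mkdir -p creates missing parent directories and does not fail when a directory already exists.

```shell
#!/bin/bash
# Hypothetical helper: create the four working directories under a base path.
make_kettle_dirs() {
  local base="$1"
  mkdir -p "$base/kettle_file/job" \
           "$base/kettle_file/transition" \
           "$base/kettle_sh" \
           "$base/kettle_log"
}

# Usage with the base path from this guide:
# make_kettle_dirs /data/kettle
```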
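Copy the .ktr and .kjb files configured on Windows into the transition and job directories respectively.
Note: the paths stored in a .kjb file created on Windows differ from those on Linux. Fix them before running the job on Linux, otherwise Kettle reports that the file cannot be found. Open the .kjb file as plain text and change the paths in the fileName entries.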
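The path fix described above can be scripted with sed instead of edited by hand. This is a rough sketch under stated assumptions: the paths are assumed to sit on lines containing a <filename> tag, the Windows prefix is given with forward slashes and contains no regex special characters or '#', and convert_kjb_paths is a hypothetical helper name. The result goes to standard output so that the original file is left untouched.

```shell
#!/bin/bash
# Hypothetical helper: rewrite Windows paths on <filename> lines.
#   $1  Windows prefix written with forward slashes, e.g. D:/kettle
#   $2  Linux prefix that replaces it, e.g. /data/kettle/kettle_file
#   $3  the .kjb file to read
convert_kjb_paths() {
  local win_prefix="$1" linux_prefix="$2" file="$3"
  # Step 1: turn every backslash on <filename> lines into a forward slash.
  # Step 2: swap the Windows prefix for the Linux prefix on those lines.
  sed -e '/<filename>/ s#\\#/#g' \
      -e "/<filename>/ s#${win_prefix}#${linux_prefix}#" \
      "$file"
}

# Usage (file names are examples only):
# convert_kjb_paths 'D:/kettle' '/data/kettle/kettle_file' testJob.kjb > testJob.linux.kjb
```

Compare the output with the original before replacing it, because lines without a <filename> tag are passed through unchanged.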
- Test a transformation
/home/kettle/data-integration/pan.sh -file=/home/kettle/workfile/kettleFile/test.ktr
- Test a job
/home/kettle/data-integration/kitchen.sh -file=/home/kettle/workfile/kettleFile/testJob.kjb
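When running these tests from a script, the exit status is more reliable than reading the console output. The sketch below assumes, as the Pentaho documentation describes, that pan.sh and kitchen.sh exit with status 0 when the run finishes without problems and with a non-zero status otherwise. run_and_report is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical wrapper: run any command, report how it ended, and
# hand its exit status back to the caller.
run_and_report() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "SUCCESS: $1"
  else
    echo "FAILED (exit $status): $1" >&2
  fi
  return "$status"
}

# Usage with the test job from this section:
# run_and_report /home/kettle/data-integration/kitchen.sh \
#   -file=/home/kettle/workfile/kettleFile/testJob.kjb
```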
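4. Scheduling the job
- Write the task script
Note: cron only loads /etc/environment; it does not load /etc/profile or ~/.bash_profile, so the environment variables must be set manually in the script.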
#!/bin/bash
cd /home/user/hzx/kettle/data-integration
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
./kitchen.sh -file=/home/kettle/workfile/kettleFile/testJob.kjb -level=Basic >> /home/kettle/workfile/kettleLogs/testJob__$(date +%Y%m%d%H%M%S).log
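The log file name in the script can be built by a small function, which makes it easy to reuse for several jobs. This is a minimal sketch; log_path_for is a hypothetical helper name. The usage lines also add 2>&1, an optional change that sends error output to the same log file, since the redirection in the script above captures standard output only.

```shell
#!/bin/bash
# Hypothetical helper: print a timestamped log path of the form
#   <log_dir>/<job_name>__<YYYYmmddHHMMSS>.log
log_path_for() {
  local log_dir="$1" job_name="$2"
  printf '%s/%s__%s.log\n' "$log_dir" "$job_name" "$(date +%Y%m%d%H%M%S)"
}

# Usage inside the task script (paths are the ones used in this guide):
# LOG=$(log_path_for /home/kettle/workfile/kettleLogs testJob)
# ./kitchen.sh -file=/home/kettle/workfile/kettleFile/testJob.kjb -level=Basic >> "$LOG" 2>&1
```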
- Add the scheduled task
user@user:~/hzx/kettle/workfile/kettleShs$ crontab -e
no crontab for user - using an empty one
Select an editor. To change later, run 'select-editor'.
1. /bin/nano <---- easiest
2. /usr/bin/vim.basic
3. /usr/bin/vim.tiny
4. /bin/ed
Choose 1-4 [1]: 2 ## I chose 2; press Enter to edit the task list
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
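0 */1 * * * /home/user/hzx/kettle/workfile/kettleShs/job.sh ## runs every hour; adjust the schedule as needed
Save and exit. That completes the setup.
Note: when the server restarts, cron does not make up for the runs that were missed while it was down, so check whether the time parameters used in the job depend strongly on the server time.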
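One way to notice skipped runs is to record when the job last finished and compare that with the current time at the start of the next run. The sketch below is only an illustration: run_is_overdue is a hypothetical helper name, the state file path is an assumption, and the threshold of 5400 seconds (1.5 times the hourly interval) is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: succeed when more than max_gap seconds lie between
# the last recorded run and now. All three arguments are in seconds;
# last and now are Unix epoch timestamps.
run_is_overdue() {
  local last="$1" now="$2" max_gap="$3"
  [ $(( now - last )) -gt "$max_gap" ]
}

# Usage sketch inside the task script. The state file below is assumed;
# the script would rewrite it with $(date +%s) after each successful run.
# last=$(cat /data/kettle/kettle_log/last_run)
# if run_is_overdue "$last" "$(date +%s)" 5400; then
#   echo "WARNING: previous run is older than expected; runs may have been skipped"
# fi
```

A job that receives its time window from such a record, rather than from the current clock alone, can then decide for itself how to handle the gap.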