References
- Deployment of Apache SeaTunnel Web
- SeaTunnel deployment
- [Bug][Seatunnel-web] Data source already configured, but the source name cannot be selected
SeaTunnel Deployment
I deployed a 3-node seatunnel-cluster on the EMR core nodes, and seatunnel-web on a single node outside the cluster.
IP | seatunnel-engine | seatunnel-web | Notes |
---|---|---|---|
10.6.4.24 | | | master; has passwordless SSH access to the other nodes |
10.6.4.10 | ✅ | | core |
10.6.4.14 | ✅ | | core |
10.6.4.15 | ✅ | | core |
10.6.6.2 | ✅ | ✅ | data-collect; SeaTunnel here is only a client and no engine service is started |
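Linux environment initialization
- Later steps rely on scp and similar operations; skip this part if your nodes are already set up. First, on the master node, create the hadoop user and grant it passwordless sudo:

```shell
sudo su -
# the hadoop group must already exist; create it first with `groupadd hadoop` if it does not
useradd -m -g hadoop hadoop
echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
```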
- As the hadoop user, generate a key pair:

```shell
# press Enter at every prompt
su - hadoop
ssh-keygen -t rsa
```
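- On every other node that needs passwordless login, create the hadoop user the same way and add the master's public key:

```shell
# replace xxxxxxxxxxxxx with the public key generated in the previous step (print it with: cat ~/.ssh/id_rsa.pub)
su - hadoop
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "xxxxxxxxxxxxx" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
exit
exit
```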
- Verify that passwordless login works (SSH listens on port 1022 in this environment):

```shell
ssh -p 1022 hadoop@10.6.4.10
```
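Re-running the key step above appends the same key a second time. A minimal sketch of an idempotent version, assuming it runs as the hadoop user on the target node; `add_authorized_key` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: append a public key to ~/.ssh/authorized_keys only once,
# with the permissions sshd expects (700 on ~/.ssh, 600 on authorized_keys).
add_authorized_key() {
  local pubkey="$1"
  local ssh_dir="${HOME}/.ssh"
  local auth_file="${ssh_dir}/authorized_keys"

  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  chmod 600 "$auth_file"

  # -x: the whole line must match, -F: treat the key as a fixed string
  if ! grep -qxF -- "$pubkey" "$auth_file"; then
    printf '%s\n' "$pubkey" >> "$auth_file"
  fi
}
```

For example, after copying the master's `id_rsa.pub` to the node, run `add_authorized_key "$(cat id_rsa.pub)"`. The whole-line match is what makes a second run a no-op.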
Deploying the SeaTunnel cluster
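- On the master node, download SeaTunnel to `/opt/softs` (download link: the official SeaTunnel download page). I recommend version 2.3.3: I first deployed a newer version and ran into problems, and I'm not sure whether the version or my own settings were to blame. seatunnel-web has not been updated for a while, so it may not be compatible with newer releases.

```shell
sudo su - hadoop
# /opt is usually owned by root, so create the directories with sudo and hand them to hadoop
sudo mkdir -p /opt/softs /opt/service
sudo chown hadoop:hadoop /opt/softs /opt/service
cd /opt/softs
export version="2.3.3"
wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
tar -zxvf "apache-seatunnel-${version}-bin.tar.gz" -C /opt/service
```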
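The commands above extract the tarball without checking it. Apache releases usually ship a `.sha512` file alongside each artifact; a minimal sketch of verifying against it, assuming the checksum file begins with the hex digest as written by `sha512sum` (`verify_sha512` is a hypothetical helper):

```shell
# Hypothetical helper: compare a tarball with the first field of its .sha512 file.
# Assumes "sha512sum" output format: "<hex digest>  <file name>".
verify_sha512() {
  local tarball="$1"
  local checksum_file="${2:-$1.sha512}"
  local expected actual

  expected="$(awk '{ print tolower($1); exit }' "$checksum_file")" || return 1
  actual="$(sha512sum "$tarball" | awk '{ print $1 }')" || return 1

  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $tarball"
  else
    echo "checksum mismatch for $tarball" >&2
    return 1
  fi
}
```

Run it before `tar`: download `apache-seatunnel-${version}-bin.tar.gz.sha512` from the same archive directory, then call `verify_sha512 "apache-seatunnel-${version}-bin.tar.gz"`.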
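- Installing plugins downloads jars with Maven. By default the install script fetches the Maven wrapper `mvnw`; you can install Maven yourself or use the wrapper, but either way switch to a closer mirror, otherwise downloads are very slow. I use the Aliyun mirror in `settings.xml` (https here, because Maven 3.8.1 and later block plain-http repositories by default):

```xml
<settings>
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>https://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```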
- Edit `config/plugin_config` and keep only the connectors you need. Mine:

```
# Don't modify the delimiter " -- ", just select the plugin you need
--connectors-v2--
connector-cdc-mysql
connector-clickhouse
connector-dingtalk
connector-doris
connector-elasticsearch
connector-file-s3
connector-hive
connector-hudi
connector-jdbc
connector-kafka
connector-kudu
connector-redis
connector-starrocks
--end--
```
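- Install the connector plugins; once the script finishes, the jars appear under `connectors/seatunnel`:

```shell
mkdir -p ~/.m2
# Maven only reads ~/.m2/settings.xml, so the file must keep that exact name
cp settings.xml ~/.m2/settings.xml
cd /opt/service/apache-seatunnel-2.3.3
sh bin/install-plugin.sh
```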
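You can also download each jar from Maven manually and upload it into that directory (not recommended). The official docs include a tip saying you need to create subdirectories such as `flink` and `flink-sql` under `connectors`; in my testing this was not necessary.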
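To confirm the install script actually fetched everything listed in `plugin_config`, a small check can compare that list with the jars on disk. A sketch, assuming jar names follow `<connector>-<version>.jar` (for example `connector-jdbc-2.3.3.jar`); `check_connectors` is hypothetical:

```shell
# Hypothetical helper: report connectors listed in plugin_config that have no
# matching jar in the connectors directory. Assumes "<connector>-<version>.jar" names.
check_connectors() {
  local plugin_config="$1"
  local connector_dir="$2"
  local missing=0 name

  [ -r "$plugin_config" ] || { echo "cannot read $plugin_config" >&2; return 1; }

  # Only the lines between "--connectors-v2--" and "--end--", minus comments and blanks.
  for name in $(awk '/^--connectors-v2--/ { on = 1; next }
                     /^--end--/ { on = 0 }
                     on && NF && $1 !~ /^#/ { print $1 }' "$plugin_config"); do
    # The version suffix must start with a digit, so connector-jdbc does not
    # accidentally match some other connector-jdbc-xxx jar.
    set -- "$connector_dir/$name"-[0-9]*.jar
    if [ ! -e "$1" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From `$SEATUNNEL_HOME`, `check_connectors config/plugin_config connectors/seatunnel` prints each missing connector and returns non-zero if any is absent.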
- Configure environment variables by adding the following to `/etc/profile`:

```shell
export SEATUNNEL_HOME=/opt/service/apache-seatunnel-2.3.3
export PATH=$PATH:$SEATUNNEL_HOME/bin
# run `source /etc/profile` to apply
```
- Edit the startup script `$SEATUNNEL_HOME/bin/seatunnel-cluster.sh` and set the Java heap at the top (right after the shebang line):

```shell
JAVA_OPTS="-Xms2G -Xmx2G"
```
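A hand edit like this is easy to lose after re-copying the package. A sketch of applying the same change from a script, assuming the first line of `seatunnel-cluster.sh` is its shebang; `add_java_opts` is a hypothetical helper, and it keeps a `.bak` copy of the original:

```shell
# Hypothetical helper: insert a JAVA_OPTS line right after the shebang of a startup
# script. Does nothing if the exact line is already present.
add_java_opts() {
  local script="$1"
  local opts_line="$2"   # e.g. JAVA_OPTS="-Xms2G -Xmx2G"

  if grep -qxF -- "$opts_line" "$script"; then
    echo "already present in $script"
    return 0
  fi
  # Keep a backup, then rewrite: line 1 (shebang), the new line, the rest of the file.
  # Writing into the existing file keeps its permissions.
  cp "$script" "$script.bak" || return 1
  {
    head -n 1 "$script.bak"
    printf '%s\n' "$opts_line"
    tail -n +2 "$script.bak"
  } > "$script"
}
```

Usage: `add_java_opts "$SEATUNNEL_HOME/bin/seatunnel-cluster.sh" 'JAVA_OPTS="-Xms2G -Xmx2G"'`. Running it a second time does not add another line.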
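- Adjust the configuration files under `config`. See the official docs for what each option does; rather than go through them one by one, here is my configuration: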
- `seatunnel.yaml` (for HA HDFS, see the checkpoint storage section of the docs):

```yaml
seatunnel:
  engine:
    history-job-expire-minutes: 4320
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://emr-cluster
          seatunnel.hadoop.dfs.nameservices: emr-cluster
          seatunnel.hadoop.dfs.ha.namenodes.emr-cluster: nn1,nn2
          seatunnel.hadoop.dfs.namenode.rpc-address.emr-cluster.nn1: 10.6.4.13:8020
          seatunnel.hadoop.dfs.namenode.rpc-address.emr-cluster.nn2: 10.6.4.24:8020
          seatunnel.hadoop.dfs.client.failover.proxy.provider.emr-cluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
```
- `hazelcast.yaml`: the main settings are `cluster-name` and `member-list`:

```yaml
hazelcast:
  cluster-name: seatunnel-test
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      tcp-ip:
        enabled: true
        member-list:
          - 10.6.4.10
          - 10.6.4.14
          - 10.6.4.15
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
```
- `hazelcast-client.yaml`: `cluster-name` must match the one in `hazelcast.yaml`, otherwise jobs cannot be submitted:

```yaml
hazelcast-client:
  cluster-name: seatunnel-test
  properties:
    hazelcast.logging.type: log4j2
  network:
    cluster-members:
      - 10.6.4.10:5801
      - 10.6.4.14:5801
      - 10.6.4.15:5801
```
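Because a `cluster-name` mismatch only shows up when a job fails to submit, it can be worth checking the two files before distributing them. A rough sketch that reads the first `cluster-name:` key with awk (a text match, not a YAML parser, and it only strips double quotes); both function names are hypothetical:

```shell
# Hypothetical helpers: compare cluster-name in hazelcast.yaml and hazelcast-client.yaml.
# Plain text match on the first "cluster-name:" key; not a real YAML parser.
yaml_cluster_name() {
  awk '$1 == "cluster-name:" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

check_cluster_name() {
  local server_name client_name
  server_name="$(yaml_cluster_name "$1")"
  client_name="$(yaml_cluster_name "$2")"
  if [ -z "$server_name" ] || [ "$server_name" != "$client_name" ]; then
    echo "cluster-name mismatch: server='$server_name' client='$client_name'" >&2
    return 1
  fi
  echo "cluster-name OK: $server_name"
}
```

Usage: `check_cluster_name "$SEATUNNEL_HOME/config/hazelcast.yaml" "$SEATUNNEL_HOME/config/hazelcast-client.yaml"`.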
- Create the `logs` directory:

```shell
mkdir -p $SEATUNNEL_HOME/logs
```
- Distribute the package to the other nodes and start the cluster (`emr_core` is the Ansible host group for the three core nodes):

```shell
ansible emr_core -m shell -a "sudo mkdir -p /opt/service"
ansible emr_core -m shell -a "sudo chown -R hadoop:hadoop /opt/service"
ansible emr_core -m copy -a "src=/opt/service/apache-seatunnel-2.3.3 dest=/opt/service/"
ansible emr_core -m shell -a "sh /opt/service/apache-seatunnel-2.3.3/bin/seatunnel-cluster.sh -d"
```
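Since the cluster is started in the background with `-d`, the Ansible run does not tell you whether the members actually came up. A sketch that polls port 5801 on each member, assuming bash (for its `/dev/tcp` pseudo-device) and coreutils `timeout` are available where you run it. An open port only shows that the process is listening, not that the members have formed one cluster. `wait_for_members` is hypothetical:

```shell
# Hypothetical helper: poll each member until its Hazelcast port accepts TCP
# connections, giving up after the given number of attempts per host.
wait_for_members() {
  local port="$1" attempts="$2"
  shift 2
  local host i
  for host in "$@"; do
    i=0
    # bash's /dev/tcp opens a TCP connection; timeout bounds each try to 2 seconds
    until timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; do
      i=$((i + 1))
      if [ "$i" -ge "$attempts" ]; then
        echo "$host:$port not reachable after $attempts attempts" >&2
        return 1
      fi
      sleep 2
    done
    echo "$host:$port is up"
  done
}
```

Usage: `wait_for_members 5801 30 10.6.4.10 10.6.4.14 10.6.4.15`.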
- On a core node, check that jobs run; if the sample job finishes without errors, the cluster is working:

```shell
cd /opt/service/apache-seatunnel-2.3.3
sh bin/seatunnel.sh --config config/v2.batch.config.template
```
SeaTunnel Web Deployment
- Deploy the SeaTunnel Engine client by copying the package from the master node:

```shell
# on the master node
cd /opt/service
scp -P 1022 -r apache-seatunnel-2.3.3 hadoop@10.6.6.2:$PWD
# on the data-collect node: check that a job can be submitted
cd /opt/service/apache-seatunnel-2.3.3
sh bin/seatunnel.sh --config config/v2.batch.config.template
```
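- Download `apache-seatunnel-web-1.0.0-bin.tar.gz` to `/opt/softs` and extract it:

```shell
mkdir -p /opt/softs
cd /opt/softs
# closer.lua returns a mirror-selection HTML page rather than the tarball, so download from the archive
wget https://archive.apache.org/dist/seatunnel/seatunnel-web/1.0.0/apache-seatunnel-web-1.0.0-bin.tar.gz
tar -zxvf apache-seatunnel-web-1.0.0-bin.tar.gz -C /opt/service
```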
- Configure environment variables by adding the following to `/etc/profile`, then run `source /etc/profile` to apply them:

```shell
export SEATUNNEL_HOME=/opt/service/apache-seatunnel-2.3.3
export SEATUNNEL_WEB_HOME=/opt/service/apache-seatunnel-web-1.0.0-bin
export ST_WEB_BASEDIR_PATH=/opt/service/apache-seatunnel-web-1.0.0-bin/ui
export PATH=$PATH:$SEATUNNEL_HOME/bin:$SEATUNNEL_WEB_HOME/bin
```
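- Initialize the database. First edit `apache-seatunnel-web-1.0.0-bin/script/seatunnel_server_env.sh` so that it points at your MySQL instance; note that the `seatunnel` database is used by default:

```shell
export HOSTNAME="localhost"
export PORT="3306"
export USERNAME="root"
export PASSWORD="123456"
```

Then run `sh apache-seatunnel-web-1.0.0-bin/script/init_sql.sh` to initialize the database; if it finishes without errors, initialization succeeded.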
- Download the DataSource plugins:

```shell
cd /opt/service/apache-seatunnel-web-1.0.0-bin/bin
wget https://seatunnel.apache.org/assets/files/download_datasource-4b79e6fafe80459590a6a0fc2865e5ac.sh
mv download_datasource-4b79e6fafe80459590a6a0fc2865e5ac.sh download_datasource.sh
# Before running it, I recommend editing the script to remove the datasource-hive entry: its jetty-server
# dependency conflicts with the version bundled in seatunnel-web and the service fails to start
sh download_datasource.sh
```
- Fill in the missing dependencies and configuration:
  - Manually download the `mysql-jdbc` driver into `/opt/service/apache-seatunnel-web-1.0.0-bin/libs`.
  - On the client node, copy the `datasource-*` jars from `/opt/service/apache-seatunnel-web-1.0.0-bin/libs` to `/opt/service/apache-seatunnel-2.3.3/lib/`.
  - Copy `/opt/service/apache-seatunnel-2.3.3/config/hazelcast-client.yaml` and `/opt/service/apache-seatunnel-2.3.3/connectors/plugin-mapping.properties` into `/opt/service/apache-seatunnel-web-1.0.0-bin/conf`.
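The three manual steps above can be scripted so they are easy to repeat after reinstalling either package. A minimal sketch; `sync_web_deps` is hypothetical, and the MySQL driver path is passed in because its file name depends on the version you downloaded:

```shell
# Hypothetical helper mirroring the three manual steps.
#   st_home:   SeaTunnel client directory
#   web_home:  seatunnel-web directory
#   mysql_jar: path of the MySQL JDBC driver jar you downloaded
sync_web_deps() {
  local st_home="$1" web_home="$2" mysql_jar="$3"
  local jar found=0

  # 1. MySQL JDBC driver -> seatunnel-web libs
  cp "$mysql_jar" "$web_home/libs/" || return 1

  # 2. datasource-* jars from seatunnel-web -> SeaTunnel client lib
  for jar in "$web_home"/libs/datasource-*.jar; do
    [ -e "$jar" ] || continue
    cp "$jar" "$st_home/lib/" || return 1
    found=1
  done
  [ "$found" -eq 1 ] || echo "warning: no datasource-*.jar in $web_home/libs" >&2

  # 3. client config and plugin mapping -> seatunnel-web conf
  cp "$st_home/config/hazelcast-client.yaml" "$web_home/conf/" || return 1
  cp "$st_home/connectors/plugin-mapping.properties" "$web_home/conf/" || return 1
}
```

Usage: `sync_web_deps /opt/service/apache-seatunnel-2.3.3 /opt/service/apache-seatunnel-web-1.0.0-bin /opt/softs/mysql-connector-java.jar`, where the driver file name is only a placeholder.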
- Edit the seatunnel-web configuration `/opt/service/apache-seatunnel-web-1.0.0-bin/conf/application.yml`; the MySQL connection just has to match the one used for initialization:

```yaml
server:
  port: 8801
spring:
  application:
    name: seatunnel
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/seatunnel?useSSL=false&useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&allowPublicKeyRetrieval=true
    username: xxx
    password: xxx
  mvc:
    pathmatch:
      matching-strategy: ant_path_matcher
jwt:
  expireTime: 86400
  secretKey: https://github.com/apache/seatunnel
  algorithm: HS256
```
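- Start the SeaTunnel Web service. Run the start command from inside the `apache-seatunnel-web-1.0.0-bin` directory, otherwise the front-end resources may not be found and the UI returns errors:

```shell
cd /opt/service/apache-seatunnel-web-1.0.0-bin
sh bin/seatunnel-backend-daemon.sh start
```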
- Open http://127.0.0.1:8801/ui; the default username and password are `admin`/`admin`.
Troubleshooting
jetty-server class version conflict
SeaTunnel Web fails at startup with:
```
An attempt was made to call a method that does not exist. The attempt was made from the following location:
    org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory.configureSession(JettyServletWebServerFactory.java:267)
The following method did not exist:
    org.eclipse.jetty.server.session.SessionHandler.setMaxInactiveInterval(I)V
The calling method's class, org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory, was loaded from the following location:
    jar:file:/datadir/seatunnel-web/libs/spring-boot-2.6.8.jar!/org/springframework/boot/web/embedded/jetty/JettyServletWebServerFactory.class
The called method's class, org.eclipse.jetty.server.session.SessionHandler, is available from the following locations:
    jar:file:/opt/service/apache-seatunnel-web-1.0.0-bin/libs/datasource-hive-1.0.0.jar!/org/eclipse/jetty/server/session/SessionHandler.class
    jar:file:/opt/service/apache-seatunnel-web-1.0.0-bin/libs/jetty-server-9.4.53.v20231009.jar!/org/eclipse/jetty/server/session/SessionHandler.class
```
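This happens because `/opt/service/apache-seatunnel-web-1.0.0-bin/libs/datasource-hive-1.0.0.jar` bundles its own copy of `org.eclipse.jetty.server.session.SessionHandler`, which conflicts with the bundled `jetty-server-9.4.53.v20231009.jar`. Deleting `datasource-hive-1.0.0.jar` fixes the startup, but Hive is then unavailable; if you need Hive you will have to find another solution.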
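To see which jars in a directory carry a given class (useful for spotting conflicts like this one), a rough sketch: zip archives store entry names uncompressed, so a fixed-string `grep -a` over each jar finds candidates without unpacking. `unzip -l` or `jar tf` give an exact listing if you need certainty; `jars_with_class` is a hypothetical helper:

```shell
# Hypothetical helper: list jars in a directory whose raw bytes contain a class path.
# Approximate: relies on zip entry names being stored uncompressed.
jars_with_class() {
  local dir="$1"
  local class_path="$2"   # e.g. org/eclipse/jetty/server/session/SessionHandler.class
  local jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue
    if grep -qaF -- "$class_path" "$jar"; then
      echo "$jar"
    fi
  done
}
```

Usage: `jars_with_class "$SEATUNNEL_WEB_HOME/libs" org/eclipse/jetty/server/session/SessionHandler.class`. More than one result means several copies of the class compete on the classpath, as in the error above.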
Cannot create a data source
If you cannot create a data source, check whether the `datasource-*` jars were downloaded successfully into `$SEATUNNEL_WEB_HOME/libs`.
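A quick way to run that check; `list_datasource_jars` is a hypothetical helper that defaults to `$SEATUNNEL_WEB_HOME/libs` and fails when nothing is found:

```shell
# Hypothetical helper: print the datasource-* jars in seatunnel-web's libs directory
# and return non-zero if there are none.
list_datasource_jars() {
  local libs="${1:-$SEATUNNEL_WEB_HOME/libs}"
  local jar count=0
  for jar in "$libs"/datasource-*.jar; do
    [ -e "$jar" ] || continue
    echo "${jar##*/}"   # file name only
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no datasource-*.jar in $libs; re-run download_datasource.sh" >&2
    return 1
  fi
}
```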
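A data source can be created but no source can be selected
My setup already works, so I have no screenshots of the failures; go through the following checks one by one: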
- Check whether `$SEATUNNEL_HOME/lib` contains the `datasource-*` jars. I'm not sure whether they are needed only on the client node or on the seatunnel-engine nodes as well; I put them on all nodes.
- Check whether `$SEATUNNEL_HOME/connectors/seatunnel` contains the relevant connectors; if you downloaded them manually, make sure they are placed in `connectors/seatunnel`.
- Make sure the environment variables are configured correctly and in effect:

```shell
export SEATUNNEL_HOME=/opt/service/apache-seatunnel-2.3.3
export SEATUNNEL_WEB_HOME=/opt/service/apache-seatunnel-web-1.0.0-bin
export ST_WEB_BASEDIR_PATH=/opt/service/apache-seatunnel-web-1.0.0-bin/ui
export PATH=$PATH:$SEATUNNEL_HOME/bin:$SEATUNNEL_WEB_HOME/bin
```

If you are worried they are not picked up, add these lines at the top of `$SEATUNNEL_WEB_HOME/bin/seatunnel-backend-daemon.sh`. Once they take effect, the log contains lines like:

```
[AbstractPluginDiscovery.<init>():113] - Load SeaTunnelSink Plugin from /opt/service/apache-seatunnel-2.3.3/connectors/seatunnel
[AbstractDataSourceClient.getCustomClassloader():225] - ST_WEB_BASEDIR_PATH is : /opt/service/apache-seatunnel-web-1.0.0-bin/ui
```
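- Check whether `$SEATUNNEL_HOME/lib` contains the MySQL JDBC driver jar. I'm not sure this matters: I saw it suggested online and added it too, but never verified whether it contributed to the final fix. If the previous checks pass and it still doesn't work, try this.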
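The checks above can be bundled into one script to run on the seatunnel-web node. A sketch under this article's layout; the function names are hypothetical, the driver pattern `mysql-connector*.jar` is an assumption about the jar name, and that check only warns because it is unclear whether it matters:

```shell
# Hypothetical checklist for "a data source can be created but no source can be selected".

# Print OK/FAIL for a glob; succeeds if the first expanded path exists.
check_glob() {
  local label="$1"
  shift
  if [ -e "$1" ]; then
    echo "OK:   $label"
  else
    echo "FAIL: $label" >&2
    return 1
  fi
}

check_seatunnel_web_env() {
  local rc=0 var dir

  # The environment variables must be set and point at existing directories.
  for var in SEATUNNEL_HOME SEATUNNEL_WEB_HOME ST_WEB_BASEDIR_PATH; do
    eval "dir=\${$var:-}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "FAIL: $var is unset or not a directory ('$dir')" >&2
      rc=1
    fi
  done
  [ "$rc" -eq 0 ] || return 1

  check_glob "datasource-* jars in \$SEATUNNEL_HOME/lib" \
    "$SEATUNNEL_HOME"/lib/datasource-*.jar || rc=1
  check_glob "connector jars in \$SEATUNNEL_HOME/connectors/seatunnel" \
    "$SEATUNNEL_HOME"/connectors/seatunnel/connector-*.jar || rc=1
  # Unverified whether this matters, so a miss is reported but not counted as a failure.
  check_glob "MySQL driver in \$SEATUNNEL_HOME/lib (optional)" \
    "$SEATUNNEL_HOME"/lib/mysql-connector*.jar || true

  return "$rc"
}
```

Run `check_seatunnel_web_env` in the same shell that starts seatunnel-web, so it sees the same environment.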
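Whole-database sync unavailable
Data sources can be created and single-table sync jobs can select a source, but multi-table and whole-database sync do not work. This is because no CDC data source had been configured; multi-table sync depends on it.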
Finally, here is a screenshot of my working setup:
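Other
Since SeaTunnel Web released version 1.0.0 in October 2023, there have been hardly any code updates. The official documentation is also poor, with many pages empty, and issues are not even enabled on GitHub, which makes me worried about the project's future; that is surprising for an Apache top-level project. Judging from replies in the user group, a new release is unlikely any time soon, so think carefully before using it in production. I also don't want to keep switching ingestion tools.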
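Follow-up
After using seatunnel-web for a while I found it really hard to use: once the token expires you cannot even log out, and the only way out is to clear the browser cache. On top of that, when seatunnel-web is deployed on a non-EMR node, Hadoop-related features keep failing because of missing jars.
I then found that DolphinScheduler has SeaTunnel integration: after installing SeaTunnel, you only need to configure the environment variables in DolphinScheduler. So I decided to give up on seatunnel-web!!!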