SeaTunnel & SeaTunnel Web Deployment


References

  1. Deployment of Apache SeaTunnel Web
  2. SeaTunnel deployment
  3. [Bug][Seatunnel-web] Data source already configured, but source name cannot be selected

Deployment Overview

In my setup, a 3-node seatunnel cluster runs on the EMR core nodes, and seatunnel-web is deployed on a single node outside the cluster.

| IP | seatunnel-engine | seatunnel-web | Notes |
|---|---|---|---|
| 10.6.4.24 | master | | passwordless SSH to the other nodes |
| 10.6.4.10 | core | | |
| 10.6.4.14 | core | | |
| 10.6.4.15 | core | | |
| 10.6.6.2 | | data-collect | seatunnel here is only a client; the service is not started |

Linux Environment Initialization

  1. Later steps use scp and similar operations; skip this part if it doesn't apply to your setup. First, create a hadoop user on the master node and grant it passwordless sudo

    sudo su - 
    groupadd -f hadoop    # no-op if the group already exists; useradd -g needs it
    useradd -m -g hadoop hadoop

    echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
    
  2. Generate a key pair for the hadoop user

    # press Enter through all prompts
    su - hadoop
    ssh-keygen -t rsa
    
  3. On every other node that needs passwordless login, create the hadoop user and append the public key

    # (first create the hadoop user as in step 1 if it does not exist on this node)
    # replace xxxxxxxxxxxxx with the public key generated in the previous step
    # (cat ~/.ssh/id_rsa.pub on the master node prints it)
    su - hadoop 
    mkdir -p ~/.ssh
    chmod 700 ~/.ssh
    echo "xxxxxxxxxxxxx" >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    exit
    exit
    exit
    
  4. Verify that passwordless login works (an ssh-copy-id shortcut is sketched below):

    ssh -p 1022 hadoop@10.6.4.10
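
    If you'd rather not paste the key by hand in step 3, ssh-copy-id does the same thing. A minimal sketch, assuming the hadoop user already exists on the target node and sshd listens on port 1022:

    # run on the master node as hadoop, once per target node
    ssh-copy-id -p 1022 hadoop@10.6.4.10     # appends ~/.ssh/id_rsa.pub to the remote authorized_keys
    ssh -p 1022 hadoop@10.6.4.10 hostname    # should print the hostname without asking for a password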
    

SeaTunnel Deployment

  1. On the master node, download seatunnel into the /opt/softs directory (download link: download). Version 2.3.3 is recommended; I initially tried a newer version and ran into problems, though I'm not sure whether they were caused by the version or by my own settings. seatunnel-web hasn't been updated for a while, so I don't know whether newer versions are compatible.

    sudo su - hadoop
    sudo mkdir -p /opt/softs /opt/service
    sudo chown -R hadoop:hadoop /opt/softs /opt/service

    cd /opt/softs
    export version="2.3.3"
    wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"

    tar -zxvf apache-seatunnel-2.3.3-bin.tar.gz -C /opt/service
    
  2. Installing the plugins uses Maven to download the jars. The default script downloads its own mvnw; you can install Maven yourself or just use the bundled one. Either way, switch the Maven mirror first or downloads will be very slow. Point the mirror at Aliyun:

      <mirror>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <mirrorOf>central</mirrorOf>
      </mirror>
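
    If you don't have a settings.xml yet, here is a minimal sketch that writes one containing just this mirror (~/.m2/settings.xml is Maven's default location; the heredoc is my own shortcut, equivalent to copying a prepared settings.xml as done below):

    # write a minimal Maven settings.xml with only the aliyun mirror
    mkdir -p ~/.m2
    cat > ~/.m2/settings.xml <<'EOF'
    <settings>
      <mirrors>
        <mirror>
          <id>alimaven</id>
          <name>aliyun maven</name>
          <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
          <mirrorOf>central</mirrorOf>
        </mirror>
      </mirrors>
    </settings>
    EOF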
    

    Edit config/plugin_config and keep only the connectors you need; here is mine:

    # Don't modify the delimiter " -- ", just select the plugin you need
    --connectors-v2--
    connector-cdc-mysql
    connector-clickhouse
    connector-dingtalk
    connector-doris
    connector-elasticsearch
    connector-file-s3
    connector-hive
    connector-hudi
    connector-jdbc
    connector-kafka
    connector-kudu
    connector-redis
    connector-starrocks
    --end--
    

    Install the connector plugins; after the script finishes you should see the corresponding jars under connectors/seatunnel

    mkdir -p ~/.m2
    cp settings.xml ~/.m2/    # the settings.xml prepared above with the aliyun mirror

    sh bin/install-plugin.sh 
    

    Alternatively, you can download each jar manually from Maven and upload it to the right directory (not recommended). The official docs have a tip saying you need to create flink, flink-sql and similar subdirectories under connectors; in my testing that was not necessary.
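
    To double-check what install-plugin.sh actually pulled down, list the connector directory (path follows the layout used in this post):

    ls /opt/service/apache-seatunnel-2.3.3/connectors/seatunnel/
    # expect one connector-*.jar for every entry kept in config/plugin_config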


  3. Configure environment variables by adding the following to /etc/profile:

    export SEATUNNEL_HOME=/opt/service/apache-seatunnel-2.3.3
    export PATH=$PATH:$SEATUNNEL_HOME/bin

    # run `source /etc/profile` to apply
    
  4. Edit the startup script $SEATUNNEL_HOME/bin/seatunnel-cluster.sh and add the JVM memory setting JAVA_OPTS="-Xms2G -Xmx2G" at the top of the file; a sketch of the edit follows below.
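
    A minimal sketch of that edit, assuming the script keeps its shebang on line 1 (the setting is inserted right after it):

    sed -i '1a JAVA_OPTS="-Xms2G -Xmx2G"' $SEATUNNEL_HOME/bin/seatunnel-cluster.sh
    head -n 2 $SEATUNNEL_HOME/bin/seatunnel-cluster.sh    # confirm the JAVA_OPTS line is in place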

  5. Adjust the configuration files under the config directory. See the official docs for what each option does; I won't go through them one by one and simply list my configuration.

    1. seatunnel.yaml; for the HA HDFS settings, refer to the checkpoint storage docs

      seatunnel:
        engine:
          history-job-expire-minutes: 4320
          backup-count: 1
          queue-type: blockingqueue
          print-execution-info-interval: 60
          print-job-metrics-info-interval: 60
          slot-service:
            dynamic-slot: true
          checkpoint:
            interval: 10000
            timeout: 60000
            storage:
              type: hdfs
              max-retained: 3
              plugin-config:
                namespace: /tmp/seatunnel/checkpoint_snapshot
                storage.type: hdfs
                fs.defaultFS: hdfs://emr-cluster
                seatunnel.hadoop.dfs.nameservices: emr-cluster
                seatunnel.hadoop.dfs.ha.namenodes.emr-cluster: nn1,nn2
                seatunnel.hadoop.dfs.namenode.rpc-address.emr-cluster.nn1: 10.6.4.13:8020
                seatunnel.hadoop.dfs.namenode.rpc-address.emr-cluster.nn2: 10.6.4.24:8020
                seatunnel.hadoop.dfs.client.failover.proxy.provider.emr-cluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      
    2. hazelcast.yaml; mainly set cluster-name and member-list

      hazelcast:
        cluster-name: seatunnel-test
        network:
          rest-api:
            enabled: true
            endpoint-groups:
              CLUSTER_WRITE:
                enabled: true
              DATA:
                enabled: true
          join:
            tcp-ip:
              enabled: true
              member-list:
                - 10.6.4.10
                - 10.6.4.14
                - 10.6.4.15
          port:
            auto-increment: false
            port: 5801
        properties:
          hazelcast.invocation.max.retry.count: 20
          hazelcast.tcp.join.port.try.count: 30
          hazelcast.logging.type: log4j2
          hazelcast.operation.generic.thread.count: 50
      
    3. hazelcast-client.yaml; note that cluster-name must match the one in hazelcast.yaml, otherwise jobs cannot be submitted

      hazelcast-client:
        cluster-name: seatunnel-test
        properties:
          hazelcast.logging.type: log4j2
        network:
          cluster-members:
            - 10.6.4.10:5801
            - 10.6.4.14:5801
            - 10.6.4.15:5801
      
    4. Create the logs directory

      mkdir -p $SEATUNNEL_HOME/logs
      
    5. Distribute the package to the other nodes and start the cluster

      ansible emr_core -m shell -a "sudo mkdir -p /opt/service"
      ansible emr_core -m shell -a "sudo chown -R hadoop:hadoop /opt/service"
      ansible emr_core -m copy -a "src=/opt/service/apache-seatunnel-2.3.3 dest=/opt/service/"
      ​
      ansible emr_core -m shell -a "sh /opt/service/apache-seatunnel-2.3.3/bin/seatunnel-cluster.sh -d"
      
    6. Go to a core node and check that a test job runs correctly; finishing without errors is enough (a cluster-wide check is sketched after the code below)

      cd /opt/service/apache-seatunnel-2.3.3
      sh bin/seatunnel.sh --config config/v2.batch.config.template
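
      To confirm the engine is up on every node, a quick check (assuming the server JVM shows up in jps as SeaTunnelServer and using the Ansible group emr_core from above):

      ansible emr_core -m shell -a "jps | grep SeaTunnelServer"
      # each core node should report exactly one SeaTunnelServer process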
      

SeaTunnel Web Deployment

  1. Deploy the SeaTunnel Engine client by simply copying the package over from the master node

    # on the master node
    cd /opt/service
    scp -P 1022 -r apache-seatunnel-2.3.3 hadoop@10.6.6.2:$PWD

    # on the data-collect node
    cd /opt/service/apache-seatunnel-2.3.3
    sh bin/seatunnel.sh --config config/v2.batch.config.template
    
  2. Download apache-seatunnel-web-1.0.0-bin.tar.gz into the /opt/softs directory and extract it

    sudo mkdir -p /opt/softs /opt/service
    sudo chown -R hadoop:hadoop /opt/softs /opt/service
    cd /opt/softs
    # closer.lua is a mirror-selection page; if wget saves an HTML file instead of the
    # tarball, fetch it from https://archive.apache.org/dist/seatunnel/seatunnel-web/1.0.0/ instead
    wget https://www.apache.org/dyn/closer.lua/seatunnel/seatunnel-web/1.0.0/apache-seatunnel-web-1.0.0-bin.tar.gz
    tar -zxvf apache-seatunnel-web-1.0.0-bin.tar.gz -C /opt/service
    
  3. Configure environment variables: add the following to /etc/profile and run source /etc/profile to apply:

    export SEATUNNEL_HOME=/opt/service/apache-seatunnel-2.3.3
    export SEATUNNEL_WEB_HOME=/opt/service/apache-seatunnel-web-1.0.0-bin
    export ST_WEB_BASEDIR_PATH=/opt/service/apache-seatunnel-web-1.0.0-bin/ui
    ​
    export PATH=$PATH:$SEATUNNEL_HOME/bin:$SEATUNNEL_WEB_HOME/bin
    
  4. Initialize the database:

    1. Edit apache-seatunnel-web-1.0.0-bin/script/seatunnel_server_env.sh and fill in the correct connection details; note that the seatunnel database is used by default

      export HOSTNAME="localhost"
      export PORT="3306"
      export USERNAME="root"
      export PASSWORD="123456"
      
    2. Run sh apache-seatunnel-web-1.0.0-bin/script/init_sql.sh to initialize the data; if it finishes without exceptions, it succeeded. A quick verification is sketched below.
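
      A quick way to confirm the schema landed, assuming the defaults above (database seatunnel, credentials from seatunnel_server_env.sh):

      mysql -h localhost -P 3306 -uroot -p -e "SHOW TABLES;" seatunnel
      # a non-empty table list means init_sql.sh ran successfully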

  5. Download the Datasource plugins

    cd /opt/service/apache-seatunnel-web-1.0.0-bin/bin
    wget https://seatunnel.apache.org/assets/files/download_datasource-4b79e6fafe80459590a6a0fc2865e5ac.sh
    mv download_datasource-4b79e6fafe80459590a6a0fc2865e5ac.sh download_datasource.sh
    ​
    # Before running this script, it's recommended to remove the datasource-hive entry from it;
    # its jetty-server dependency conflicts with the jar shipped in seatunnel-web and makes startup fail
    sh download_datasource.sh
    
  6. Fill in the missing dependencies and configuration (sketched as commands after this list)

    1. Manually download the MySQL JDBC driver into /opt/service/apache-seatunnel-web-1.0.0-bin/libs
    2. The datasource-* jars under /opt/service/apache-seatunnel-web-1.0.0-bin/libs need to be copied to /opt/service/apache-seatunnel-2.3.3/lib/ on the client node
    3. /opt/service/apache-seatunnel-2.3.3/config/hazelcast-client.yaml and /opt/service/apache-seatunnel-2.3.3/connectors/plugin-mapping.properties need to be copied to /opt/service/apache-seatunnel-web-1.0.0-bin/conf
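
    A sketch of those three steps as commands, with paths taken from this post (the MySQL driver version below is an assumption; pick whatever matches your MySQL server):

    # 1. MySQL JDBC driver into the seatunnel-web libs directory
    cd /opt/service/apache-seatunnel-web-1.0.0-bin/libs
    wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar

    # 2. datasource-* jars into the seatunnel client lib directory
    cp /opt/service/apache-seatunnel-web-1.0.0-bin/libs/datasource-*.jar /opt/service/apache-seatunnel-2.3.3/lib/

    # 3. client config and plugin mapping into the seatunnel-web conf directory
    cp /opt/service/apache-seatunnel-2.3.3/config/hazelcast-client.yaml \
       /opt/service/apache-seatunnel-2.3.3/connectors/plugin-mapping.properties \
       /opt/service/apache-seatunnel-web-1.0.0-bin/conf/
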
  7. Edit the seatunnel-web configuration /opt/service/apache-seatunnel-web-1.0.0-bin/conf/application.yml; the MySQL connection just needs to match what was used during database initialization

    server:
      port: 8801
    
    spring:
      application:
        name: seatunnel
      jackson:
        date-format: yyyy-MM-dd HH:mm:ss
      datasource:
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://xxxx:3306/seatunnel?useSSL=false&useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&allowPublicKeyRetrieval=true
        username: xxx
        password: xxx
      mvc:
        pathmatch:
          matching-strategy: ant_path_matcher
    
    jwt:
      expireTime: 86400
      secretKey: https://github.com/apache/seatunnel
      algorithm: HS256
    
  8. Start the SeaTunnel Web service

    1. Start the service. Make sure to run the start command from the apache-seatunnel-web-1.0.0-bin directory, otherwise the frontend resources may not be found and the page will return errors

      cd /opt/service/apache-seatunnel-web-1.0.0-bin
      sh bin/seatunnel-backend-daemon.sh start
      
    2. Visit http://127.0.0.1:8801/ui; the default username and password are admin / admin
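
      Before opening the browser, you can confirm the backend is actually listening on the port from application.yml:

      ss -lntp | grep 8801    # a java process should be bound to port 8801
      curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8801/ui/    # expect 200 once the UI resources are served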

Troubleshooting

jetty-server class version conflict

SeaTunnel Web fails at startup with:

An attempt was made to call a method that does not exist. The attempt was made from the following location:
​
    org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory.configureSession(JettyServletWebServerFactory.java:267)
​
The following method did not exist:
​
    org.eclipse.jetty.server.session.SessionHandler.setMaxInactiveInterval(I)V
​
The calling method's class, org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory, was loaded from the following location:
​
    jar:file:/datadir/seatunnel-web/libs/spring-boot-2.6.8.jar!/org/springframework/boot/web/embedded/jetty/JettyServletWebServerFactory.class
​
The called method's class, org.eclipse.jetty.server.session.SessionHandler, is available from the following locations:
​
    jar:file:/opt/service/apache-seatunnel-web-1.0.0-bin/libs/datasource-hive-1.0.0.jar!/org/eclipse/jetty/server/session/SessionHandler.class
    jar:file:/opt/service/apache-seatunnel-web-1.0.0-bin/libs/jetty-server-9.4.53.v20231009.jar!/org/eclipse/jetty/server/session/SessionHandler.class

This happens because /opt/service/apache-seatunnel-web-1.0.0-bin/libs/datasource-hive-1.0.0.jar conflicts with the bundled jetty-server-9.4.53.v20231009.jar. Deleting datasource-hive-1.0.0.jar fixes the startup, but then the Hive datasource is unavailable; if you need Hive, you'll have to find another solution.
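
If you hit a similar "method did not exist" error, the loop below shows which jars under libs/ bundle the conflicting class (class path taken from the stack trace above):

    cd /opt/service/apache-seatunnel-web-1.0.0-bin/libs
    for j in *.jar; do
      unzip -l "$j" | grep -q 'org/eclipse/jetty/server/session/SessionHandler.class' && echo "$j"
    done
    # here both datasource-hive-1.0.0.jar and jetty-server-9.4.53.v20231009.jar show up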

Data source cannot be created

If a data source cannot be created, check whether the datasource-* jars were successfully downloaded into the $SEATUNNEL_WEB_HOME/libs directory.

Data source can be created but no source can be selected

My setup is already working, so I don't have screenshots of the failing states; work through the following checks one by one (they are bundled into a single sketch after the list):

  1. Check whether the $SEATUNNEL_HOME/lib directory contains the datasource-* jars. I'm not sure whether only the client node needs them or the seatunnel-engine nodes do as well; I put them on every node.

  2. Check whether the $SEATUNNEL_HOME/connectors/seatunnel directory contains the relevant connector jars; if you downloaded any manually, make sure they end up in connectors/seatunnel.

  3. Check that the following environment variables are configured correctly and have taken effect:

    export SEATUNNEL_HOME=/opt/service/apache-seatunnel-2.3.3
    export SEATUNNEL_WEB_HOME=/opt/service/apache-seatunnel-web-1.0.0-bin
    export ST_WEB_BASEDIR_PATH=/opt/service/apache-seatunnel-web-1.0.0-bin/ui
    ​
    export PATH=$PATH:$SEATUNNEL_HOME/bin:$SEATUNNEL_WEB_HOME/bin
    

    If you're worried they aren't being picked up, you can put these exports at the top of $SEATUNNEL_WEB_HOME/bin/seatunnel-backend-daemon.sh. Once they take effect, the log will contain messages like:

    [AbstractPluginDiscovery.<init>():113] - Load SeaTunnelSink Plugin from /opt/service/apache-seatunnel-2.3.3/connectors/seatunnel
     
     
    [AbstractDataSourceClient.getCustomClassloader():225] - ST_WEB_BASEDIR_PATH is : /opt/service/apache-seatunnel-web-1.0.0-bin/ui
    
  4. Check whether $SEATUNNEL_HOME/lib contains the MySQL JDBC driver. I'm not sure whether this actually matters: I saw it mentioned online and added it, but never verified whether my eventual success depended on it. If the previous checks pass and things still don't work, give it a try.
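
All four checks bundled into one sketch, assuming the environment variables above are already exported:

    # 1. datasource jars visible to the seatunnel client
    ls $SEATUNNEL_HOME/lib/ | grep datasource- || echo "no datasource-* jars in \$SEATUNNEL_HOME/lib"

    # 2. connector jars in place
    ls $SEATUNNEL_HOME/connectors/seatunnel/ | grep connector- || echo "no connectors found"

    # 3. environment variables resolved
    echo "SEATUNNEL_HOME=$SEATUNNEL_HOME"
    echo "SEATUNNEL_WEB_HOME=$SEATUNNEL_WEB_HOME"
    echo "ST_WEB_BASEDIR_PATH=$ST_WEB_BASEDIR_PATH"

    # 4. mysql jdbc driver present
    ls $SEATUNNEL_HOME/lib/ | grep -i mysql || echo "no mysql driver in \$SEATUNNEL_HOME/lib"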

Whole-database sync unavailable

The data source can be created and single-table sync jobs can select a source, but multi-table and whole-database sync cannot. This is because no CDC data source has been configured; multi-table sync depends on it.

Finally, some screenshots of my working configuration (screenshots not reproduced here).

Other Notes

The SeaTunnel Web project has seen essentially no code updates since the 1.0.0 release in October 2023. The official documentation is also fairly poor, with many empty pages, and the GitHub issues tab isn't even enabled, so I'm somewhat worried about where it's heading; that's surprising for an Apache top-level project. Judging from replies in the user group, a new release doesn't look likely in the short term, so think carefully before using it in production; I don't want to keep switching ingestion components either.


Follow-up

After using seatunnel-web for a while, I found it genuinely painful: once the token expires you can't even log out and the only fix is clearing the browser cache, and if seatunnel-web is deployed on a non-EMR node, anything Hadoop-related keeps failing with missing jars.

Then I noticed that DolphinScheduler has built-in SeaTunnel integration: once seatunnel is installed, you only need to configure the environment in DolphinScheduler. I've decided to give up on seatunnel-web!