rocketmq-connect: Source Code and Practice


overview

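rocketmq-connect plays a role similar to kafka-connect. It is an important way to extend what the MQ can be used for, for example syncing data from a MySQL source to a PostgreSQL (pgsql) sink.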

details

concepts

runtime

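The runtime is the execution environment for source and sink connectors. It loads connectors, exposes the RESTful API, starts connector tasks, and provides service discovery between cluster nodes, configuration synchronization, consumption-progress persistence, failover, and load balancing.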

connector

A connector is effectively a connect plugin: the complete implementation of a Source or a Sink.

task(runnable)

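A Task is the smallest unit into which a connector's work is sharded and assigned. It is what actually copies data from the source system into RocketMQ (SourceTask), or reads data from RocketMQ and writes it to the target system (SinkTask). Tasks are stateless and can be started and stopped dynamically, and multiple tasks can run in parallel, so a connector's copy parallelism is mainly determined by its number of tasks.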

converter(json)

Makes it easier for a source connector to import data into the MQ.

config

conf/connect-standalone.conf

Parses the data from the data source and is responsible for copying data between the data source and the RocketMQ Broker.

worker(process)

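The worker process hosts the REST server and is the runtime environment for connectors and tasks; it passes the configuration it obtains on to them. It also starts connectors and tasks, stores connector configurations, stores the position (offset) information of task synchronization, and provides load balancing. High availability, scaling in and out, and failure handling in a Connect cluster all rely mainly on the worker's load balancing.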

modules

rocketmq-connect-runtime

StandaloneConnectStartup

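  • A thin wrapper around AbstractConnectController; StandaloneConnectController adds a StandaloneRebalanceService rebalanceService
  • The services are mostly in-memory implementations; Position is file-based
  • The added RebalanceImpl handles load balancing; its getAllAliveWorkers relies on the SDK-side API that lists all consumers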
public class StandaloneConnectController extends AbstractConnectController {
    public void start() {
        super.start();
        rebalanceService.start();
    }
}

public class RebalanceImpl {
    public void doRebalance() { /* body omitted */ }
    private void updateProcessConfigsInRebalance(ConnAndTaskConfigs allocateResult) { /* body omitted */ }
}

DistributedConnectStartup

  • Management classes such as getClusterManagementService/getConfigManagementService are all loaded through the ServiceLoader mechanism
  • start runs the following at a fixed interval
    • this.configManagementService.persist()
    • this.positionManagementService.persist()
    • this.stateManagementService.persist()
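
The fixed-interval persistence described above could be sketched with a ScheduledExecutorService. This is only an illustration: the Persistable interface and the PeriodicPersister class are hypothetical names, not types from rocketmq-connect, and the real runtime chooses its own interval and scheduling mechanism.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the config/position/state management services. */
interface Persistable {
    void persist();
}

/** Calls persist() on every registered service at a fixed interval. */
final class PeriodicPersister {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final List<Persistable> services;

    PeriodicPersister(List<Persistable> services) {
        this.services = List.copyOf(services);
    }

    /** One persistence round; a failing service does not block the others. */
    void persistAll() {
        for (Persistable service : services) {
            try {
                service.persist();
            } catch (RuntimeException e) {
                System.err.println("persist failed: " + e);
            }
        }
    }

    /** Schedules persistAll() to run every intervalMillis milliseconds. */
    void start(long intervalMillis) {
        scheduler.scheduleAtFixedRate(
                this::persistAll, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Catching the exception per service keeps one failing store from cancelling the whole schedule, which scheduleAtFixedRate would otherwise do on an uncaught exception.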

ConnectController/AbstractConnectController

start flow

  • clusterManagementService.start();
  • async positionManagementService.start();
  • async stateManagementService.start();
  • async configManagementService.start();
  • connectStatsService.start();
  • worker.start();

ConnectController implementations

  • DistributedConnectController
  • StandaloneConnectController
public interface ConnectController {
    void start();
    void shutdown();
}

public abstract class AbstractConnectController implements ConnectController {
    protected final ConfigManagementService configManagementService;
    protected final PositionManagementService positionManagementService;
    protected final ClusterManagementService clusterManagementService;
    protected final StateManagementService stateManagementService;

    protected final Worker worker;
    protected final RestHandler restHandler;
    protected final Plugin plugin;

    protected final ConnectStatsManager connectStatsManager;
    protected final ConnectStatsService connectStatsService;
}

ConfigManagementService

public interface ConfigManagementService {}

public abstract class AbstractConfigManagementService implements ConfigManagementService, IChangeNotifier<String, byte[]>, ICommonConfiguration {}

PositionManagementService

  • A position is abstracted as openmessaging's RecordPartition: Map<String, ?>
  • The positionStore in AbstractPositionManagementService has type KeyValueStore<ExtendRecordPartition, RecordOffset>; ExtendRecordPartition adds a namespace to RecordPartition
  • RecordOffset is likewise a Map<String, ?>
public interface PositionManagementService {}

public abstract class AbstractPositionManagementService implements PositionManagementService, IChangeNotifier<ByteBuffer, ByteBuffer>, ICommonConfiguration {}

public class RocketMqPositionManagementServiceImpl extends AbstractPositionManagementService {}
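
To make the position model concrete, here is a minimal in-memory sketch of "namespaced partition map -> offset map". NamespacedPartition and InMemoryPositionStore are hypothetical simplifications written for this note; they only mirror the shape of ExtendRecordPartition, RecordOffset and KeyValueStore, not their real APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical simplification of ExtendRecordPartition: a partition map plus a namespace. */
final class NamespacedPartition {
    private final String namespace;
    private final Map<String, ?> partition;

    NamespacedPartition(String namespace, Map<String, ?> partition) {
        this.namespace = namespace;
        this.partition = Map.copyOf(partition);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NamespacedPartition)) {
            return false;
        }
        NamespacedPartition other = (NamespacedPartition) o;
        return namespace.equals(other.namespace) && partition.equals(other.partition);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, partition);
    }
}

/** In-memory position store: partition key -> offset map. */
final class InMemoryPositionStore {
    private final Map<NamespacedPartition, Map<String, ?>> positions = new HashMap<>();

    void putPosition(NamespacedPartition partition, Map<String, ?> offset) {
        positions.put(partition, Map.copyOf(offset));
    }

    /** Returns the stored offset, or null if this partition has no position yet. */
    Map<String, ?> getPosition(NamespacedPartition partition) {
        return positions.get(partition);
    }
}
```

Because the namespace is part of the key, two connectors reading the same file partition keep separate offsets.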

StateManagementService

  • Stores the state of every connector
  • Stores the state of every task belonging to a connector
public interface StateManagementService {}

public abstract class AbstractStateManagementService implements StateManagementService, IChangeNotifier<String, byte[]>, ICommonConfiguration {}

public class RocketMqStateManagementServiceImpl extends AbstractStateManagementService {}

ClusterManagementService

  • ClusterManagementServiceImpl provides getCurrentWorker/getAllAliveWorkers; discovery uses methods provided by RocketMQ's DefaultMQPullConsumer
  • It also supports change listening: on NOTIFY_CONSUMER_IDS_CHANGED it calls onWorkerChange on every WorkerStatusListener
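// positionStore is the AbstractPositionManagementService field described in the PositionManagementService section
protected KeyValueStore<ExtendRecordPartition, RecordOffset> positionStore;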
public class ClusterManagementServiceImpl implements ClusterManagementService {}
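
The listener mechanism can be pictured as a plain observer list. The sketch below is an assumption-laden illustration: WorkerChangeListener and WorkerChangeNotifier are hypothetical names modelled on WorkerStatusListener, and consumerIdsChanged() stands in for whatever code path reacts to NOTIFY_CONSUMER_IDS_CHANGED.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener, modelled on the WorkerStatusListener named above. */
interface WorkerChangeListener {
    void onWorkerChange();
}

/** Keeps the registered listeners and notifies each one when the worker set changes. */
final class WorkerChangeNotifier {
    // CopyOnWriteArrayList allows registration while a notification is in progress.
    private final List<WorkerChangeListener> listeners = new CopyOnWriteArrayList<>();

    void register(WorkerChangeListener listener) {
        listeners.add(listener);
    }

    /** Would be invoked when the consumer-ids-changed notification arrives. */
    void consumerIdsChanged() {
        for (WorkerChangeListener listener : listeners) {
            listener.onWorkerChange();
        }
    }
}
```

A rebalance service would typically register itself here so that a worker joining or leaving triggers a new allocation.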

handler

public RestHandler(AbstractConnectController connectController)

Query:

  • /getClusterInfo
  • /connectors/list
  • /allocated/connectors
  • /allocated/tasks
  • /connectors/{connectorName}/config
  • /connectors/{connectorName}/status
  • /connectors/{connectorName}/tasks
  • /connectors/{connectorName}/tasks/{task}/status

Create:

  • /connectors/{connectorName}
  • POST /connectors/{connectorName}

Stop:

  • /connectors/{connectorName}/stop
  • /connectors/stop/all

Pause & resume:

  • /connectors/{connectorName}/pause
  • /connectors/{connectorName}/resume
  • /connectors/pause/all
  • /connectors/resume/all

Plugin:

  • /plugin/reload
  • /plugin/list
  • /plugin/list/connectors
  • /plugin/config
  • /plugin/config/validate
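
As an illustration of how a client might call these endpoints from Java, the sketch below builds requests with the JDK's java.net.http API (Java 11+). ConnectRestRequests is a hypothetical helper written for this note; the base URL is whatever host and port the worker listens on (8082 in the sample run later in this article).

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds requests for two of the endpoints listed above. */
final class ConnectRestRequests {
    private ConnectRestRequests() {
    }

    /** GET {baseUrl}/connectors/list */
    static HttpRequest listConnectors(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/connectors/list"))
                .GET()
                .build();
    }

    /** POST {baseUrl}/connectors/{name} with the connector config as a JSON body. */
    static HttpRequest createConnector(String baseUrl, String name, String jsonConfig) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/connectors/" + name))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonConfig))
                .build();
    }
}
```

A request built this way can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) once a worker is running.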

store

public class FileBaseKeyValueStore<K, V> extends MemoryBasedKeyValueStore<K, V> {}

Worker

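  • Worker is a control class that belongs one-to-one to a ConnectController. For example, startTask checks the task with instanceof and then registers it in sourceTaskOffsetCommitter and taskToFutureMap
  • Worker's job is to manage tasks: it maintains a CachedThreadPool and schedules all connectors and tasks within one process
  • maintainConnectorState is an empty implementation
  • maintainTaskState runs one cycle; the key step is building newTasks from connectorConfig and then calling startTask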

public class Worker {
    private final ExecutorService taskExecutor; // Thread pool for connectors and tasks.

    public void start() {
        workerState = new AtomicReference<>(WorkerState.STARTED);
        stateMachineService.start();
    }

    private void startTask(Map<String, List<ConnectKeyValue>> newTasks) throws Exception {
        // use ClassLoader
        ClassLoader connectorLoader = plugin.delegatingLoader().connectorLoader(connType);
        // create the matching Task (producer/consumer) depending on SourceTask/SinkTask
    }

    public class StateMachineService extends ServiceThread {
        @Override
        public void run() {
            log.info(this.getServiceName() + " service started");
            while (!this.isStopped()) {
                this.waitForRunning(1000);
                try {
                    Worker.this.maintainConnectorState();
                    Worker.this.maintainTaskState();
                } catch (Exception e) {
                    log.error("RebalanceImpl#StateMachineService start connector or task failed", e);
                }
            }
            log.info(this.getServiceName() + " service end");
        }
    }
}
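
One cycle of the state machine amounts to reconciling the assigned tasks with the running ones. The sketch below shows that idea with plain task ids; TaskReconciler is a hypothetical class, and the real maintainTaskState works on ConnectKeyValue configs and submits tasks to the thread pool rather than tracking strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical reconciler: tracks running task ids and diffs them against the assignment. */
final class TaskReconciler {
    private final Set<String> running = new HashSet<>();

    /**
     * One cycle: drop tasks that are no longer assigned, then start the newly
     * assigned ones. Returns the ids that were started in this cycle.
     */
    Set<String> reconcile(Set<String> assigned) {
        running.retainAll(assigned);              // "stop" tasks that were unassigned
        Set<String> started = new HashSet<>(assigned);
        started.removeAll(running);               // keep only ids not yet running
        running.addAll(started);                  // "start" them
        return started;
    }

    /** Snapshot of the task ids currently considered running. */
    Set<String> running() {
        return Set.copyOf(running);
    }
}
```

Running the diff on every tick makes the loop idempotent: calling it again with the same assignment starts nothing new.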

WorkerTask/WorkerSourceTask

  • WorkerTask implements Runnable and defines some hooks (onPause...); the function that does the real work is the abstract execute()
  • WorkerSourceTask: repeatedly calls sourceTask.poll() to drive the SourceTask
  • WorkerSinkTask: iteration() -> pollConsumer -> update the commit; sinkTask.start(taskConfig) happens in initializeAndStart
  • WorkerDirectTask: wraps a SinkTask together with a SourceTask
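
The Runnable-plus-abstract-execute() structure is a template method. The following sketch reproduces only that shape: SketchWorkerTask and SketchSourceTask are hypothetical classes, the events list exists purely so the lifecycle is observable, and the in-memory queue stands in for a real SourceTask.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical template: run() fixes the lifecycle, subclasses supply execute(). */
abstract class SketchWorkerTask implements Runnable {
    protected final List<String> events = new ArrayList<>();

    protected void initializeAndStart() {
        events.add("start");
    }

    protected abstract void execute();

    protected void close() {
        events.add("close");
    }

    @Override
    public final void run() {
        try {
            initializeAndStart();
            execute();
        } finally {
            close();   // always release resources, even if execute() throws
        }
    }
}

/** Hypothetical source-side task: polls an in-memory queue until it is drained. */
final class SketchSourceTask extends SketchWorkerTask {
    private final Deque<String> source;
    final List<String> sent = new ArrayList<>();

    SketchSourceTask(List<String> records) {
        this.source = new ArrayDeque<>(records);
    }

    @Override
    protected void execute() {
        String record;
        while ((record = source.poll()) != null) {
            sent.add(record);   // a real task would hand the record to a producer
        }
    }
}
```

Unlike this sketch, a real source task keeps polling until it is stopped instead of exiting when the queue is empty.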

SourceTask/SinkTask

Abstract classes defined by openmessaging.

AllocateConnAndTaskStrategy

  • The strategy pattern that rebalance relies on; there is a default strategy and a consistent-hashing one
  • AllocateConnAndTaskStrategyByConsistentHash
  • DefaultAllocateConnAndTaskStrategy
public interface AllocateConnAndTaskStrategy {
    ConnAndTaskConfigs allocate(List<String> allWorker, String curWorker, Map<String, ConnectKeyValue> connectorConfigs, Map<String, List<ConnectKeyValue>> taskConfigs);
}
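
To show what an allocation strategy has to guarantee, here is a deliberately simple modulo allocator. It is not claimed to be the algorithm of DefaultAllocateConnAndTaskStrategy; ModuloAllocator is a hypothetical class that works on connector names only, while the real interface also carries connector and task configs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical strategy: sort workers and names, then assign name i to worker i % n. */
final class ModuloAllocator {
    private ModuloAllocator() {
    }

    /** Returns the names that curWorker should run; empty if curWorker is not alive. */
    static List<String> allocate(List<String> allWorkers, String curWorker, List<String> names) {
        List<String> workers = new ArrayList<>(allWorkers);
        List<String> sortedNames = new ArrayList<>(names);
        // Sorting makes every worker compute the same global assignment independently.
        Collections.sort(workers);
        Collections.sort(sortedNames);

        List<String> mine = new ArrayList<>();
        int self = workers.indexOf(curWorker);
        if (self < 0) {
            return mine;
        }
        for (int i = 0; i < sortedNames.size(); i++) {
            if (i % workers.size() == self) {
                mine.add(sortedNames.get(i));
            }
        }
        return mine;
    }
}
```

The important property is determinism: each worker runs the same function on the same inputs, so the assignments are disjoint without any coordination beyond knowing the alive workers.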

Plugin

  • newTask instantiates a Task through the plugin class loader
public class Plugin {
    private final DelegatingClassLoader delegatingLoader;

    public Task newTask(Class<? extends Task> taskClass) {
        return newPlugin(taskClass);
    }
}
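
A plugin loader of this kind usually instantiates the class reflectively while the class's own loader is set as the thread context class loader. The sketch below shows that pattern under the assumption that newPlugin works roughly this way; SketchPlugin is a hypothetical class and does not reproduce the real Plugin or DelegatingClassLoader code.

```java
/** Hypothetical illustration of reflective plugin instantiation. */
final class SketchPlugin {
    private SketchPlugin() {
    }

    /**
     * Creates an instance through the no-arg constructor. While the constructor
     * runs, the thread context class loader is switched to the plugin's loader
     * and restored afterwards.
     */
    static <T> T newInstance(Class<T> type) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            current.setContextClassLoader(type.getClassLoader());
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

Restoring the previous loader in finally matters because worker threads are pooled and reused across plugins.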

ConnectStatsManager

The IDE shows no references to it, so its purpose is unclear for now.

public class ConnectStatsManager {
    private final HashMap<String, StatsItemSet> statsTable = new HashMap<String, StatsItemSet>();
    private final String worker;
}

ConnectStatsService

A continuously running sampling thread; each sample is added to a LinkedList.

public class ConnectStatsService extends ServiceThread {
    private final LinkedList<CallSnapshot> sourceTaskTimesList = new LinkedList<CallSnapshot>();
    private final LinkedList<CallSnapshot> sinkTaskTimesList = new LinkedList<CallSnapshot>();
}

DataSynchronizer/BrokerBasedLog

  • DataSynchronizer is an interface; besides start, it requires send(K, V), which defines how a message is sent
  • BrokerBasedLog holds both a consumer and a producer
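
The contract can be illustrated with an in-memory stand-in. SketchDataSynchronizer and InMemorySynchronizer are hypothetical: the method set (start, stop, send) is a guess at the general shape, and where BrokerBasedLog would produce to a topic and consume from it, this version delivers straight to a callback.

```java
import java.util.function.BiConsumer;

/** Hypothetical shape of the synchronizer contract: lifecycle plus send(K, V). */
interface SketchDataSynchronizer<K, V> {
    void start();

    void stop();

    void send(K key, V value);
}

/** In-memory stand-in: "sending" hands the pair directly to the receive callback. */
final class InMemorySynchronizer<K, V> implements SketchDataSynchronizer<K, V> {
    private final BiConsumer<K, V> onReceive;
    private boolean started;

    InMemorySynchronizer(BiConsumer<K, V> onReceive) {
        this.onReceive = onReceive;
    }

    @Override
    public void start() {
        started = true;
    }

    @Override
    public void stop() {
        started = false;
    }

    @Override
    public void send(K key, V value) {
        if (!started) {
            throw new IllegalStateException("synchronizer is not started");
        }
        onReceive.accept(key, value);
    }
}
```

Holding both ends in one object is what lets a management service publish a change and have every worker, itself included, apply it through the same receive path.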

rocketmq-connect-cli

A hand-rolled CLI tool; each command ultimately just calls the REST API.

public class ConnectAdminStartup {
    public static void initCommand(Config config) {
        initCommand(new CreateConnectorSubCommand(config));
        initCommand(new StopConnectorSubCommand(config));
        initCommand(new QueryConnectorConfigSubCommand(config));
        initCommand(new QueryConnectorStatusSubCommand(config));
        initCommand(new StopAllSubCommand(config));
        initCommand(new ReloadPluginsSubCommand(config));
        initCommand(new GetConfigInfoSubCommand(config));
        initCommand(new GetClusterInfoSubCommand(config));
        initCommand(new GetAllocatedInfoCommand(config));
        initCommand(new GetAllocatedConnectors(config));
        initCommand(new GetAllocatedTasks(config));
    }
}

distribution

Packaging and release.

metric-exporter

  • MetricsReporter is a Prometheus-compatible metrics collector, with methods such as onGaugeAdded
  • ConnectMetrics manages a List<Reporter>, which may include MetricsReporter; it is created as a member of Worker

practice

rocketmq-connect-sample

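  • A sample for standalone mode; rocketmq-connect/README.md gives a detailed walkthrough that uses the classes in the rocketmq-connect-sample package
  • It reads data from a source file and sends it to the RocketMQ cluster, then reads the messages from the topic and writes them to a target file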

run

  • Reserved ports: 9876 (namesrv), 8080 (broker/proxy), 8082 (connect)
  • Deploy RocketMQ locally
  • Create the topic
  • Deploy connect
  • Test the sample
  • Clean up the services
# rocketmq
nohup sh bin/mqnamesrv &
tail -f ~/logs/rocketmqlogs/namesrv.log

nohup sh bin/mqbroker -n localhost:9876 &
tail -f ~/logs/rocketmqlogs/broker.log 

# topic
sh bin/mqadmin updateTopic -t fileTopic -n localhost:9876 -c DefaultCluster -r 8 -w 8


# connect
git clone https://github.com/apache/rocketmq-connect.git

cd  rocketmq-connect

mvn -Prelease-connect -DskipTests clean install -U

cd distribution/target/rocketmq-connect-0.0.1-SNAPSHOT/rocketmq-connect-0.0.1-SNAPSHOT

sh bin/connect-standalone.sh -c conf/connect-standalone.conf &

# query
curl -X GET http://127.0.0.1:8082/getClusterInfo

curl -X GET http://127.0.0.1:8082/connectors/list

# write test data, then create the source and sink connectors
echo "Hello \r\nRocketMQ\r\n Connect" >> test-source-file.txt

curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8082/connectors/fileSourceConnector -d '{"connector.class":"org.apache.rocketmq.connect.file.FileSourceConnector","filename":"rocketmq-connect/test-source-file.txt","connect.topicname":"fileTopic"}'

curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8082/connectors/fileSinkConnector -d '{"connector.class":"org.apache.rocketmq.connect.file.FileSinkConnector","filename":"rocketmq-connect/test-sink-file.txt","connect.topicnames":"fileTopic"}'

# stop connectors
curl http://127.0.0.1:8082/connectors/fileSourceConnector/stop
curl http://127.0.0.1:8082/connectors/fileSinkConnector/stop

curl http://127.0.0.1:8082/connectors/stop/all

# clean
sh bin/connectshutdown.sh

sh bin/mqshutdown broker
sh bin/mqshutdown namesrv

config

{
  "status": 200,
  "body": {
    "fileSinkConnector": {
      "status": {
        "name": "fileSinkConnector",
        "connector": {
          "state": "RUNNING",
          "trace": null,
          "workerId": "standalone-worker"
        },
        "tasks": [
          {
            "state": "RUNNING",
            "trace": null,
            "workerId": "standalone-worker",
            "id": 0
          }
        ],
        "type": "SINK"
      },
      "info": {
        "name": "fileSinkConnector",
        "config": {
          "connector.class": "org.apache.rocketmq.connect.file.FileSinkConnector",
          "filename": "test-sink-file.txt",
          "connect.topicnames": "fileTopic"
        },
        "tasks": [
          {
            "connector": "fileSinkConnector",
            "task": 0
          }
        ],
        "type": "SINK"
      }
    },
    "fileSourceConnector": {
      "status": {
        "name": "fileSourceConnector",
        "connector": {
          "state": "RUNNING",
          "trace": null,
          "workerId": "standalone-worker"
        },
        "tasks": [
          {
            "state": "UNASSIGNED",
            "trace": null,
            "workerId": "standalone-worker",
            "id": 0
          }
        ],
        "type": "SOURCE"
      },
      "info": {
        "name": "fileSourceConnector",
        "config": {
          "connector.class": "org.apache.rocketmq.connect.file.FileSourceConnector",
          "filename": "test-source-file.txt",
          "connect.topicname": "fileTopic"
        },
        "tasks": [
          {
            "connector": "fileSourceConnector",
            "task": 0
          }
        ],
        "type": "SOURCE"
      }
    }
  }
}

FileSinkConnector/FileSinkTask

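  • FileSinkConnector's taskClass method returns FileSinkTask.class
  • FileSinkTask parses fileConfig out of the config and then creates a PrintStream for writing to the file
  • It is a little more complex than that: it keeps both offset and streamOffset. offset marks the position within the buffer, while streamOffset is the position over the whole history of the file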

rocketmq-connect/connectors

  • This directory contains many ready-made plugins; just build one and put the build output under the path given in the configuration file
  • Edit connect-standalone.conf; note pluginPaths=/usr/local/connector-plugins
