overview
rocketmq-connect plays a role similar to kafka-connect and is an important way to extend the boundaries of what MQ can be used for, e.g. syncing data from a MySQL source into a PostgreSQL sink.
details
concepts
runtime
Runtime is the execution environment for Source and Sink connectors. It loads connectors, exposes RESTful interfaces, starts connector tasks, and provides service discovery between cluster nodes, configuration synchronization, consumption-progress persistence, failover, load balancing, and so on.
connector
A connector is essentially a connect plugin: the complete implementation of a Source or Sink.
task(runnable)
A Task is the smallest unit into which a Connector's work is sharded. It is the actual executor that copies data from the source system into RocketMQ (SourceTask) or reads data from RocketMQ and writes it into the target system (SinkTask). Tasks are stateless and can be started and stopped dynamically, and multiple tasks can run in parallel, so a Connector's copy parallelism is mainly determined by the number of tasks.
converter(json)
Makes it convenient for a source connector to import data into MQ; see the sketch below.
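For intuition, a minimal sketch of what a JSON converter does conceptually. Jackson is used purely for illustration and the class/method names here are made up, not the runtime's converter API:
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonConverterSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Serialize a record (here just a Map) into bytes usable as a RocketMQ message body.
    public static byte[] toBytes(Map<String, Object> record) throws Exception {
        return MAPPER.writeValueAsBytes(record);
    }

    // Deserialize the message body back into a Map on the sink side.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> fromBytes(byte[] body) throws Exception {
        return MAPPER.readValue(body, LinkedHashMap.class);
    }
}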
config
conf/connect-standalone.conf
Parses data from the data source and is responsible for copying data between the data source and the RocketMQ broker.
worker(process)
A REST server. The worker process is the runtime environment for Connectors and Tasks and passes the fetched configuration on to them. It is also responsible for starting Connectors and Tasks, persisting connector configuration, persisting the position (offset) information of task synchronization, and load balancing; high availability, scale-out/in, and failure handling of a Connect cluster mainly rely on the worker's load-balancing capability.
modules
rocketmq-connect-runtime
StandaloneConnectStartup
- A thin wrapper around AbstractConnectController; StandaloneConnectController adds a StandaloneRebalanceService rebalanceService
- The individual services are mostly the Memory-backed implementations; the position store is the File-backed version
- The added RebalanceImpl is responsible for load balancing; its getAllAliveWorkers relies on the SDK-side API for listing all consumers
public class StandaloneConnectController extends AbstractConnectController {
    public void start() {
        super.start();
        rebalanceService.start();
    }
}
public class RebalanceImpl {
    public void doRebalance();
    private void updateProcessConfigsInRebalance(ConnAndTaskConfigs allocateResult);
}
DistributedConnectStartup
- Management services such as those returned by getClusterManagementService/getConfigManagementService are all loaded via the ServiceLoader mechanism
- The start logic runs the following persist calls at a fixed interval (see the sketch below):
- this.configManagementService.persist()
- this.positionManagementService.persist()
- this.stateManagementService.persist()
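A minimal sketch of such a fixed-interval persist loop, using a plain ScheduledExecutorService for illustration; the runtime uses its own ServiceThread-style scheduling, and the Persistable stand-in plus the interval value are assumptions of this sketch:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PersistLoopSketch {
    // Stand-in for the management services above; each of them exposes persist().
    interface Persistable {
        void persist();
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Persist the config / position / state stores at a fixed interval (interval chosen arbitrarily here).
    public void start(Persistable... services) {
        scheduler.scheduleAtFixedRate(() -> {
            for (Persistable service : services) {
                service.persist();
            }
        }, 20, 20, TimeUnit.SECONDS);
    }

    public void shutdown() {
        scheduler.shutdown();
    }
}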
ConnectController/AbstractConnectController
start flow
- clusterManagementService.start();
- async positionManagementService.start();
- async stateManagementService.start();
- async configManagementService.start();
- connectStatsService.start();
- worker.start();
ConnectController implementations
- DistributedConnectController
- StandaloneConnectController
public interface ConnectController {
    void start();
    void shutdown();
}
public abstract class AbstractConnectController implements ConnectController {
    protected final ConfigManagementService configManagementService;
    protected final PositionManagementService positionManagementService;
    protected final ClusterManagementService clusterManagementService;
    protected final StateManagementService stateManagementService;
    protected final Worker worker;
    protected final RestHandler restHandler;
    protected final Plugin plugin;
    protected final ConnectStatsManager connectStatsManager;
    protected final ConnectStatsService connectStatsService;
}
ConfigManagementService
public interface ConfigManagementService {}
public abstract class AbstractConfigManagementService implements ConfigManagementService, IChangeNotifier<String, byte[]>, ICommonConfiguration {}
PositionManagementService
- The position abstraction is openmessaging's RecordPartition: a Map<String, ?>
- The positionStore of AbstractPositionManagementService is declared as
protected KeyValueStore<ExtendRecordPartition, RecordOffset> positionStore;
where ExtendRecordPartition extends RecordPartition with a namespace
- RecordOffset is likewise a Map<String, ?> (see the sketch after the code below)
public interface PositionManagementService {}
public abstract class AbstractPositionManagementService implements PositionManagementService, IChangeNotifier<ByteBuffer, ByteBuffer>, ICommonConfiguration {}
public class RocketMqPositionManagementServiceImpl extends AbstractPositionManagementService {}
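For intuition, a hypothetical position entry for a file source; the field names below are made up for illustration, real connectors define their own keys:
import java.util.Map;

public class PositionExampleSketch {
    public static void main(String[] args) {
        // RecordPartition: identifies *which* stream of the source is being read,
        // e.g. a particular file of a file source connector.
        Map<String, String> partition = Map.of("fileName", "test-source-file.txt");

        // RecordOffset: identifies *where* in that stream the task has read up to.
        Map<String, Long> offset = Map.of("position", 1024L);

        // The runtime persists such (partition -> offset) pairs in its positionStore
        // so a restarted SourceTask can resume from the last committed position.
        System.out.println(partition + " -> " + offset);
    }
}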
StateManagementService
- Stores the status of every connector
- Stores the status of all tasks belonging to a connector
public interface StateManagementService {}
public abstract class AbstractStateManagementService implements StateManagementService, IChangeNotifier<String, byte[]>, ICommonConfiguration {}
public class RocketMqStateManagementServiceImpl extends AbstractStateManagementService {}
ClusterManagementService
- ClusterManagementServiceImpl provides getCurrentWorker/getAllAliveWorkers; worker discovery uses methods exposed by RocketMQ's DefaultMQPullConsumer
- It also supports change notification: when NOTIFY_CONSUMER_IDS_CHANGED fires, onWorkerChange is invoked on every WorkerStatusListener
public class ClusterManagementServiceImpl implements ClusterManagementService {}
handler
public RestHandler(AbstractConnectController connectController)
Query:
- /getClusterInfo
- /connectors/list
- /allocated/connectors
- /allocated/tasks
- /connectors/{connectorName}/config
- /connectors/{connectorName}/status
- /connectors/{connectorName}/tasks
- /connectors/{connectorName}/tasks/{task}/status
Create:
- /connectors/{connectorName}
- POST /connectors/{connectorName}
Stop:
- /connectors/{connectorName}/stop
- /connectors/stop/all
Pause & resume:
- /connectors/{connectorName}/pause
- /connectors/{connectorName}/resume
- /connectors/pause/all
- /connectors/resume/all
Plugin:
- /plugin/reload
- /plugin/list
- /plugin/list/connectors
- /plugin/config
- /plugin/config/validate
store
public class FileBaseKeyValueStore<K, V> extends MemoryBasedKeyValueStore<K, V> {}
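A minimal sketch of the idea behind a file-backed key/value store layered on top of an in-memory map; the persist/load method names and the line-based format here are illustrative, not the class's actual contract:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: keeps data in memory and flushes it to a file as simple "key=value" lines.
public class FileBackedStoreSketch {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Path file;

    public FileBackedStoreSketch(Path file) { this.file = file; }

    public void put(String k, String v) { data.put(k, v); }
    public String get(String k) { return data.get(k); }

    public void persist() throws IOException {
        StringBuilder sb = new StringBuilder();
        data.forEach((k, v) -> sb.append(k).append('=').append(v).append('\n'));
        Files.writeString(file, sb.toString(), StandardCharsets.UTF_8);
    }

    public void load() throws IOException {
        if (!Files.exists(file)) return;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int i = line.indexOf('=');
            if (i > 0) data.put(line.substring(0, i), line.substring(i + 1));
        }
    }
}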
Worker
- Worker is a control class owned one-to-one by a ConnectController; for example, in startTask it branches on the task's type (instanceof) and adds it to sourceTaskOffsetCommitter and taskToFutureMap
- Worker's job is managing tasks: it maintains a CachedThreadPool and schedules all connectors and tasks within a single process
- maintainConnectorState is an empty implementation
- maintainTaskState runs one cycle; the key step is building newTasks from connectorConfig and then calling startTask
public class Worker {
    private final ExecutorService taskExecutor; // Thread pool for connectors and tasks.
    public void start() {
        workerState = new AtomicReference<>(WorkerState.STARTED);
        stateMachineService.start();
    }
    private void startTask(Map<String, List<ConnectKeyValue>> newTasks) throws Exception {
        // use ClassLoader
        ClassLoader connectorLoader = plugin.delegatingLoader().connectorLoader(connType);
        // create the corresponding worker task for a SourceTask (producer side) or SinkTask (consumer side)
    }
    public class StateMachineService extends ServiceThread {
        @Override
        public void run() {
            log.info(this.getServiceName() + " service started");
            while (!this.isStopped()) {
                this.waitForRunning(1000);
                try {
                    Worker.this.maintainConnectorState();
                    Worker.this.maintainTaskState();
                } catch (Exception e) {
                    log.error("RebalanceImpl#StateMachineService start connector or task failed", e);
                }
            }
            log.info(this.getServiceName() + " service end");
        }
    }
}
WorkerTask/WorkerSourceTask
- WorkerTask implements Runnable and defines a few hooks (onPause...); the actual work happens in the abstract execute() method
- WorkerSourceTask: keeps calling sourceTask.poll() and forwards the polled records to RocketMQ (see the sketch below)
- WorkerSinkTask: iteration() -> pollConsumer -> update/commit offsets; sinkTask.start(taskConfig) happens in initializeAndStart
- WorkerDirectTask: a wrapper that combines a SinkTask and a SourceTask
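A minimal sketch of the source-side poll-and-send loop, using the RocketMQ DefaultMQProducer for illustration. SourceTaskLike is a placeholder for the real SourceTask API, and the real WorkerSourceTask additionally handles converters, offsets and retries:
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;

public class SourcePollLoopSketch {
    // Placeholder for the real SourceTask: poll() returns serialized record bodies.
    interface SourceTaskLike {
        List<byte[]> poll() throws InterruptedException;
    }

    public static void run(SourceTaskLike sourceTask, DefaultMQProducer producer,
                           String topic, AtomicBoolean running) throws Exception {
        while (running.get()) {
            List<byte[]> records = sourceTask.poll();    // pull records from the source system
            if (records == null || records.isEmpty()) {
                continue;
            }
            for (byte[] body : records) {
                producer.send(new Message(topic, body)); // forward each record to RocketMQ
                // the real worker would also record the source position here for the positionStore
            }
        }
    }
}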
SourceTask/SinkTask
Abstract classes defined by openmessaging (the openmessaging-connector API).
AllocateConnAndTaskStrategy
- The strategy pattern that rebalance relies on; there is a default strategy and a consistent-hashing strategy (see the sketch after the interface below)
- AllocateConnAndTaskStrategyByConsistentHash
- DefaultAllocateConnAndTaskStrategy
public interface AllocateConnAndTaskStrategy {
    ConnAndTaskConfigs allocate(List<String> allWorker, String curWorker, Map<String, ConnectKeyValue> connectorConfigs, Map<String, List<ConnectKeyValue>> taskConfigs);
}
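Not the actual DefaultAllocateConnAndTaskStrategy code, just a self-contained sketch of the modulo/round-robin idea such a default strategy can use; generic types stand in for ConnectKeyValue and ConnAndTaskConfigs:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ModuloAllocateSketch {
    // Hand item i to worker (i % workerCount); every worker must compute the same ordering,
    // so both the worker list and the config keys are sorted before assignment.
    public static <V> Map<String, V> allocateForWorker(List<String> allWorkers, String curWorker,
                                                       Map<String, V> allConfigs) {
        List<String> workers = new ArrayList<>(allWorkers);
        workers.sort(String::compareTo);
        int me = workers.indexOf(curWorker);
        int n = workers.size();

        Map<String, V> mine = new HashMap<>();
        int i = 0;
        for (Map.Entry<String, V> e : new TreeMap<>(allConfigs).entrySet()) {
            if (i++ % n == me) {            // connector/task i belongs to worker i % n
                mine.put(e.getKey(), e.getValue());
            }
        }
        return mine;
    }
}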
Plugin
- Creates Task instances via newTask, using the plugin's class loader (see the sketch after the code below)
public class Plugin {
    private final DelegatingClassLoader delegatingLoader;
    public Task newTask(Class<? extends Task> taskClass) {
        return newPlugin(taskClass);
    }
}
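A minimal sketch of the underlying pattern: instantiate the task class while the plugin's class loader is installed as the thread context class loader, then restore the previous loader. This is a generic illustration, not the actual newPlugin implementation:
public final class PluginInstantiationSketch {
    // Create an instance of taskClass with pluginLoader active as the context class loader.
    public static <T> T newInstanceWithLoader(Class<T> taskClass, ClassLoader pluginLoader) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(pluginLoader);
        try {
            return taskClass.getDeclaredConstructor().newInstance();
        } finally {
            current.setContextClassLoader(previous); // always restore the original loader
        }
    }
}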
ConnectStatsManager
The IDE shows no references to it; its purpose is unclear for now.
public class ConnectStatsManager {
    private final HashMap<String, StatsItemSet> statsTable = new HashMap<String, StatsItemSet>();
    private final String worker;
}
ConnectStatsService
A continuously running sampling thread; each sampling result is appended to a LinkedList (see the sketch after the code below).
public class ConnectStatsService extends ServiceThread {
    private final LinkedList<CallSnapshot> sourceTaskTimesList = new LinkedList<CallSnapshot>();
    private final LinkedList<CallSnapshot> sinkTaskTimesList = new LinkedList<CallSnapshot>();
}
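A minimal sketch of the sliding-window sampling idea: periodically append a (timestamp, counter) snapshot and derive a rate from the first and last entries. CallSnapshot here is a made-up holder, not the runtime's class:
import java.util.LinkedList;

public class SamplingSketch {
    // Made-up snapshot holder: wall-clock time plus a monotonically increasing call counter.
    record CallSnapshot(long timestampMs, long callCount) {}

    private static final int MAX_SNAPSHOTS = 10;
    private final LinkedList<CallSnapshot> snapshots = new LinkedList<>();

    public synchronized void sample(long currentCallCount) {
        snapshots.addLast(new CallSnapshot(System.currentTimeMillis(), currentCallCount));
        if (snapshots.size() > MAX_SNAPSHOTS) {
            snapshots.removeFirst(); // keep a bounded window of recent samples
        }
    }

    // Calls per second over the current window (0 if not enough samples yet).
    public synchronized double tps() {
        if (snapshots.size() < 2) {
            return 0.0;
        }
        CallSnapshot first = snapshots.getFirst();
        CallSnapshot last = snapshots.getLast();
        long elapsedMs = last.timestampMs() - first.timestampMs();
        return elapsedMs <= 0 ? 0.0 : (last.callCount() - first.callCount()) * 1000.0 / elapsedMs;
    }
}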
DataSynchronizer/BrokerBasedLog
- DataSynchronizer is an interface; besides start it requires implementing send(K, V), i.e. how to send a message (see the sketch below)
- BrokerBasedLog owns both a consumer and a producer
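A minimal sketch of the broker-backed log idea: publish every key/value change to a topic with a producer, and replay the same topic with a consumer to keep local state in sync. The topic name, group names and the byte[] encoding below are illustrative assumptions, not the runtime's actual values:
import java.util.function.BiConsumer;
import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;

public class BrokerBasedLogSketch {
    private final String topic = "connect-config-topic";          // assumed topic name
    private final DefaultMQProducer producer = new DefaultMQProducer("connect-log-producer-group");
    private final DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("connect-log-consumer-group");

    public void start(BiConsumer<String, byte[]> onRecord) throws Exception {
        producer.setNamesrvAddr("127.0.0.1:9876");
        producer.start();

        consumer.setNamesrvAddr("127.0.0.1:9876");
        consumer.subscribe(topic, "*");
        consumer.registerMessageListener((MessageListenerConcurrently) (msgs, ctx) -> {
            // Replay every record so in-memory state stays consistent across workers.
            msgs.forEach(m -> onRecord.accept(m.getKeys(), m.getBody()));
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        });
        consumer.start();
    }

    // send(K, V): publish one key/value entry to the shared topic.
    public void send(String key, byte[] value) throws Exception {
        Message message = new Message(topic, value);
        message.setKeys(key);
        producer.send(message);
    }
}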
rocketmq-connect-cli
A hand-rolled CLI tool; each command ultimately just calls the REST interface (see the sketch after the code below).
public class ConnectAdminStartup {
    public static void initCommand(Config config) {
        initCommand(new CreateConnectorSubCommand(config));
        initCommand(new StopConnectorSubCommand(config));
        initCommand(new QueryConnectorConfigSubCommand(config));
        initCommand(new QueryConnectorStatusSubCommand(config));
        initCommand(new StopAllSubCommand(config));
        initCommand(new ReloadPluginsSubCommand(config));
        initCommand(new GetConfigInfoSubCommand(config));
        initCommand(new GetClusterInfoSubCommand(config));
        initCommand(new GetAllocatedInfoCommand(config));
        initCommand(new GetAllocatedConnectors(config));
        initCommand(new GetAllocatedTasks(config));
    }
}
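For illustration, what one of these subcommands boils down to: an HTTP GET against the worker's REST port. The endpoint and port are taken from the run section below; the HttpClient usage is a generic example, not the CLI's actual implementation:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GetClusterInfoSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:8082/getClusterInfo")) // worker REST endpoint
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // cluster info JSON returned by the runtime
    }
}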
distribution
Packaging and release.
metric-exporter
- MetricsReporter is a Prometheus-compatible metrics collector, with methods such as onGaugeAdded
- ConnectMetrics manages a List<Reporter>, which may include a MetricsReporter; it is created as a member of Worker
practice
rocketmq-connect-sample
- A runnable example for standalone mode; rocketmq-connect/README.md gives a detailed walkthrough that uses the classes under the rocketmq-connect-sample package
- Its main purpose is to read data from a source file and send it to the RocketMQ cluster, then read the messages back from the topic and write them to a target file
run
- Reserved ports: 9876 (namesrv), 8080 (broker/proxy), 8082 (connect)
- Deploy RocketMQ locally
- Create the topic
- Deploy connect
- Test the sample
- Clean up the services
# rocketmq
nohup sh bin/mqnamesrv &
tail -f ~/logs/rocketmqlogs/namesrv.log
nohup sh bin/mqbroker -n localhost:9876 &
tail -f ~/logs/rocketmqlogs/broker.log
# topic
sh bin/mqadmin updateTopic -t fileTopic -n localhost:9876 -c DefaultCluster -r 8 -w 8
# connect
git clone https://github.com/apache/rocketmq-connect.git
cd rocketmq-connect
mvn -Prelease-connect -DskipTests clean install -U
cd distribution/target/rocketmq-connect-0.0.1-SNAPSHOT/rocketmq-connect-0.0.1-SNAPSHOT
sh bin/connect-standalone.sh -c conf/connect-standalone.conf &
curl -X GET http://127.0.0.1:8082/getClusterInfo
curl -X GET http://127.0.0.1:8082/connectors/list
curl http://127.0.0.1:8082/connectors/fileSourceConnector/stop
curl http://127.0.0.1:8082/connectors/fileSinkConnector/stop
curl http://127.0.0.1:8082/connectors/stop/all
echo "Hello \r\nRocketMQ\r\n Connect" >> test-source-file.txt
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8082/connectors/fileSourceConnector -d '{"connector.class":"org.apache.rocketmq.connect.file.FileSourceConnector","filename":"rocketmq-connect/test-source-file.txt","connect.topicname":"fileTopic"}'
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8082/connectors/fileSinkConnector -d '{"connector.class":"org.apache.rocketmq.connect.file.FileSinkConnector","filename":"rocketmq-connect/test-sink-file.txt","connect.topicnames":"fileTopic"}'
# clean
sh bin/connectshutdown.sh
sh bin/mqshutdown broker
sh bin/mqshutdown namesrv
config
{
"status": 200,
"body": {
"fileSinkConnector": {
"status": {
"name": "fileSinkConnector",
"connector": {
"state": "RUNNING",
"trace": null,
"workerId": "standalone-worker"
},
"tasks": [
{
"state": "RUNNING",
"trace": null,
"workerId": "standalone-worker",
"id": 0
}
],
"type": "SINK"
},
"info": {
"name": "fileSinkConnector",
"config": {
"connector.class": "org.apache.rocketmq.connect.file.FileSinkConnector",
"filename": "test-sink-file.txt",
"connect.topicnames": "fileTopic"
},
"tasks": [
{
"connector": "fileSinkConnector",
"task": 0
}
],
"type": "SINK"
}
},
"fileSourceConnector": {
"status": {
"name": "fileSourceConnector",
"connector": {
"state": "RUNNING",
"trace": null,
"workerId": "standalone-worker"
},
"tasks": [
{
"state": "UNASSIGNED",
"trace": null,
"workerId": "standalone-worker",
"id": 0
}
],
"type": "SOURCE"
},
"info": {
"name": "fileSourceConnector",
"config": {
"connector.class": "org.apache.rocketmq.connect.file.FileSourceConnector",
"filename": "test-source-file.txt",
"connect.topicname": "fileTopic"
},
"tasks": [
{
"connector": "fileSourceConnector",
"task": 0
}
],
"type": "SOURCE"
}
}
}
}
FileSinkConnector/FileSinkTask
- FileSinkConnector's taskClass method specifies FileSinkTask.class
- FileSinkTask parses a fileConfig out of the config and creates a PrintStream for writing output to the file
- The file-reading side is a bit more complex: it keeps both an offset and a streamOffset to mark positions in the buffer, where streamOffset is the offset across the entire history of the file (see the sketch below)
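A minimal sketch of tracking a stream offset while tailing a file, purely illustrative; the sample connector's actual buffering and field names differ:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class FileTailSketch {
    private long streamOffset; // absolute position in the file across the task's lifetime

    // Read whatever was appended since the last call and advance streamOffset accordingly.
    public String pollNewData(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            if (length <= streamOffset) {
                return null; // nothing new
            }
            file.seek(streamOffset);
            byte[] chunk = new byte[(int) (length - streamOffset)];
            file.readFully(chunk);
            streamOffset = length; // this is the value a SourceTask would persist as its position
            return new String(chunk, StandardCharsets.UTF_8);
        }
    }
}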
rocketmq-connect/connectors
- This directory contains many ready-made plugins; build them and drop the build output under the path configured in the config file to use them
- Edit connect-standalone.conf accordingly (a hedged sample is sketched below), in particular:
pluginPaths=/usr/local/connector-plugins
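For reference, a hedged sketch of the relevant lines in conf/connect-standalone.conf; pluginPaths and the 8082 port come from this note, while the other key names are recalled from memory and should be verified against the file shipped in the distribution:
# RocketMQ name server used by the runtime (assumed key name)
namesrvAddr=127.0.0.1:9876
# Port of the REST API, matching the curl examples above (assumed key name)
httpPort=8082
# Directory scanned for connector plugin jars (from this note)
pluginPaths=/usr/local/connector-plugins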