Note: the Flink version used here is 1.10.
While starting a Flink cluster, I used the following memory settings:
taskmanager.memory.process.size: 1728m
taskmanager.memory.managed.size: 0m
taskmanager.memory.task.heap.size: 1024m
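Unfortunately, running the start-cluster.sh script failed with an error.
That failure prompted me to work through how Flink lays out its memory.
If anything in this analysis is wrong or one-sided, please point it out in the comments so we can all learn from it.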
Background
The figure above shows the detailed memory model of a Flink TaskManager.
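Total Process Memory [taskmanager.memory.process.size]: the total amount of memory given to the Flink JVM process. It is mainly used in containerized deployments (Kubernetes, YARN and so on), where it corresponds to the memory size of the requested container.
Total Flink Memory [taskmanager.memory.flink.size]: the amount of memory given to Flink itself, that is, total process memory without JVM Metaspace and JVM Overhead. It is mainly used in standalone deployments.
Framework Heap [taskmanager.memory.framework.heap.size]: the JVM heap reserved for the TaskExecutor framework, 128 mb by default. This is an advanced option and should not be changed casually.
Task Heap [taskmanager.memory.task.heap.size]: the JVM heap available to Flink tasks.
Managed Memory [taskmanager.memory.managed.size]: the managed memory of the TaskExecutor. It is used for sorting, hash tables and caching of intermediate results in batch jobs, and by the RocksDB state backend in streaming jobs.
Framework Off-heap [taskmanager.memory.framework.off-heap.size]: the off-heap memory reserved for the TaskExecutor framework, 128 mb by default. This is an advanced option and should not be changed casually.
Task Off-Heap [taskmanager.memory.task.off-heap.size]: the off-heap memory available to Flink tasks, 0 bytes by default.
Network: memory used to hold shuffle data, for example network buffers. The related options are taskmanager.memory.network.fraction [default: 0.1], taskmanager.memory.network.max [default: 1 gb] and taskmanager.memory.network.min [default: 64 mb].
JVM Metaspace [taskmanager.memory.jvm-metaspace.size]: the JVM Metaspace size of the TaskExecutor process.
JVM Overhead: memory reserved for JVM overhead such as thread stacks and the compiled code cache. The related options are taskmanager.memory.jvm-overhead.fraction [default: 0.1], taskmanager.memory.jvm-overhead.max [default: 1 gb] and taskmanager.memory.jvm-overhead.min [default: 192 mb].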
Finally, when the TaskExecutor process starts, Flink sets the memory-related JVM arguments from the configured or derived sizes of these components:
JVM Arguments | Value |
---|---|
-Xmx and -Xms | Framework Heap + Task Heap Memory |
-XX:MaxDirectMemorySize | Framework Off-Heap + Task Off-Heap + Network Memory |
-XX:MaxMetaspaceSize | JVM Metaspace |
These JVM arguments are also written to the log while the TaskExecutor starts up.
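As a rough illustration of the table above, the mapping can be written as one small function. This is a sketch, not Flink code: the class and method names are made up, sizes are whole megabytes for readability, and the inputs passed in `main` (including 142m of network memory and 96m of metaspace) are example figures rather than numbers from a real run.

```java
final class JvmArgsSketch {
    // Hypothetical helper; all sizes are in megabytes.
    static String jvmArgs(long frameworkHeap, long taskHeap, long frameworkOffHeap,
                          long taskOffHeap, long network, long metaspace) {
        // -Xmx and -Xms: Framework Heap + Task Heap
        long heap = frameworkHeap + taskHeap;
        // -XX:MaxDirectMemorySize: Framework Off-Heap + Task Off-Heap + Network
        long direct = frameworkOffHeap + taskOffHeap + network;
        return "-Xmx" + heap + "m -Xms" + heap + "m"
            + " -XX:MaxDirectMemorySize=" + direct + "m"
            + " -XX:MaxMetaspaceSize=" + metaspace + "m";
    }

    public static void main(String[] args) {
        // Example inputs only: 128m framework heap, 1024m task heap, 128m framework
        // off-heap, 0m task off-heap, 142m network, 96m metaspace.
        System.out.println(jvmArgs(128, 1024, 128, 0, 142, 96));
    }
}
```

Managed memory and JVM overhead do not appear in any of the three arguments, which matches the table.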
Formulas and source code analysis
How the memory calculation is invoked
# step1 start-cluster.sh
# Start TaskManager instance(s)
TMSlaves start # TMSlaves is a function defined in config.sh
# step2 config.sh
# starts or stops TMs on all slaves
TMSlaves() {
...
if [[ $? -ne 0 ]]; then
for slave in ${SLAVES[@]}; do
ssh -n $FLINK_SSH_OPTS $slave -- "nohup /bin/bash -l \"${FLINK_BIN_DIR}/taskmanager.sh\" \"${CMD}\" &"
done
...
}
# step3 taskmanager.sh
jvm_params_output=$(runBashJavaUtilsCmd GET_TM_RESOURCE_JVM_PARAMS "${FLINK_CONF_DIR}" "$FLINK_BIN_DIR/bash-java-utils.jar:$(findFlinkDistJar)" "${ARGS[@]}")
dynamic_configs_output=$(runBashJavaUtilsCmd GET_TM_RESOURCE_DYNAMIC_CONFIGS ${FLINK_CONF_DIR} $FLINK_BIN_DIR/bash-java-utils.jar:$(findFlinkDistJar) "${ARGS[@]}")
# step4 config.sh
runBashJavaUtilsCmd() {
...
local output=`${JAVA_RUN} -classpath "${class_path}" org.apache.flink.runtime.util.BashJavaUtils ${cmd} --configDir "${conf_dir}" $dynamic_args 2>&1 | tail -n 1000`
...
echo "$output"
}
Following this chain of script calls, the class that finally runs is org.apache.flink.runtime.util.BashJavaUtils.
public class BashJavaUtils {
private static final String EXECUTION_PREFIX = "BASH_JAVA_UTILS_EXEC_RESULT:";
public static void main(String[] args) throws Exception {
checkArgument(args.length > 0, "Command not specified.");
switch (Command.valueOf(args[0])) {
case GET_TM_RESOURCE_DYNAMIC_CONFIGS:
getTmResourceDynamicConfigs(args);
break;
case GET_TM_RESOURCE_JVM_PARAMS:
getTmResourceJvmParams(args);
break;
default:
// unexpected, Command#valueOf should fail if a unknown command is passed in
throw new RuntimeException("Unexpected, something is wrong.");
}
}
private static void getTmResourceDynamicConfigs(String[] args) throws Exception {
Configuration configuration = getConfigurationForStandaloneTaskManagers(args);
TaskExecutorProcessSpec taskExecutorProcessSpec = TaskExecutorProcessUtils.processSpecFromConfig(configuration);
System.out.println(EXECUTION_PREFIX + TaskExecutorProcessUtils.generateDynamicConfigsStr(taskExecutorProcessSpec));
}
private static void getTmResourceJvmParams(String[] args) throws Exception {
Configuration configuration = getConfigurationForStandaloneTaskManagers(args);
TaskExecutorProcessSpec taskExecutorProcessSpec = TaskExecutorProcessUtils.processSpecFromConfig(configuration);
System.out.println(EXECUTION_PREFIX + TaskExecutorProcessUtils.generateJvmParametersStr(taskExecutorProcessSpec));
}
...
}
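BashJavaUtils calls a different method depending on whether the JVM parameters (GET_TM_RESOURCE_JVM_PARAMS) or the Flink TaskExecutor memory components (GET_TM_RESOURCE_DYNAMIC_CONFIGS) are requested. Both methods go through the same derivation code and the same logic; they differ only in what they print (presumably so that the memory-related JVM options and the sizes of the TaskExecutor memory components can be emitted separately).
Take getTmResourceDynamicConfigs() as an example: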
private static void getTmResourceDynamicConfigs(String[] args) throws Exception {
// Handles compatibility with the Flink 1.9 memory options; not covered here
Configuration configuration = getConfigurationForStandaloneTaskManagers(args);
// Parse the settings in flink-conf.yaml and derive the memory allocation
TaskExecutorProcessSpec taskExecutorProcessSpec = TaskExecutorProcessUtils.processSpecFromConfig(configuration);
System.out.println(EXECUTION_PREFIX + TaskExecutorProcessUtils.generateDynamicConfigsStr(taskExecutorProcessSpec));
}
TaskExecutorProcessUtils.processSpecFromConfig(configuration) parses flink-conf.yaml and derives the size of each Flink TaskExecutor memory component.
public static TaskExecutorProcessSpec processSpecFromConfig(final Configuration config) {
if (isTaskHeapMemorySizeExplicitlyConfigured(config) && isManagedMemorySizeExplicitlyConfigured(config)) {
// both task heap memory and managed memory are configured, use these to derive total flink memory
return deriveProcessSpecWithExplicitTaskAndManagedMemory(config);
} else if (isTotalFlinkMemorySizeExplicitlyConfigured(config)) {
// either of task heap memory and managed memory is not configured, total flink memory is configured,
// derive from total flink memory
return deriveProcessSpecWithTotalFlinkMemory(config);
} else if (isTotalProcessMemorySizeExplicitlyConfigured(config)) {
// total flink memory is not configured, total process memory is configured,
// derive from total process memory
return deriveProcessSpecWithTotalProcessMemory(config);
} else {
throw new IllegalConfigurationException(String.format("Either Task Heap Memory size (%s) and Managed Memory size (%s), or Total Flink"
+ " Memory size (%s), or Total Process Memory size (%s) need to be configured explicitly.",
TaskManagerOptions.TASK_HEAP_MEMORY.key(),
TaskManagerOptions.MANAGED_MEMORY_SIZE.key(),
TaskManagerOptions.TOTAL_FLINK_MEMORY.key(),
TaskManagerOptions.TOTAL_PROCESS_MEMORY.key()));
}
}
The code shows that the way Flink computes its memory components depends on what the user has configured. There are three cases, in descending order of priority:
- Task heap memory and managed memory are both set: derive network memory and total flink memory.
- Total flink memory is set: derive managed memory, network memory and task heap memory. If task heap memory is also set, only managed memory and network memory are derived.
- Total process memory is set: derive jvm-overhead, managed memory, network memory and task heap memory.
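The priority order can be condensed into a few lines. The sketch below mirrors the if/else chain of processSpecFromConfig with plain booleans; the class, enum and method names are invented for illustration and are not part of Flink.

```java
final class DerivationPathSketch {
    // Hypothetical labels for the three branches of processSpecFromConfig.
    enum Path { EXPLICIT_TASK_HEAP_AND_MANAGED, TOTAL_FLINK_MEMORY, TOTAL_PROCESS_MEMORY }

    // Each flag says whether the corresponding option is set explicitly in flink-conf.yaml.
    static Path choose(boolean taskHeapSet, boolean managedSet,
                       boolean totalFlinkSet, boolean totalProcessSet) {
        if (taskHeapSet && managedSet) {
            return Path.EXPLICIT_TASK_HEAP_AND_MANAGED; // highest priority
        } else if (totalFlinkSet) {
            return Path.TOTAL_FLINK_MEMORY;
        } else if (totalProcessSet) {
            return Path.TOTAL_PROCESS_MEMORY;
        }
        throw new IllegalArgumentException("no usable memory option is configured");
    }

    public static void main(String[] args) {
        // The configuration from the top of this post sets task heap, managed and
        // process size, so the first branch wins even though process.size is set too.
        System.out.println(choose(true, true, false, true));
    }
}
```

Note that setting task heap memory alone is not enough to enter the first branch; without managed memory the decision falls through to the total flink memory or total process memory check.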
Memory configuration and calculation
Task heap memory and managed memory set: derive network memory and total flink memory
- step1 Read task heap memory, managed memory, framework heap, framework off-heap and task off-heap from the configuration and add them up; call the sum totalFlinkExcludeNetworkMemorySize:
final MemorySize taskHeapMemorySize = getTaskHeapMemorySize(config);
final MemorySize managedMemorySize = getManagedMemorySize(config);
final MemorySize frameworkHeapMemorySize = getFrameworkHeapMemorySize(config); // 128m
final MemorySize frameworkOffHeapMemorySize = getFrameworkOffHeapMemorySize(config); // 128m
final MemorySize taskOffHeapMemorySize = getTaskOffHeapMemorySize(config); // 0m
final MemorySize networkMemorySize;
final MemorySize totalFlinkExcludeNetworkMemorySize =
frameworkHeapMemorySize.add(frameworkOffHeapMemorySize).add(taskHeapMemorySize).add(taskOffHeapMemorySize).add(managedMemorySize);
- step2 Compute the network memory size
- step2.1 Total flink memory is configured in addition to task heap memory and managed memory. Its value is read from the configuration and network memory follows directly: total flink memory - totalFlinkExcludeNetworkMemorySize;
// derive network memory from total flink memory, and check against network min/max
final MemorySize totalFlinkMemorySize = getTotalFlinkMemorySize(config);
networkMemorySize = totalFlinkMemorySize.subtract(totalFlinkExcludeNetworkMemorySize);
- step2.2 Task heap memory and managed memory are configured but total flink memory is not. Network memory is then computed differently: totalFlinkExcludeNetworkMemorySize * (network.fraction / (1 - network.fraction)). The result is finally checked against network.min and network.max, and the nearer bound is used if it falls outside that range;
final MemorySize relative = base.multiply(rangeFraction.fraction / (1 - rangeFraction.fraction));
capToMinMax(memoryDescription, relative, rangeFraction);
- step3 Compute jvm-overhead
- step3.1 Total process memory is configured on top of the above. In that case jvm metaspace and total process memory are read from the configuration and jvm-overhead follows directly: total process memory - (total flink memory + jvm metaspace);
final MemorySize jvmMetaspaceSize = getJvmMetaspaceSize(config);
final MemorySize totalFlinkAndJvmMetaspaceSize = totalFlinkMemorySize.add(jvmMetaspaceSize);
final MemorySize jvmOverheadSize = totalProcessMemorySize.subtract(totalFlinkAndJvmMetaspaceSize);
- step3.2 Total process memory may also be left unset. In that case jvm metaspace is read from the configuration and jvm-overhead is computed as (total flink memory + jvm metaspace) * (jvm-overhead.fraction / (1 - jvm-overhead.fraction)). The result is finally checked against jvm-overhead.min and jvm-overhead.max, and the nearer bound is used if it falls outside that range.
final MemorySize jvmMetaspaceSize = getJvmMetaspaceSize(config);
final MemorySize totalFlinkAndJvmMetaspaceSize = totalFlinkMemorySize.add(jvmMetaspaceSize);
final MemorySize jvmOverheadSize = deriveJvmOverheadWithInverseFraction(config, totalFlinkAndJvmMetaspaceSize);
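The inverse fraction in step2.2 comes from requiring network memory to be a share f of the total: if total = base + network and network = f * total, then network = base * f / (1 - f). The sketch below applies steps 1 to 3.1 to the configuration from the top of this post (process size 1728m, managed 0m, task heap 1024m). Several things in it are assumptions rather than facts taken from the post: the helper names are invented, sizes are megabytes held in doubles, the metaspace values 96m and 256m are guesses because the post does not state the value in effect, and the range check on the derived overhead reflects my understanding of Flink's behaviour in this branch, which the excerpts above do not show.

```java
final class ExplicitTaskAndManagedSketch {
    // Defaults quoted earlier in the post, in megabytes.
    static final double FRAMEWORK_HEAP = 128, FRAMEWORK_OFF_HEAP = 128, TASK_OFF_HEAP = 0;
    static final double NETWORK_FRACTION = 0.1, NETWORK_MIN = 64, NETWORK_MAX = 1024;
    static final double OVERHEAD_MIN = 192, OVERHEAD_MAX = 1024;

    static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    // step1 + step2.2: network memory from the sum of the other Flink components.
    static double network(double taskHeap, double managed) {
        double base = FRAMEWORK_HEAP + FRAMEWORK_OFF_HEAP + taskHeap + TASK_OFF_HEAP + managed;
        return clamp(base * (NETWORK_FRACTION / (1 - NETWORK_FRACTION)), NETWORK_MIN, NETWORK_MAX);
    }

    static double totalFlink(double taskHeap, double managed) {
        return FRAMEWORK_HEAP + FRAMEWORK_OFF_HEAP + taskHeap + TASK_OFF_HEAP + managed
            + network(taskHeap, managed);
    }

    // step3.1: total process memory is configured, so the overhead is what remains.
    static double overhead(double process, double taskHeap, double managed, double metaspace) {
        return process - (totalFlink(taskHeap, managed) + metaspace);
    }

    // Assumed validation: the remainder has to fit the configured overhead range.
    static boolean overheadInRange(double overhead) {
        return overhead >= OVERHEAD_MIN && overhead <= OVERHEAD_MAX;
    }

    public static void main(String[] args) {
        for (double metaspace : new double[] {96, 256}) { // assumed metaspace sizes
            double o = overhead(1728, 1024, 0, metaspace);
            System.out.printf("metaspace=%.0fm -> overhead=%.2fm, in range: %b%n",
                metaspace, o, overheadInRange(o));
        }
    }
}
```

Under these assumptions the remainder left for jvm-overhead shrinks quickly as metaspace grows, so a configuration that fixes task heap, managed memory and process size at the same time leaves little slack. That is one plausible reason for a startup failure like the one described at the beginning, though the post does not show the actual error message.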
Total flink memory set: derive managed memory, network memory and task heap memory
- step1 Read total flink memory from the configuration
final MemorySize totalFlinkMemorySize = getTotalFlinkMemorySize(config);
- step2 Compute managed memory and network memory
- step2.1 If task heap memory is also configured, read framework heap, framework off-heap, task off-heap and task heap from the configuration, then compute managed memory and network memory (see the comments);
final MemorySize frameworkHeapMemorySize = getFrameworkHeapMemorySize(config);
final MemorySize frameworkOffHeapMemorySize = getFrameworkOffHeapMemorySize(config);
final MemorySize taskOffHeapMemorySize = getTaskOffHeapMemorySize(config);
final MemorySize taskHeapMemorySize = getTaskHeapMemorySize(config);
// If managed memory size is configured, use that value; otherwise compute it from the fraction
final MemorySize managedMemorySize = deriveManagedMemoryAbsoluteOrWithFraction(config, totalFlinkMemorySize);
final MemorySize totalFlinkExcludeNetworkMemorySize =
frameworkHeapMemorySize.add(frameworkOffHeapMemorySize).add(taskHeapMemorySize).add(taskOffHeapMemorySize).add(managedMemorySize);
// network memory = total flink memory - totalFlinkExcludeNetworkMemorySize
final MemorySize networkMemorySize = totalFlinkMemorySize.subtract(totalFlinkExcludeNetworkMemorySize);
- step2.2 If task heap memory is not configured, managed memory comes from the configured size or, failing that, from managed.fraction, and network memory comes from network.fraction [managed memory = total flink memory * managed.fraction; network memory = total flink memory * network.fraction]. Task heap memory is whatever remains.
managedMemorySize = deriveManagedMemoryAbsoluteOrWithFraction(config, totalFlinkMemorySize);
networkMemorySize = deriveNetworkMemoryWithFraction(config, totalFlinkMemorySize);
final MemorySize totalFlinkExcludeTaskHeapMemorySize =
frameworkHeapMemorySize.add(frameworkOffHeapMemorySize).add(taskOffHeapMemorySize).add(managedMemorySize).add(networkMemorySize);
taskHeapMemorySize = totalFlinkMemorySize.subtract(totalFlinkExcludeTaskHeapMemorySize);
- step3 Compute jvm-overhead
- step3.1 If total process memory is configured, jvm-overhead = total process memory - (total flink memory + jvm metaspace).
final MemorySize jvmMetaspaceSize = getJvmMetaspaceSize(config);
final MemorySize totalFlinkAndJvmMetaspaceSize = totalFlinkMemorySize.add(jvmMetaspaceSize);
final MemorySize totalProcessMemorySize = getTotalProcessMemorySize(config);
final MemorySize jvmOverheadSize = totalProcessMemorySize.subtract(totalFlinkAndJvmMetaspaceSize);
- step3.2 If total process memory is not configured, jvm-overhead is computed as (total flink memory + jvm metaspace) * (jvm-overhead.fraction / (1 - jvm-overhead.fraction)).
final MemorySize jvmMetaspaceSize = getJvmMetaspaceSize(config);
final MemorySize totalFlinkAndJvmMetaspaceSize = totalFlinkMemorySize.add(jvmMetaspaceSize);
final MemorySize jvmOverheadSize = deriveJvmOverheadWithInverseFraction(config, totalFlinkAndJvmMetaspaceSize);
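To make step2.2 of this case concrete, here is a small sketch that splits a given total flink memory into managed memory, network memory and task heap memory. The names are invented, sizes are megabytes held in doubles, the 1024m input is an arbitrary example, and the managed fraction of 0.4 is the default as far as I know; the post itself does not state it.

```java
final class TotalFlinkMemorySketch {
    static final double FRAMEWORK_HEAP = 128, FRAMEWORK_OFF_HEAP = 128, TASK_OFF_HEAP = 0;
    static final double MANAGED_FRACTION = 0.4; // assumed default of taskmanager.memory.managed.fraction
    static final double NETWORK_FRACTION = 0.1, NETWORK_MIN = 64, NETWORK_MAX = 1024;

    // Returns {managed, network, taskHeap} in megabytes for the case where neither
    // managed size nor task heap size is configured explicitly.
    static double[] derive(double totalFlink) {
        double managed = totalFlink * MANAGED_FRACTION;
        double network = Math.max(NETWORK_MIN, Math.min(NETWORK_MAX, totalFlink * NETWORK_FRACTION));
        // Task heap is the remainder after every other component is accounted for.
        double taskHeap = totalFlink
            - (FRAMEWORK_HEAP + FRAMEWORK_OFF_HEAP + TASK_OFF_HEAP + managed + network);
        return new double[] {managed, network, taskHeap};
    }

    public static void main(String[] args) {
        double[] r = derive(1024); // example input: total flink memory of 1024m
        System.out.printf("managed=%.1fm network=%.1fm taskHeap=%.1fm%n", r[0], r[1], r[2]);
    }
}
```

Because task heap is a remainder, a small total flink memory can drive it to zero or below; the sketch does not guard against that.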
Total process memory set: derive jvm-overhead, managed memory, network memory and task heap memory
- step1 Read total process memory and jvm metaspace from the configuration, and obtain jvm-overhead from jvm-overhead.fraction [jvm-overhead = total process memory * jvm-overhead.fraction, clamped to the range between jvm-overhead.min and jvm-overhead.max].
final MemorySize totalProcessMemorySize = getTotalProcessMemorySize(config);
final MemorySize jvmMetaspaceSize = getJvmMetaspaceSize(config);
final MemorySize jvmOverheadSize = deriveJvmOverheadWithFraction(config, totalProcessMemorySize);
- step2 Derive total flink memory [total flink memory = total process memory - jvm metaspace - jvm-overhead].
final MemorySize totalFlinkMemorySize = totalProcessMemorySize.subtract(jvmMetaspaceAndOverhead.getTotalJvmMetaspaceAndOverheadSize());
- step3 Read framework heap, framework off-heap and task off-heap from the configuration, then derive the sizes of the remaining Flink memory components.
final MemorySize frameworkHeapMemorySize = getFrameworkHeapMemorySize(config);
final MemorySize frameworkOffHeapMemorySize = getFrameworkOffHeapMemorySize(config);
final MemorySize taskOffHeapMemorySize = getTaskOffHeapMemorySize(config);
- step3.1 If task heap memory is also configured, read it from the configuration, then derive managed memory and finally compute network memory (see the comments).
taskHeapMemorySize = getTaskHeapMemorySize(config);
// If managed size is configured, use it directly; otherwise compute it from the fraction: total flink memory * managed.fraction
managedMemorySize = deriveManagedMemoryAbsoluteOrWithFraction(config, totalFlinkMemorySize);
final MemorySize totalFlinkExcludeNetworkMemorySize =
frameworkHeapMemorySize.add(frameworkOffHeapMemorySize).add(taskHeapMemorySize).add(taskOffHeapMemorySize).add(managedMemorySize);
// network = total flink memory - totalFlinkExcludeNetworkMemorySize
networkMemorySize = totalFlinkMemorySize.subtract(totalFlinkExcludeNetworkMemorySize);
- step3.2 If task heap memory is not configured, derive managed memory and network memory first; task heap memory is what is left (see the comments).
// If managed size is configured, use it directly; otherwise compute it from the fraction: total flink memory * managed.fraction
managedMemorySize = deriveManagedMemoryAbsoluteOrWithFraction(config, totalFlinkMemorySize);
// network memory = total flink memory * network.fraction, clamped to the range between network.min and network.max
networkMemorySize = deriveNetworkMemoryWithFraction(config, totalFlinkMemorySize);
final MemorySize totalFlinkExcludeTaskHeapMemorySize =
frameworkHeapMemorySize.add(frameworkOffHeapMemorySize).add(taskOffHeapMemorySize).add(managedMemorySize).add(networkMemorySize);
// task heap memory = total flink memory - totalFlinkExcludeTaskHeapMemorySize
taskHeapMemorySize = totalFlinkMemorySize.subtract(totalFlinkExcludeTaskHeapMemorySize);
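Steps 1 to 3.2 of this last case can be run end to end on a single number. The sketch below starts from a process size of 1728m and derives everything else. It is an illustration only: the names are invented, sizes are megabytes held in doubles, and both the metaspace size of 256m and the managed fraction of 0.4 are assumptions that the post does not state.

```java
final class TotalProcessMemorySketch {
    static final double FRAMEWORK_HEAP = 128, FRAMEWORK_OFF_HEAP = 128, TASK_OFF_HEAP = 0;
    static final double MANAGED_FRACTION = 0.4; // assumed default
    static final double NETWORK_FRACTION = 0.1, NETWORK_MIN = 64, NETWORK_MAX = 1024;
    static final double OVERHEAD_FRACTION = 0.1, OVERHEAD_MIN = 192, OVERHEAD_MAX = 1024;

    static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    // Returns {jvmOverhead, totalFlink, managed, network, taskHeap} in megabytes,
    // for the case where only total process memory is configured.
    static double[] derive(double process, double metaspace) {
        // step1: overhead is a fraction of the process size, clamped to [min, max]
        double overhead = clamp(process * OVERHEAD_FRACTION, OVERHEAD_MIN, OVERHEAD_MAX);
        // step2: what is left after metaspace and overhead belongs to Flink
        double totalFlink = process - metaspace - overhead;
        // step3.2: managed and network by fraction, task heap as the remainder
        double managed = totalFlink * MANAGED_FRACTION;
        double network = clamp(totalFlink * NETWORK_FRACTION, NETWORK_MIN, NETWORK_MAX);
        double taskHeap = totalFlink
            - (FRAMEWORK_HEAP + FRAMEWORK_OFF_HEAP + TASK_OFF_HEAP + managed + network);
        return new double[] {overhead, totalFlink, managed, network, taskHeap};
    }

    public static void main(String[] args) {
        double[] r = derive(1728, 256); // 256m metaspace is an assumed value
        System.out.printf("overhead=%.0fm flink=%.0fm managed=%.0fm network=%.0fm taskHeap=%.0fm%n",
            r[0], r[1], r[2], r[3], r[4]);
    }
}
```

In this example the fraction alone would give an overhead of 172.8m, which is below jvm-overhead.min, so the minimum of 192m is used instead; that clamping is why the fraction and the final share of the process size do not always agree.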