Analyzing the startup scripts
Under SPARK_HOME/sbin there are several important startup scripts:
- spark-config.sh
- spark-daemon.sh
- start-all.sh
- start-master.sh
- start-slave.sh
- stop-xxx.sh
- ....
Every script whose name starts with start- is one of the scripts users commonly run to start components and manage Spark.
In the spirit of studying the source code, this article focuses on what start-master.sh actually does; start-slave.sh is not covered in detail.
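For orientation, here is a rough sketch of how these scripts are typically invoked (this assumes SPARK_HOME is already set; the master URL below is a placeholder, not taken from this article):

```bash
# start a standalone master on this machine
"$SPARK_HOME"/sbin/start-master.sh

# start a worker, pointing it at the master (placeholder URL)
"$SPARK_HOME"/sbin/start-slave.sh spark://master-host:7077

# stop the daemons started by the start-* scripts
"$SPARK_HOME"/sbin/stop-all.sh
```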
start-master.sh
#!/usr/bin/env bash
# Starts the master on the machine this script is executed on.
if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
# NOTE: This exact class name is matched downstream by SparkSubmit.
# Any changes need to be reflected there.
CLASS="org.apache.spark.deploy.master.Master"
if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
echo "Usage: ./sbin/start-master.sh [options]"
pattern="Usage:"
pattern+="\|Using Spark's default log4j profile:"
pattern+="\|Registered signal handlers for"
"${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
exit 1
fi
ORIGINAL_ARGS="$@"
. "${SPARK_HOME}/sbin/spark-config.sh"
. "${SPARK_HOME}/bin/load-spark-env.sh"
if [ "$SPARK_MASTER_PORT" = "" ]; then
SPARK_MASTER_PORT=7077
fi
if [ "$SPARK_MASTER_HOST" = "" ]; then
case `uname` in
(SunOS)
SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
;;
(*)
SPARK_MASTER_HOST="`hostname -f`"
;;
esac
fi
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
SPARK_MASTER_WEBUI_PORT=8080
fi
echo $0 'will eventually run the following command:'
echo "-------------------------"
echo ${SPARK_HOME}/sbin/spark-daemon.sh start $CLASS 1 --host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT $ORIGINAL_ARGS
echo "-------------------------"
echo ""
echo ""
"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
--host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
$ORIGINAL_ARGS
By adding extra echo statements to the shell script, we can see that start-master.sh mainly does the following:
- Sources sbin/spark-config.sh and bin/load-spark-env.sh to set up some basic environment variables
- Sets SPARK_MASTER_PORT, SPARK_MASTER_HOST, SPARK_MASTER_WEBUI_PORT and similar variables, applying defaults only when they are not already set (see the sketch after this list)
- Calls sbin/spark-daemon.sh to run the main method of the org.apache.spark.deploy.master.Master class
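Because those defaults only kick in when the variables are unset, they can be overridden from the environment before invoking the script. A small sketch (the host and port values here are arbitrary examples, not from the article):

```bash
# override the defaults that start-master.sh would otherwise choose
export SPARK_MASTER_HOST=0.0.0.0       # instead of `hostname -f`
export SPARK_MASTER_PORT=17077         # instead of 7077
export SPARK_MASTER_WEBUI_PORT=18080   # instead of 8080

"$SPARK_HOME"/sbin/start-master.sh
```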
Next, let's take a quick look at spark-daemon.sh.
spark-daemon.sh
The script is shown below (extra echo statements have been added; reading the original, unmodified Spark code is recommended).
#!/usr/bin/env bash
# Runs a Spark command as a daemon.
#
# Environment Variables
#
# SPARK_CONF_DIR Alternate conf dir. Default is ${SPARK_HOME}/conf.
# SPARK_LOG_DIR Where log files are stored. ${SPARK_HOME}/logs by default.
# SPARK_MASTER host:path where spark code should be rsync'd from
# SPARK_PID_DIR The pid files are stored. /tmp by default.
# SPARK_IDENT_STRING A string representing this instance of spark. $USER by default
# SPARK_NICENESS The scheduling priority for daemons. Defaults to 0.
# SPARK_NO_DAEMONIZE If set, will run the proposed command in the foreground. It will not output a PID file.
##
echo ""
echo $0 "被调用"
usage="Usage: spark-daemon.sh [--config <conf-dir>] (start|stop|submit|status) <spark-command> <spark-instance-number> <args...>"
# if no args specified, show usage
if [ $# -le 1 ]; then
echo $usage
exit 1
fi
if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
. "${SPARK_HOME}/sbin/spark-config.sh"
# get arguments
# Check if --config is passed as an argument. It is an optional parameter.
# Exit if the argument is not a directory.
if [ "$1" == "--config" ]
then
shift
conf_dir="$1"
if [ ! -d "$conf_dir" ]
then
echo "ERROR : $conf_dir is not a directory"
echo $usage
exit 1
else
export SPARK_CONF_DIR="$conf_dir"
fi
shift
fi
option=$1
shift
command=$1
shift
instance=$1
shift
echo "option:$option"
echo "command:$command"
echo "instance:$instance"
echo "参数:$@"
spark_rotate_log ()
{
log=$1;
num=5;
if [ -n "$2" ]; then
num=$2
fi
if [ -f "$log" ]; then # rotate logs
while [ $num -gt 1 ]; do
prev=`expr $num - 1`
[ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
num=$prev
done
mv "$log" "$log.$num";
fi
}
. "${SPARK_HOME}/bin/load-spark-env.sh"
if [ "$SPARK_IDENT_STRING" = "" ]; then
export SPARK_IDENT_STRING="$USER"
fi
export SPARK_PRINT_LAUNCH_COMMAND="1"
# get log directory
if [ "$SPARK_LOG_DIR" = "" ]; then
export SPARK_LOG_DIR="${SPARK_HOME}/logs"
fi
mkdir -p "$SPARK_LOG_DIR"
touch "$SPARK_LOG_DIR"/.spark_test > /dev/null 2>&1
TEST_LOG_DIR=$?
if [ "${TEST_LOG_DIR}" = "0" ]; then
rm -f "$SPARK_LOG_DIR"/.spark_test
else
chown "$SPARK_IDENT_STRING" "$SPARK_LOG_DIR"
fi
if [ "$SPARK_PID_DIR" = "" ]; then
SPARK_PID_DIR=/tmp
fi
# some variables
log="$SPARK_LOG_DIR/spark-$SPARK_IDENT_STRING-$command-$instance-$HOSTNAME.out"
pid="$SPARK_PID_DIR/spark-$SPARK_IDENT_STRING-$command-$instance.pid"
# Set default scheduling priority
if [ "$SPARK_NICENESS" = "" ]; then
export SPARK_NICENESS=0
fi
execute_command() {
echo ""
if [ -z ${SPARK_NO_DAEMONIZE+set} ]; then
echo "原本后台执行命令"
echo "-------------------------"
echo "nohup -- "$@" >> $log 2>&1 < /dev/null &"
# nohup -- "$@" >> $log 2>&1 < /dev/null &
echo "-------------------------"
"$@" &
newpid="$!"
echo "newpid:$newpid"
echo "$newpid" > "$pid"
# Poll for up to 5 seconds for the java process to start
for i in {1..10}
do
if [[ $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
break
fi
sleep 0.5
done
sleep 2
# Check if the process has died; in that case we'll tail the log so the user can see
if [[ ! $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
echo "failed to launch: $@"
tail -10 "$log" | sed 's/^/ /'
echo "full log in $log"
fi
else
"$@"
fi
}
run_command() {
mode="$1"
shift
mkdir -p "$SPARK_PID_DIR"
if [ -f "$pid" ]; then
TARGET_ID="$(cat "$pid")"
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo "$command running as process $TARGET_ID. Stop it first."
exit 1
fi
fi
if [ "$SPARK_MASTER" != "" ]; then
echo rsync from "$SPARK_MASTER"
rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' "$SPARK_MASTER/" "${SPARK_HOME}"
fi
spark_rotate_log "$log"
echo "starting $command, logging to $log"
echo ""
echo "->>mode为:$mode"
case "$mode" in
(class)
echo "-------------------------"
echo execute_command nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class "$command" "$@"
execute_command nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class "$command" "$@"
echo "-------------------------"
;;
(submit)
echo "-------------------------"
echo execute_command nice -n "$SPARK_NICENESS" bash "${SPARK_HOME}"/bin/spark-submit --class "$command" "$@"
execute_command nice -n "$SPARK_NICENESS" bash "${SPARK_HOME}"/bin/spark-submit --class "$command" "$@"
echo "-------------------------"
;;
(*)
echo "unknown mode: $mode"
exit 1
;;
esac
}
echo ""
echo "-> option" $option
case $option in
(submit)
run_command submit "$@"
;;
(start)
run_command class "$@"
;;
(stop)
if [ -f $pid ]; then
TARGET_ID="$(cat "$pid")"
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo "stopping $command"
kill "$TARGET_ID" && rm -f "$pid"
else
echo "no $command to stop"
fi
else
echo "no $command to stop"
fi
;;
(status)
if [ -f $pid ]; then
TARGET_ID="$(cat "$pid")"
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo $command is running.
exit 0
else
echo $pid file is present but $command not running
exit 1
fi
else
echo $command not running.
exit 2
fi
;;
(*)
echo $usage
exit 1
;;
esac
This script does the following:
- Sets some environment variables, such as:
  - SPARK_LOG_DIR
  - SPARK_NICENESS
- Defines shell functions:
  - execute_command, which actually runs the given command (via nohup in the original code)
  - run_command, which preprocesses the command passed to this script and then calls execute_command to run it
- Dispatches on the option passed to this script and performs the corresponding action:
  - submit, generally used to submit a job
  - start, generally used to start a daemon component
  - ... the others are skipped for now

Reading this script shows that it ultimately uses the shell command nohup to put the process into the background, which is how the daemon behavior is achieved; a minimal sketch of this pattern follows this summary. The script's other concerns, such as log rotation, pid file bookkeeping, and process status checks, are not discussed for now. When reading source code, focus on the key points and be selective; you cannot digest everything in one go.
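As promised, here is a minimal, self-contained sketch of that daemonizing pattern (the command, log path, and pid-file path are made up for illustration; spark-daemon.sh does the same thing with spark-class as the command):

```bash
#!/usr/bin/env bash
# minimal imitation of the nohup + pid-file pattern used by spark-daemon.sh
log=/tmp/mydaemon.out
pidfile=/tmp/mydaemon.pid

# detach from the terminal, send all output to the log, remember the pid
nohup sleep 600 >> "$log" 2>&1 < /dev/null &
echo $! > "$pidfile"

# "status": is the recorded pid still alive?
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
  echo "running as pid $(cat "$pidfile")"
fi

# "stop": kill the recorded pid and remove the pid file
kill "$(cat "$pidfile")" && rm -f "$pidfile"
```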
Starting the master process: back to start-master.sh
When reading source code, I like to run it in my own way, ideally leaving the scripts behind and going straight to the .java code files. This not only makes each step of the execution easy to follow, it also makes it stick. So let's trace the steps of starting the master process and see, one step at a time, which scripts get called and which commands get executed.
1. Run sbin/start-master.sh
sbin/start-master.sh
By running the script with the extra output added, we can see that what start-master.sh ultimately executes is a call to spark-daemon.sh (the added echo reports that spark-daemon.sh was invoked).
spark-daemon.sh
From the information printed while the script runs, it is clear that nohup is what ultimately gets called.
Note: when modifying the source code yourself, never add anything between
nohup -- "$@" >> $log 2>&1 < /dev/null &
newpid="$!"
because $! expands to the pid of the most recently started background job; if another command were run in between, newpid would record the wrong pid.
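A tiny demonstration of this behavior (plain bash, nothing Spark-specific):

```bash
sleep 100 &             # the process we actually want to track
first_pid=$!            # correct: pid of the first sleep

sleep 100 &             # imagine an unrelated background command slipped in here
wrong_pid=$!            # $! now refers to the second sleep, not the first

echo "first=$first_pid wrong=$wrong_pid"
kill "$first_pid" "$wrong_pid"
```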
This script ultimately runs the following command:
/Users/didi/Develop/spark-2.3.3-bin-hadoop2.6/bin/spark-class org.apache.spark.deploy.master.Master --host localhost --port 7077 --webui-port 8080 >> /Users/didi/Develop/spark-2.3.3-bin-hadoop2.6/logs/spark-didi-org.apache.spark.deploy.master.Master-1-localhost.out 2>&1 < /dev/null &
The bin/spark-class script
If you start or trace the other scripts, you will find that they all eventually call this one; it acts as a relay (a command translator, if you like) that produces the final runnable command.
Let's look at the script's contents; the analysis is in the comments below.
#!/usr/bin/env bash
# mark that the script has started executing
echo "=============enter spark-class"
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
. "${SPARK_HOME}"/bin/load-spark-env.sh
# Find the path to java; from here on the code will be run in the form "java <class>"
# Find the java binary
if [ -n "${JAVA_HOME}" ]; then
RUNNER="${JAVA_HOME}/bin/java"
else
if [ "$(command -v java)" ]; then
RUNNER="java"
else
echo "JAVA_HOME is not set" >&2
exit 1
fi
fi
# Find the Spark jars directory. This step is important: many ClassNotFoundException errors are caused by the jars path being missing or set incorrectly.
# Find Spark jars.
if [ -d "${SPARK_HOME}/jars" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi
if [ ! -d "$SPARK_JARS_DIR" ] && [ -z "$SPARK_TESTING$SPARK_SQL_TESTING" ]; then
echo "Failed to find Spark jars directory ($SPARK_JARS_DIR)." 1>&2
echo "You need to build Spark with the target \"package\" before running this program." 1>&2
exit 1
else
LAUNCH_CLASSPATH="$SPARK_JARS_DIR/*"
fi
# Add the launcher build dir to the classpath if requested.
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi
# For tests
if [[ -n "$SPARK_TESTING" ]]; then
unset YARN_CONF_DIR
unset HADOOP_CONF_DIR
fi
# The launcher library will print arguments separated by a NULL character, to allow arguments with
# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
# an array that will be used to exec the final command.
#
# The exit code of the launcher is appended to the output, so the parent shell removes it from the
# command array and checks the value to see if the launcher succeeded.
tempMessage=()
build_command() {
tempMessage+="$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
"$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
printf "%d\0" $?
}
# note: tempMessage cannot be echoed here, because build_command runs in a subshell (see the process substitution below)
# The following code is the key part:
# Turn off posix mode since it does not allow process substitution
set +o posix
CMD=()
while IFS= read -d '' -r ARG; do
CMD+=("$ARG")
done < <(build_command "$@")
# The `done < <(build_command "$@")` at the end of this while loop means:
# 1. run the build_command function
# 2. feed its output into the while loop through `< <(...)`, i.e. input redirection from a process substitution
# 3. `read -d ''` reads the NULL-separated results one at a time, and the loop appends each of them to the CMD array
echo "8888888"
echo "${CMD[@]}"
echo "8888888"
COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}
# Certain JVM failures result in errors being printed to stdout (instead of stderr), which causes
# the code that parses the output of the launcher to get confused. In those cases, check if the
# exit code is an integer, and if it's not, handle it as a special error case.
if ! [[ $LAUNCHER_EXIT_CODE =~ ^[0-9]+$ ]]; then
echo "${CMD[@]}" | head -n-1 1>&2
exit 1
fi
if [ $LAUNCHER_EXIT_CODE != 0 ]; then
exit $LAUNCHER_EXIT_CODE
fi
# After the next line, CMD contains only the command we need, without the exit code appended by build_command.
CMD=("${CMD[@]:0:$LAST}")
echo "在设置环境变量后,可以手动执行"
echo "最终执行的CMD"
echo "${CMD[@]}"
echo "---------"
### Finally, exec runs the assembled command and starts the Master process
exec "${CMD[@]}"
After all of the scripts above, we arrive at the final command to execute:
/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java -cp /Users/didi/Develop/spark-2.3.3-bin-hadoop2.6/conf/:/Users/didi/Develop/spark-2.3.3-bin-hadoop2.6/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host localhost --port 7077 --webui-port 8080
Simplified, that is:
java -cp xxxxx:xxx:xxxxx -Xmx1g org.apache.spark.deploy.master.Master --host localhost --port 7077 --webui-port 8080
In other words, java is used to start the Master class and run its main method.
Note that at this point the command is still launched from the shell; even though it takes the form "java <class name>", the process starts out as the shell process. exec simply replaces the current process image with the java program while keeping the pid unchanged.
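A quick way to see this behavior for yourself (plain bash, nothing Spark-specific):

```bash
#!/usr/bin/env bash
echo "shell pid before exec: $$"
# exec replaces this shell process with the new program; the pid stays the same.
# Running `ps -o pid,comm -p <that pid>` from another terminal will show `sleep`
# under the pid that used to belong to this shell.
exec sleep 60
echo "never reached: exec does not return on success"
```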
At this point the scripts have done everything they are responsible for: setting environment variables, handling logs, tracking the process pid, and so on. From here we can leave the scripts behind entirely and configure a run configuration in IDEA to simulate the Master startup, with exactly the same effect.
Fully simulating the Master startup in IDEA
At the end of the spark-class script from the previous step, you can dump all the environment variables to a file and then reproduce them in the IDEA run configuration (a sketch of the dump follows).
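One way to do that dump is to add a couple of lines to spark-class just before the final exec; the output paths here are arbitrary choices of mine:

```bash
# added temporarily to spark-class, right before `exec "${CMD[@]}"`:
# save the full environment and the final command so they can be copied
# into the IDEA run configuration later
env | sort > /tmp/spark-master-env.txt
echo "${CMD[@]}" > /tmp/spark-master-cmd.txt
```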
My configuration is as follows:
Configuration details:
VM options
Environment variables
Once this is configured, you can set aside sbin/start-master.sh and use IDEA's Run to launch the Master process with one click, and you can debug it, stepping through the code line by line.
Summary
Starting the Master involves the following chain of scripts:
- The user runs start-master.sh
- which calls spark-daemon.sh
- which calls spark-class
- which runs org.apache.spark.launcher.Main to obtain the final command CMD
- and then executes that final CMD
- The final CMD is a java command:
/xxxx/java -cp /Users/didi/Develop/spark-2.3.3-bin-hadoop2.6/conf/:/Users/didi/Develop/spark-2.3.3-bin-hadoop2.6/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host localhost --port 7077 --webui-port 8080
Having analyzed what the startup scripts do, we know that what ultimately gets launched is the org.apache.spark.deploy.master.Master class, so this class must have a main method; checking in IDEA confirms that it does.
The next article will analyze Master's main() and how the environment of Spark's all-important Master node is built up step by step, and, following one of the original goals of this series of understanding Spark's RPC framework, it will start to uncover how that RPC works.