Spark on Yarn Cluster Setup and Example Run, All in One Article


Environment Preparation

  • Machines: Linux (CentOS 7) virtual machines

    10.58.12.170
    10.58.12.171
    10.58.10.129
    tdops  # the user account used on all three machines
    
  • Software versions

    • jdk 1.8.0_60
    • scala 2.11.12
    • hadoop 3.1.3
    • spark 2.4.6
    • livy 0.7.0

Configure hosts

  • sudo vim /etc/hosts
    # add the following host entries
    10.58.12.171 ailoan-vip-d-012171.hz.td
    10.58.12.170 ailoan-vip-d-012170.hz.td
    10.58.10.129 ailoan-vip-d-010129.hz.td
    

Configure passwordless SSH among the three machines

  1. Install openssh-server

    sudo yum install openssh-server
    
  2. Generate a key pair

    ssh-keygen -t rsa # press Enter through every prompt
    
  3. Append each machine's public key to every other machine's authorized_keys

  4. Test that it works

    # run the following on 170; a passwordless login means the setup is complete
    170 > ssh ailoan-vip-d-012171.hz.td 
    
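Step 3 above names the goal but gives no command. A minimal sketch of the append itself, run against a scratch directory so it is self-contained (on the real hosts the target is ~/.ssh/authorized_keys on every *other* machine, which `ssh-copy-id` automates):

```shell
# generate a key pair non-interactively and append the public key to
# authorized_keys with the permissions sshd requires
SSH_DIR=$(mktemp -d)                      # stand-in for ~/.ssh on a remote host
ssh-keygen -t rsa -N "" -q -f "$SSH_DIR/id_rsa"
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
grep -c '^ssh-rsa' "$SSH_DIR/authorized_keys"   # prints 1: one key appended
```

On the cluster, running `ssh-copy-id tdops@ailoan-vip-d-012171.hz.td` (and likewise for the other hosts) from each machine achieves the same append remotely.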

Install JDK

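This section is empty in the original; a sketch following the same pattern as the Scala install below (the tarball name and its presence under /usr/install are assumptions):

```shell
# assumed: jdk-8u60-linux-x64.tar.gz already copied to /usr/install
cd /usr/install
sudo tar -zxvf jdk-8u60-linux-x64.tar.gz   # yields jdk1.8.0_60, the path referenced by later configs

# append to /etc/profile, then run `source /etc/profile`
export JAVA_HOME=/usr/install/jdk1.8.0_60
export PATH=$PATH:$JAVA_HOME/bin
```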
Install Scala

  1. Download the package

    downloads.lightbend.com/scala/2.11.…
  2. Copy the package to the target machine

    scp {username}@localip:/Users/{username}/Downloads/大数据软件/scala-2.11.12.tgz /usr/install/bigdata
    
  3. Extract to the target directory

    sudo tar -zxvf scala-2.11.12.tgz
    
  4. Configure environment variables

    # edit the environment variables
    sudo vim /etc/profile
    
    # add the following
    export SCALA_HOME=/usr/install/bigdata/scala-2.11.12
    export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
    
    # apply the changes
    source /etc/profile
    
  5. Verify the installation

    scala -version
    # the following output indicates success
    Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60).
    

Install Hadoop

  1. Download the package

    cd /usr/install/bigdata
    
    wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
    
  2. Extract to the target directory

    sudo tar -zxvf hadoop-3.1.3.tar.gz 
    
  3. Assign a user to the hadoop directory

    cd /usr/install/bigdata
    
    # change the owner and group so hadoop can be started without permission problems
    sudo chown -R tdops:users hadoop-3.1.3
    
  4. Configure environment variables and application settings

    • Configure hadoop-env.sh
    # go to the config directory under the hadoop installation
    cd /usr/install/bigdata/hadoop-3.1.3/etc/hadoop
    
    vim hadoop-env.sh
    
    # add the JDK home directory
    export JAVA_HOME=/usr/install/jdk1.8.0_60
    export HADOOP_LOG_DIR=/home/tdops/spark/hadoop-3.1.3/logs
    
    • Configure yarn-env.sh
    vim yarn-env.sh
    
    # add the JDK home directory
    export JAVA_HOME=/usr/install/jdk1.8.0_60
    
    • Configure core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <!-- the master's hostname -->
            <value>hdfs://ailoan-vip-d-012170.hz.td:9000/</value>
        </property>
        <property>
             <name>hadoop.tmp.dir</name>
             <value>/home/tdops/spark/hadoop-3.1.3/tmp</value>
        </property>
    </configuration>
    
    
    • Configure hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>ailoan-vip-d-012170.hz.td:50090</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/tdops/spark/hadoop-3.1.3/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/tdops/spark/hadoop-3.1.3/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
    </configuration>
    
    
    • Configure mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    
    • Configure yarn-site.xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <!-- added temporarily to fix the demo terminating abnormally in yarn mode
        https://stackoverflow.com/questions/41468833/why-does-spark-exit-with-exitcode-16
        -->
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>ailoan-vip-d-012170.hz.td:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>ailoan-vip-d-012170.hz.td:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>ailoan-vip-d-012170.hz.td:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>ailoan-vip-d-012170.hz.td:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>ailoan-vip-d-012170.hz.td:8090</value>
        </property>
    </configuration>
    
  5. Verify the installation
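Step 5 is left empty in the original. A typical first-start verification sequence, run on the master as tdops (a sketch; note that hadoop 3.x also expects the worker hostnames in etc/hadoop/workers, which the steps above do not mention):

```shell
cd /usr/install/bigdata/hadoop-3.1.3

# format HDFS once, before the very first start
./bin/hdfs namenode -format

# start the HDFS and YARN daemons (workers are reached via the passwordless SSH configured earlier)
./sbin/start-dfs.sh
./sbin/start-yarn.sh

# jps should now show NameNode, SecondaryNameNode and ResourceManager on the
# master, and DataNode plus NodeManager on the worker machines
jps
```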

Install Spark

  1. Download the spark package

    • Download from the official site: spark.apache.org/downloads.h…
    • Upload it to the prepared servers
    • scp {username}@localip:/Users/{username}/Downloads/大数据软件/spark-2.4.6-bin-hadoop2.7.tgz /usr/install/bigdata
  2. Extract to the target directory

    sudo tar -zxvf spark-2.4.6-bin-hadoop2.7.tgz
    
  3. Change the owning user and group

    sudo chown -R tdops:users spark-2.4.6-bin-hadoop2.7
    
  4. Configure environment variables and properties

    • Configure spark-env.sh under $SPARK_HOME/conf
    export SCALA_HOME=/usr/install/bigdata/scala-2.11.12
    export JAVA_HOME=/usr/install/jdk1.8.0_60
    export HADOOP_HOME=/usr/install/bigdata/hadoop-3.1.3
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_LOG_DIR=/home/tdops/spark/spark-2.4.6/logs
    
    SPARK_MASTER_IP=ailoan-vip-d-012170.hz.td
    SPARK_LOCAL_DIRS=/usr/install/bigdata/spark-2.4.6-bin-hadoop2.7
    SPARK_DRIVER_MEMORY=512M
    
    • Configure the slaves file under $SPARK_HOME/conf
    # use the hostnames of the two prepared machines as slaves
    ailoan-vip-d-012171.hz.td
    ailoan-vip-d-010129.hz.td
    
  5. Start spark

    cd $SPARK_HOME/sbin
    
    ./start-all.sh
    
  6. Verify the installation

    • Open ailoan-vip-d-012170.hz.td:8080/ in a browser
    • Run jps to check that the master machine has a Master process and each slave has a Worker process
    • Run the official demo program as follows
       ./spark-submit --master spark://ailoan-vip-d-012170.hz.td:7077 --class org.apache.spark.examples.SparkPi --deploy-mode cluster file:/tmp/spark-examples_2.11-2.4.6.jar
      
    • Check the spark execution result in the web UI (screenshots omitted)
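The demo above targets the standalone master (spark://…:7077). Since the article is about Spark on Yarn, the equivalent YARN-mode submission would look like this (a sketch; it relies on HADOOP_CONF_DIR being exported as set in spark-env.sh, and on the examples jar shipped inside the Spark distribution):

```shell
cd /usr/install/bigdata/spark-2.4.6-bin-hadoop2.7/bin

./spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  ../examples/jars/spark-examples_2.11-2.4.6.jar
```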

Install livy

  1. Download the package

    # download directly with wget (closer.lua may return a mirror page rather than the zip; if so, use a mirror link from that page)
    wget https://www.apache.org/dyn/closer.lua/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip
    
  2. Extract to the target directory

    # install the unzip tool
    sudo yum install unzip
    
    # extract the package and rename the directory
    unzip apache-livy-0.7.0-incubating-bin.zip && mv apache-livy-0.7.0-incubating-bin apache-livy-0.7.0
    
  3. Modify the configuration files

    • Add and configure livy-env.sh
    # copy the template config
    sudo cp livy-env.sh.template livy-env.sh
    
    # edit the config file and add the following
    sudo vim livy-env.sh
    
    JAVA_HOME=/usr/install/jdk1.8.0_60
    HADOOP_CONF_DIR=/usr/install/bigdata/hadoop-3.1.3/etc/hadoop
    SPARK_HOME=/usr/install/bigdata/spark-2.4.6-bin-hadoop2.7
    LIVY_LOG_DIR=/home/tdops/spark/livy/logs
    
    • Add and configure livy.conf
    # copy the template config
    sudo cp livy.conf.template livy.conf
    
    # edit the config file and add the following
    sudo vim livy.conf
    
    livy.spark.deploy-mode=cluster
    livy.spark.master=spark://ailoan-vip-d-012170.hz.td:7077
    livy.file.local-dir-whitelist=/tmp
    
  4. Start livy

    cd $LIVY_HOME/bin
    
    ./livy-server start
    
  5. Check and verify livy

    package cn.xxx.yuntu.common.util.livy.core;
    
    
    import cn.xxx.yuntu.common.util.dto.ApiResult;
    import cn.xxx.yuntu.common.util.livy.vo.LivyArg;
    import cn.xxx.yuntu.common.util.livy.vo.LivyStatus;
    import cn.xxx.yuntu.common.util.livy.vo.LivyResult;
    import cn.xxx.yuntu.common.util.util.HttpUtil;
    import cn.xxx.yuntu.common.util.util.LogUtil;
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import org.apache.commons.lang3.StringUtils;
    
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    /**
     * @author li.minqiang
     * @date 2019/12/5
     */
    public class LivyClient {
    
        public static final String DELETED = "deleted";
        public static final String LIVY_BATCH_URI = "%s/batches/%s";
        public static final String MSG = "msg";
        public static final String NOT_FOUND = "not found";
        public static final String SESSION = "Session";
        private final static String STARTING = "starting";
        private LivyArg livyArg;
    
        private LivyClient() {
        }
    
        public static LivyClient getInstance(LivyArg livyArg) {
            LivyClient livyClient = new LivyClient();
            livyClient.setLivyArg(livyArg);
            return livyClient;
        }
    
    
        public ApiResult submitSparkJar() {
            JSONObject data = new JSONObject();
            data.put("file", livyArg.getJarPath());
            data.put("className", livyArg.getClassName());
            data.put("name", "testLivy" + System.currentTimeMillis());
            data.put("executorCores", livyArg.getExecutorCores());
            data.put("executorMemory", livyArg.getExecutorMemory());
            data.put("driverCores", livyArg.getDriverCores());
            data.put("driverMemory", livyArg.getDriverMemory());
            data.put("numExecutors", livyArg.getNumExecutors());
            data.put("conf", livyArg.getConf());
            data.put("args", livyArg.getArgs());
    
            ApiResult apiResult = HttpUtil.postJson(String.format("%s/batches", livyArg.getLivyServer()), data);
    
            JSONObject obj = (JSONObject) apiResult.getResult();
    
            if (!apiResult.isSuccess() || StringUtils.isEmpty(obj.getString("state"))) {
                LogUtil.error("make livy request error:{}", JSON.toJSONString(apiResult));
                return ApiResult.failure("failed to submit the livy task");
            }
    
            LogUtil.info("livy submit result:{}", JSON.toJSONString(apiResult, true));
            return ApiResult.successWithResult(obj.getString("id"));
        }
    
        private String getLivyUrl() {
            return String.format("%s/ui/batch/%s/log", livyArg.getLivyServer(), livyArg.getTaskId());
        }
    
        private List<String> makeListLogs(JSONArray logs) {
            List<String> mlogs = new ArrayList<String>();
            if (logs == null) {
                return mlogs;
            }
            for (int i = 0; i < logs.size(); i++) {
                mlogs.add(logs.getString(i));
            }
            return mlogs;
        }
    
        private List<String> getLivyServerLogs() {
            String url = String.format("%s/batches/%s/log?size=-1", livyArg.getLivyServer(), livyArg.getTaskId());
            ApiResult apiResult = HttpUtil.get(url);
            if (apiResult.isSuccess()) {
                JSONObject r = (JSONObject) apiResult.getResult();
                return makeListLogs(r.getJSONArray("log"));
            }
            return new ArrayList<String>();
        }
    
        public LivyArg getLivyArg() {
            return livyArg;
        }
    
        public void setLivyArg(LivyArg livyArg) {
            this.livyArg = livyArg;
        }
    
        public static void main(String[] args) {
    //        JavaWordCount();
            SparkPi();
        }
    
        public static void SparkPi() {
            LivyClient livyClient = new LivyClient();
            LivyArg livyArg = new LivyArg();
            livyArg.setLivyServer("http://ailoan-vip-d-012170.hz.td:8998");
            livyArg.setJarPath("hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar");
            livyArg.setClassName("org.apache.spark.examples.SparkPi");
            livyArg.setExecutorCores(1);
            livyArg.setDriverCores(1);
            livyArg.setExecutorMemory("512M");
            livyArg.setDriverMemory("512M");
    
            livyClient.setLivyArg(livyArg);
    
            ApiResult apiResult = livyClient.submitSparkJar();
            System.out.println(apiResult.getCode() + "-" + apiResult.getReason());
            System.out.println(apiResult);
        }
    
        public static void JavaWordCount() {
            LivyClient livyClient = new LivyClient();
            LivyArg livyArg = new LivyArg();
            livyArg.setLivyServer("http://ailoan-vip-d-012170.hz.td:8998");
            livyArg.setJarPath("hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar");
            livyArg.setClassName("org.apache.spark.examples.JavaWordCount");
            livyArg.setExecutorCores(1);
            livyArg.setDriverCores(1);
            livyArg.setExecutorMemory("512M");
            livyArg.setDriverMemory("512M");
            List<String> args = new ArrayList<>();
            args.add("hdfs://ailoan-vip-d-012170.hz.td:9000/example/wordCount.txt");
            livyArg.setArgs(args);
    
            livyClient.setLivyArg(livyArg);
    
            ApiResult apiResult = livyClient.submitSparkJar();
            System.out.println(apiResult.getCode() + "-" + apiResult.getReason());
            System.out.println(apiResult);
        }
    }
    
    
    • Likewise, check the spark web UI for the submission record and result
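The Java client above is a thin wrapper over Livy's REST API; the same submission can be sketched with curl (host and jar path taken from the example above, batch id hypothetical):

```shell
# health check: list existing batches
curl http://ailoan-vip-d-012170.hz.td:8998/batches

# submit the SparkPi jar as a batch, mirroring submitSparkJar()
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"file": "hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar",
       "className": "org.apache.spark.examples.SparkPi",
       "executorCores": 1, "driverCores": 1,
       "executorMemory": "512M", "driverMemory": "512M"}' \
  http://ailoan-vip-d-012170.hz.td:8998/batches

# poll the state using the id returned by the POST (0 here as an example)
curl http://ailoan-vip-d-012170.hz.td:8998/batches/0/state
```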