Environment Preparation

- Machine preparation: three Linux (CentOS 7) virtual machines, all operated as the tdops user
  - 10.58.12.170
  - 10.58.12.171
  - 10.58.10.129
- Software versions
  - JDK 1.8.0_60
  - Scala 2.11.12
  - Hadoop 3.1.3
  - Spark 2.4.6
  - Livy 0.7.0
Configure hosts

- Edit the hosts file

```bash
sudo vim /etc/hosts
```

- Add the following host entries

```
10.58.12.171 ailoan-vip-d-012171.hz.td
10.58.12.170 ailoan-vip-d-012170.hz.td
10.58.10.129 ailoan-vip-d-010129.hz.td
```
Configure passwordless SSH login between the three machines

- Install openssh-server

```bash
sudo yum install openssh-server
```

- Generate a key pair

```bash
ssh-keygen -t rsa   # press Enter through all the prompts
```
- Append each machine's public key to the other machines' authorized_keys, as sketched below
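A minimal sketch for this step, assuming the default key path from `ssh-keygen` above; run on each machine against the other two hosts (shown here from the .170 machine):

```bash
# ssh-copy-id appends ~/.ssh/id_rsa.pub to the remote user's authorized_keys
ssh-copy-id tdops@ailoan-vip-d-012171.hz.td
ssh-copy-id tdops@ailoan-vip-d-010129.hz.td
```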
- Test that it works

```bash
# Run the following on the .170 machine; a successful login means the setup is complete
ssh ailoan-vip-d-012171.hz.td
```
Install the JDK
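The original lists no commands for this step; a minimal sketch, assuming a JDK 8u60 tarball has already been copied to the machine (the archive name is an assumption; the install path matches the JAVA_HOME used by the configs below):

```bash
cd /usr/install
# Archive name is an assumption; adjust to the actual download
sudo tar -zxvf jdk-8u60-linux-x64.tar.gz   # unpacks to jdk1.8.0_60
sudo vim /etc/profile
# Add the JDK home directory
export JAVA_HOME=/usr/install/jdk1.8.0_60
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version   # should report 1.8.0_60
```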
Install Scala

- Download the package from downloads.lightbend.com/scala/2.11.…
- Copy the package to the target machine

```bash
scp {username}@localip:/Users/{username}/Downloads/大数据软件/scala-2.11.12.tgz /usr/install/bigdata
```

- Extract it into the target directory

```bash
sudo tar -zxvf scala-2.11.12.tgz
```
- Configure environment variables

```bash
# Edit the environment variables
sudo vim /etc/profile
# Add the following configuration
export SCALA_HOME=/usr/install/bigdata/scala-2.11.12
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
# Apply the configuration
source /etc/profile
```
- Test the installation

```bash
# Launch the REPL; a banner like the following indicates success
scala
Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60).
```
Install Hadoop

- Download the package

```bash
cd /usr/install/bigdata
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
```

- Extract it into the target directory

```bash
sudo tar -zxvf hadoop-3.1.3.tar.gz
```

- Assign an owner to the hadoop directory

```bash
cd /usr/install/bigdata
# Change the owner and group so Hadoop can be started without permission problems
sudo chown -R tdops:users hadoop-3.1.3
```
- Configure environment variables and application settings
  - Configure hadoop-env.sh

```bash
# Go to the configuration directory under the Hadoop install directory
cd /usr/install/bigdata/hadoop-3.1.3/etc/hadoop
vim hadoop-env.sh
# Add the JDK home directory and the log directory
export JAVA_HOME=/usr/install/jdk1.8.0_60
export HADOOP_LOG_DIR=/home/tdops/spark/hadoop-3.1.3/logs
```

  - Configure yarn-env.sh

```bash
vim yarn-env.sh
# Add the JDK home directory
export JAVA_HOME=/usr/install/jdk1.8.0_60
```
  - Configure core-site.xml

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- hostname of the master -->
        <value>hdfs://ailoan-vip-d-012170.hz.td:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/tdops/spark/hadoop-3.1.3/tmp</value>
    </property>
</configuration>
```

  - Configure hdfs-site.xml
```xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>ailoan-vip-d-012170.hz.td:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/tdops/spark/hadoop-3.1.3/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/tdops/spark/hadoop-3.1.3/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
```

  - Configure mapred-site.xml
```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```

  - Configure yarn-site.xml
```xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- Added as a temporary fix for demos terminating abnormally in YARN mode:
         https://stackoverflow.com/questions/41468833/why-does-spark-exit-with-exitcode-16 -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>ailoan-vip-d-012170.hz.td:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>ailoan-vip-d-012170.hz.td:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>ailoan-vip-d-012170.hz.td:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>ailoan-vip-d-012170.hz.td:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>ailoan-vip-d-012170.hz.td:8090</value>
    </property>
</configuration>
```
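The original jumps straight from configuration to verification; a minimal sketch of the missing start-up step, assuming the .170 machine runs the NameNode/ResourceManager and that all three hosts go into etc/hadoop/workers (an assumption; the original never shows that file):

```bash
cd /usr/install/bigdata/hadoop-3.1.3
# List the worker hosts (assumption: all three machines serve as workers)
vim etc/hadoop/workers
# Format HDFS once, on the namenode only, before the first start
bin/hdfs namenode -format
# Start HDFS and YARN
sbin/start-dfs.sh
sbin/start-yarn.sh
# Each node should now show its daemons (NameNode/DataNode/ResourceManager/NodeManager)
jps
```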
- Verify the installation
  - Visit ailoan-vip-d-012170.hz.td:8090; the YARN ResourceManager web UI should appear
  - Visit the Hadoop HDFS web UI at ailoan-vip-d-012170.hz.td:9870/
Install Spark

- Download the Spark package
  - Download it from the official site: spark.apache.org/downloads.h…
  - Upload it to the prepared servers

```bash
scp {username}@localip:/Users/{username}/Downloads/大数据软件/spark-2.4.6-bin-hadoop2.7.tgz /usr/install/bigdata
```

- Extract it into the target directory

```bash
sudo tar -zxvf spark-2.4.6-bin-hadoop2.7.tgz
```

- Change the owner and group

```bash
sudo chown -R tdops:users spark-2.4.6-bin-hadoop2.7
```
- Configure environment variables and properties
  - Configure spark-env.sh under $SPARK_HOME/conf

```bash
export SCALA_HOME=/usr/install/bigdata/scala-2.11.12
export JAVA_HOME=/usr/install/jdk1.8.0_60
export HADOOP_HOME=/usr/install/bigdata/hadoop-3.1.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LOG_DIR=/home/tdops/spark/spark-2.4.6/logs
SPARK_MASTER_IP=ailoan-vip-d-012170.hz.td
SPARK_LOCAL_DIRS=/usr/install/bigdata/spark-2.4.6-bin-hadoop2.7
SPARK_DRIVER_MEMORY=512M
```

  - Configure the slaves file under $SPARK_HOME/conf

```
# Use the hostnames of the two prepared machines as slaves
ailoan-vip-d-012171.hz.td
ailoan-vip-d-010129.hz.td
```
- Start Spark

```bash
cd $SPARK_HOME/sbin
./start-all.sh
```

- Verify the installation
  - Open ailoan-vip-d-012170.hz.td:8080/
  - Run jps to check that the master has a Master process and the slaves have Worker processes
  - Run the official demo program as follows

```bash
cd $SPARK_HOME/bin
./spark-submit --master spark://ailoan-vip-d-012170.hz.td:7077 --class org.apache.spark.examples.SparkPi --deploy-mode cluster file:/tmp/spark-examples_2.11-2.4.6.jar
```

  - Check the Spark execution result in the web UI
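The Livy demo further down reads the same example jar from HDFS; a minimal sketch for staging it there, assuming the jar still sits at /tmp as in the command above (with fs.defaultFS set earlier, /example resolves to hdfs://ailoan-vip-d-012170.hz.td:9000/example, the path the demo uses):

```bash
cd /usr/install/bigdata/hadoop-3.1.3
# Create the target directory and upload the jar
bin/hdfs dfs -mkdir -p /example
bin/hdfs dfs -put /tmp/spark-examples_2.11-2.4.6.jar /example/
bin/hdfs dfs -ls /example
```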
Install Livy

- Download the package

```bash
# Download directly with wget
wget https://www.apache.org/dyn/closer.lua/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip
```

- Extract it into the target directory

```bash
# Install the unzip utility first
sudo yum install unzip
# Extract the package (creates apache-livy-0.7.0-incubating-bin)
unzip apache-livy-0.7.0-incubating-bin.zip
```
- Modify the configuration files
  - Add and configure livy-env.sh

```bash
# Copy the template configuration
sudo cp livy-env.sh.template livy-env.sh
# Edit the configuration file and add the following
sudo vim livy-env.sh
JAVA_HOME=/usr/install/jdk1.8.0_60
HADOOP_CONF_DIR=/usr/install/bigdata/hadoop-3.1.3/etc/hadoop
SPARK_HOME=/usr/install/bigdata/spark-2.4.6-bin-hadoop2.7
LIVY_LOG_DIR=/home/tdops/spark/livy/logs
```

  - Add and configure livy.conf

```bash
# Copy the template configuration
sudo cp livy.conf.template livy.conf
# Edit the configuration file and add the following
sudo vim livy.conf
livy.spark.deploy-mode=cluster
livy.spark.master=spark://ailoan-vip-d-012170.hz.td:7077
livy.file.local-dir-whitelist=/tmp
```
- Start Livy

```bash
cd $LIVY_HOME/bin
./livy-server start
```

- View and verify Livy
  - Open ailoan-vip-d-012170.hz.td:8998 in a browser
  - Submit a demo through the REST API, e.g. with Postman or curl (a sketch follows), or with the Java client below
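A quick smoke test with curl against Livy's batches endpoint (a sketch; the jar path assumes it was staged to HDFS as in the Spark section, and the fields mirror the ones the Java client posts below):

```bash
# Submit SparkPi as a Livy batch; Livy replies with the batch id and state
curl -s -X POST -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "driverMemory": "512M",
        "executorMemory": "512M"
      }' \
  http://ailoan-vip-d-012170.hz.td:8998/batches

# Poll the batch state (replace 0 with the returned id)
curl -s http://ailoan-vip-d-012170.hz.td:8998/batches/0
```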
```java
package cn.xxx.yuntu.common.util.livy.core;

import cn.xxx.yuntu.common.util.dto.ApiResult;
import cn.xxx.yuntu.common.util.livy.vo.LivyArg;
import cn.xxx.yuntu.common.util.livy.vo.LivyStatus;
import cn.xxx.yuntu.common.util.livy.vo.LivyResult;
import cn.xxx.yuntu.common.util.util.HttpUtil;
import cn.xxx.yuntu.common.util.util.LogUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.lang3.StringUtils;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @author li.minqiang
 * @date 2019/12/5
 */
public class LivyClient {
    public static final String DELETED = "deleted";
    public static final String LIVY_BATCH_URI = "%s/batches/%s";
    public static final String MSG = "msg";
    public static final String NOT_FOUND = "not found";
    public static final String SESSION = "Session";
    private final static String STARTING = "starting";
    private LivyArg livyArg;

    private LivyClient() {
    }

    public static LivyClient getInstance(LivyArg livyArg) {
        LivyClient livyClient = new LivyClient();
        livyClient.setLivyArg(livyArg);
        return livyClient;
    }

    /** Submits a jar as a Livy batch via POST /batches and returns the batch id. */
    public ApiResult submitSparkJar() {
        JSONObject data = new JSONObject();
        data.put("file", livyArg.getJarPath());
        data.put("className", livyArg.getClassName());
        data.put("name", "testLivy" + System.currentTimeMillis());
        data.put("executorCores", livyArg.getExecutorCores());
        data.put("executorMemory", livyArg.getExecutorMemory());
        data.put("driverCores", livyArg.getDriverCores());
        data.put("driverMemory", livyArg.getDriverMemory());
        data.put("numExecutors", livyArg.getNumExecutors());
        data.put("conf", livyArg.getConf());
        data.put("args", livyArg.getArgs());
        ApiResult apiResult = HttpUtil.postJson(String.format("%s/batches", livyArg.getLivyServer()), data);
        JSONObject obj = (JSONObject) apiResult.getResult();
        if (!apiResult.isSuccess() || StringUtils.isEmpty(obj.getString("state"))) {
            LogUtil.error("make livy request error:{}", JSON.toJSONString(apiResult));
            return ApiResult.failure("Failed to submit the Livy job");
        }
        LogUtil.info("livy submit result:{}", JSON.toJSONString(apiResult, true));
        return ApiResult.successWithResult(obj.getString("id"));
    }

    private String getLivyUrl() {
        return String.format("%s/ui/batch/%s/log", livyArg.getLivyServer(), livyArg.getTaskId());
    }

    private List<String> makeListLogs(JSONArray logs) {
        List<String> mlogs = new ArrayList<String>();
        if (logs == null) {
            return mlogs;
        }
        for (int i = 0; i < logs.size(); i++) {
            mlogs.add(logs.getString(i));
        }
        return mlogs;
    }

    /** Fetches the full batch log via GET /batches/{id}/log?size=-1. */
    private List<String> getLivyServerLogs() {
        String url = String.format("%s/batches/%s/log?size=-1", livyArg.getLivyServer(), livyArg.getTaskId());
        ApiResult apiResult = HttpUtil.get(url);
        if (apiResult.isSuccess()) {
            JSONObject r = (JSONObject) apiResult.getResult();
            return makeListLogs(r.getJSONArray("log"));
        }
        return new ArrayList<String>();
    }

    public LivyArg getLivyArg() {
        return livyArg;
    }

    public void setLivyArg(LivyArg livyArg) {
        this.livyArg = livyArg;
    }

    public static void main(String[] args) {
        // JavaWordCount();
        SparkPi();
    }

    public static void SparkPi() {
        LivyClient livyClient = new LivyClient();
        LivyArg livyArg = new LivyArg();
        livyArg.setLivyServer("http://ailoan-vip-d-012170.hz.td:8998");
        livyArg.setJarPath("hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar");
        livyArg.setClassName("org.apache.spark.examples.SparkPi");
        livyArg.setExecutorCores(1);
        livyArg.setDriverCores(1);
        livyArg.setExecutorMemory("512M");
        livyArg.setDriverMemory("512M");
        livyClient.setLivyArg(livyArg);
        ApiResult apiResult = livyClient.submitSparkJar();
        System.out.println(apiResult.getCode() + "-" + apiResult.getReason());
        System.out.println(apiResult);
    }

    public static void JavaWordCount() {
        LivyClient livyClient = new LivyClient();
        LivyArg livyArg = new LivyArg();
        livyArg.setLivyServer("http://ailoan-vip-d-012170.hz.td:8998");
        livyArg.setJarPath("hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar");
        livyArg.setClassName("org.apache.spark.examples.JavaWordCount");
        livyArg.setExecutorCores(1);
        livyArg.setDriverCores(1);
        livyArg.setExecutorMemory("512M");
        livyArg.setDriverMemory("512M");
        List<String> args = new ArrayList<>();
        args.add("hdfs://ailoan-vip-d-012170.hz.td:9000/example/wordCount.txt");
        livyArg.setArgs(args);
        livyClient.setLivyArg(livyArg);
        ApiResult apiResult = livyClient.submitSparkJar();
        System.out.println(apiResult.getCode() + "-" + apiResult.getReason());
        System.out.println(apiResult);
    }
}
```

  - Likewise, check the submission record and result in the Spark web UI