Building and Installing Spark from Source (Including a Runnable Distribution Package)

Source download locations

Apache official site: archive.apache.org/dist/spark/…

GitHub: github.com/apache/spar…

Warning: on Windows, run the build from Git Bash.

1. Change the Scala version

./dev/change-scala-version.sh 2.12

2. In ./dev/make-distribution.sh, comment out the variable-detection block shown below, then set these variables explicitly

# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | grep -v "WARNING"\
#     | fgrep --count "<id>hive</id>";\
#     # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#     # because we use "set -o pipefail"
#     echo -n)
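
With the detection block commented out, the four variables need hand-assigned values at the same place in the script. The following is a minimal sketch, not the author's exact values: the Spark version 2.4.5 is an assumption (the source does not state it, so substitute the version of the checkout being built). The Scala and Hadoop values mirror the flags used in the build command in step 5, and SPARK_HIVE=1 mirrors the count the original pipeline would produce when the hive profile is active.

```shell
# Explicit values replacing the commented-out Maven lookups.
VERSION=2.4.5                          # assumption: Spark version of the source tree
SCALA_VERSION=2.12                     # Scala binary version, as switched in step 1
SPARK_HADOOP_VERSION=2.6.0-cdh5.16.2   # matches -Dhadoop.version in step 5
SPARK_HIVE=1                           # 1 = hive profile enabled (-Phive)
```

Hard-coding the values skips four separate Maven invocations, which is why the build no longer depends on parsing `mvn help:evaluate` output.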

3. In the pom.xml at the root of the Spark source tree, add the Cloudera repository inside the <repositories>...</repositories> element

    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
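
Before starting a long build, it can help to confirm that the repository entry actually landed in pom.xml. This is a minimal sketch, assuming it is run from the Spark source root; `has_cloudera_repo` is a hypothetical helper name, not part of Spark.

```shell
# has_cloudera_repo is a hypothetical helper, not part of Spark.
# It succeeds if the given POM file references the Cloudera repository URL.
has_cloudera_repo() {
  grep -q "repository.cloudera.com/artifactory/cloudera-repos" "$1"
}

# Only run the check when a pom.xml exists in the current directory.
if [ -f pom.xml ]; then
  if has_cloudera_repo pom.xml; then
    echo "Cloudera repository found in pom.xml"
  else
    echo "Cloudera repository missing from pom.xml"
  fi
fi
```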

4. Set the Maven environment variable

Linux: open /etc/profile (vim /etc/profile) and add the following line:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"

Windows:
Add it the same way you would set the JDK environment variables.
Variable name: MAVEN_OPTS
Variable value: -Xmx2g -XX:ReservedCodeCacheSize=1g
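
In a Git Bash or Linux shell, the same value can also be exported for the current session only, which avoids touching system-wide settings. A minimal sketch, assuming no other MAVEN_OPTS value needs to be preserved:

```shell
# Set MAVEN_OPTS for this shell session only if it is not already defined.
if [ -z "${MAVEN_OPTS:-}" ]; then
  export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
fi

# Show the value Maven will pick up.
echo "MAVEN_OPTS=${MAVEN_OPTS}"
```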

5. Build a runnable distribution

./dev/make-distribution.sh --name 2.6.0-cdh5.16.2 --tgz -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn -DskipTests -Dscala.version=2.12.10 -Dhadoop.version=2.6.0-cdh5.16.2
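
With --tgz, make-distribution.sh writes a tarball named spark-<VERSION>-bin-<NAME>.tgz to the source root, where NAME is the value passed to --name. The sketch below locates and unpacks it. The Spark version 2.4.5 and the install directory are assumptions; use the VERSION set in the script and a directory of your choice.

```shell
VERSION=2.4.5            # assumption: must match VERSION in make-distribution.sh
NAME=2.6.0-cdh5.16.2     # the value passed to --name
TARBALL="spark-${VERSION}-bin-${NAME}.tgz"
INSTALL_DIR="${INSTALL_DIR:-$HOME/app}"   # hypothetical install location

# Unpack only if the build actually produced the tarball.
if [ -f "$TARBALL" ]; then
  mkdir -p "$INSTALL_DIR"
  tar -xzf "$TARBALL" -C "$INSTALL_DIR"
  echo "Unpacked into $INSTALL_DIR/spark-${VERSION}-bin-${NAME}"
else
  echo "Expected tarball not found: $TARBALL"
fi
```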