How to run JavaWordCount in Spark


Created by Jerry Wang, last modified on Aug 17, 2015

The general steps can be found at this link: stackoverflow.com/questions/2…

  1. `mkdir example-java-build/; cd example-java-build`
  2. `mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=spark.examples -DartifactId=JavaWordCount -Dfilter=org.apache.maven.archetypes:maven-archetype-quickstart`

     The value of `-DartifactId` becomes the name of the generated project folder (here `JavaWordCount`).

Below is my pom.xml:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <!-- matches the -DgroupId specified on the command line -->
  <groupId>spark.examples</groupId>
  <!-- matches the -DartifactId specified on the command line -->
  <artifactId>JavaWordCount</artifactId>
  <packaging>jar</packaging>
  <version>1</version>
  <name>JavaWordCount</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-examples_2.10</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.4.1</version>
    </dependency>
  </dependencies>
</project>
```
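3. `cd example-java-build/JavaWordCount`, then run `mvn package`.
This builds the application jar, `JavaWordCount-1.jar`, inside the `target` directory. Since the pom configures no shade or assembly plugin, this is not a fat jar: it contains only the project's own classes, which is enough here because spark-submit puts the Spark libraries on the classpath.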
![clipboard2](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/946d2aef5fc645918258e6d25766783a~tplv-k3u1fbpfcp-zoom-1.image)

The `classes` folder contains the individual compiled .class files:
![clipboard3](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/7fd269e31d724af9a0682616bfc8e0aa~tplv-k3u1fbpfcp-zoom-1.image)

Copy the jar file to any location on the server, then go to the `bin` folder of your Spark installation.
  
Submit the Spark job: `./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar`
 
Use jd.exe to open the compiled Java class and make sure the value specified by `--class` equals the fully qualified name of the class; in my example it is `org.apache.spark.examples.JavaWordCount`. Otherwise you will get a `java.lang.ClassNotFoundException`.
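If a decompiler is not at hand, the same check can be done with the JDK alone (`jar tf` lists the entries). A minimal sketch using only `java.util.jar`: it turns entries such as `org/apache/spark/examples/JavaWordCount.class` into fully qualified names and reports whether the intended `--class` value is among them. The class name `JarClassCheck` is hypothetical, and nested classes (names containing `$`) are skipped for brevity.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: checks that a --class value exists inside a jar.
public class JarClassCheck {

    // "org/apache/spark/examples/JavaWordCount.class"
    //   -> "org.apache.spark.examples.JavaWordCount"
    static String entryToClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Fully qualified names of all top-level classes in the jar
    // (entries whose names contain '$' are nested classes and are skipped).
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class") && !name.contains("$")) {
                    names.add(entryToClassName(name));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the application jar, args[1]: value intended for --class
        List<String> classes = listClasses(args[0]);
        if (classes.contains(args[1])) {
            System.out.println("OK: " + args[1] + " is in the jar");
        } else {
            System.out.println("Not found: " + args[1] + "; classes in jar: " + classes);
        }
    }
}
```

Usage: `java JarClassCheck target/JavaWordCount-1.jar org.apache.spark.examples.JavaWordCount`. A "Not found" result here corresponds to the `ClassNotFoundException` spark-submit would report.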
![clipboard4](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/73014dc1186d432caf49e503cf4c9202~tplv-k3u1fbpfcp-zoom-1.image)

4. `./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt`
To debug, run the script through `sh -x`, which prints every command the script executes: `sh -x ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master local /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt`
The trace shows that this is equivalent to: `/usr/jdk1.7.0_79/bin/java -cp /root/devExpert/spark-1.4.1/conf/:/root/devExpert/spark-1.4.1/assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-core-3.2.10.jar:/root/devExpert/spark-1.4.1/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar -Xms512m -Xmx512m -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master local --class org.apache.spark.examples.JavaWordCount /root/devExpert/spark-1.4.1/example-java-build/JavaWordCount/target/JavaWordCount-1.jar /root/devExpert/spark-1.4.1/bin/test.txt`

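`-cp` is the same as `-classpath`: it tells the JVM where to find the other classes the program depends on, usually libraries packaged as jar files. Each jar has to be named with its path down to the .jar file itself; listing only the directory that contains it is not enough. Entries are separated by a semicolon (`;`) on Windows and by a colon (`:`) on Linux. Apart from the `dir/*` form (supported since Java 6), which expands to all .jar files in one directory, wildcards are not supported, so every other jar must be listed individually. A single dot (`.`) stands for the current directory.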
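To make the separator rule concrete, a small sketch that splits a classpath string into its entries. The helper name `ClasspathEntries` is hypothetical; `File.pathSeparator` is the JDK constant holding `:` on Linux and `;` on Windows, and the `java.class.path` system property holds the value the JVM was started with via `-cp`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits a -cp / -classpath value into entries.
public class ClasspathEntries {

    // The separator is taken literally (quoted), so ";" and ":" both work.
    static List<String> split(String classpath, String separator) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            // skip empty entries, e.g. from a doubled separator
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Classpath this JVM was launched with, split by the platform separator.
        String cp = System.getProperty("java.class.path");
        for (String entry : split(cp, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Applied to the `-cp` value from the trace above, `split(..., ":")` yields one line per entry: the `conf/` directory first, then the assembly jar and the three datanucleus jars.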
Output:
![clipboard6](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/b4b8c6b2225844c4b63073a53f329a01~tplv-k3u1fbpfcp-zoom-1.image)
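To sanity-check the output, a plain-Java sketch (no Spark involved) that counts the words of the same input file. It assumes, as in the JavaWordCount example shipped with Spark 1.x, that words are split on a single space and printed as `word: count`; the class name `LocalWordCount` is hypothetical. Spark does not sort its result, so compare the counts rather than the line order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical local reference for checking the Spark job's counts.
public class LocalWordCount {

    // Assumption: tokens are separated by a single space.
    private static final Pattern SPACE = Pattern.compile(" ");

    // Count how often each token occurs across all lines (sorted by word).
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : SPACE.split(line)) {
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the same input file passed to spark-submit, e.g. test.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

For a file containing the lines `hello spark` and `hello world`, it prints `hello: 2`, `spark: 1` and `world: 1`. The code sticks to Java 7 APIs to match the JDK 1.7 used above.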