1. Establish a connection to the Spark framework
val sparConf = new SparkConf().setMaster("local").setAppName("WordCount")
val sc = new SparkContext(sparConf)
2. Read the files under the directory line by line
val lines: RDD[String] = sc.textFile("data")
3. Flatten: split each line into individual words
val words: RDD[String] = lines.flatMap(_.split(" "))
4. Group the words, then count each group by its size
val wordGroup: RDD[(String, Iterable[String])] = words.groupBy(word => word)
val wordToCount = wordGroup.map {
  case (word, list) => (word, list.size)
}
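For reference, the steps above can be assembled into a complete program. This is a minimal sketch; the object name Spark01_WordCount and the package are assumptions borrowed from the second example below and do not appear in the steps themselves:

package com.test.bigdata.spark.core.wc

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// Assumed object name for the first approach (not given in the original excerpt)
object Spark01_WordCount {
  def main(args: Array[String]): Unit = {
    val sparConf = new SparkConf().setMaster("local").setAppName("WordCount")
    val sc = new SparkContext(sparConf)

    val lines: RDD[String] = sc.textFile("data")
    val words: RDD[String] = lines.flatMap(_.split(" "))

    // Group identical words together, then count each group by its size
    val wordGroup: RDD[(String, Iterable[String])] = words.groupBy(word => word)
    val wordToCount: RDD[(String, Int)] = wordGroup.map {
      case (word, list) => (word, list.size)
    }

    wordToCount.collect().foreach(println)

    sc.stop()
  }
}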
I. Aggregation
The first approach uses .size, so the aggregation is implicit in counting each group; the approach in this article makes the aggregation explicit by reducing the (word, 1) pairs within each group.
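To see what that explicit aggregation does to a single group, here is a plain Scala sketch (the sample list for the word "Hello" is hypothetical; in the real program the group comes from groupBy on the RDD):

// Hypothetical group for the word "Hello": three (word, 1) pairs
val group = List(("Hello", 1), ("Hello", 1), ("Hello", 1))
// reduce keeps the word from the first pair and sums the counts
val total = group.reduce((t1, t2) => (t1._1, t1._2 + t2._2))
println(total) // prints (Hello,3)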
II. Code Implementation
WordCount implemented with the second approach.
The code is as follows (example):
package com.test.bigdata.spark.core.wc

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Spark02_WordCount {
  def main(args: Array[String]): Unit = {
    // TODO Establish a connection to the Spark framework
    // (analogous to a JDBC Connection)
    val sparConf = new SparkConf().setMaster("local").setAppName("WordCount")
    val sc = new SparkContext(sparConf)

    // TODO Execute the business logic
    // Read the files under the "data" directory line by line
    val lines: RDD[String] = sc.textFile("data")
    // Split each line into individual words
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Map every word to a (word, 1) pair
    val wordToOne = words.map(
      word => (word, 1)
    )
    // Group the (word, 1) pairs by the word
    val wordGroup: RDD[(String, Iterable[(String, Int)])] = wordToOne.groupBy(t => t._1)
    // Aggregate within each group: keep the word and sum the counts
    val wordToCount = wordGroup.map {
      case (word, list) => {
        list.reduce(
          (t1, t2) => {
            (t1._1, t1._2 + t2._2)
          }
        )
      }
    }

    // Collect the transformed result to the driver and print it to the console
    val array: Array[(String, Int)] = wordToCount.collect()
    array.foreach(println)

    // TODO Close the connection
    sc.stop()
  }
}
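As a hypothetical run: if the data directory contained the lines "Hello Scala" and "Hello Spark", the program would print pairs such as (Hello,2), (Scala,1) and (Spark,1). Note that the word bound in the case pattern is not used to build the result; the word in each output pair comes from the first tuple of the group via t1._1.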
Reposted from my own article published on CSDN: blog.csdn.net/yumiao168/a…