1. New Project: create a Maven project
   - Add Archetype
   - JDK/Scala version compatibility: docs.scala-lang.org/overviews/j…
     After two days of trial and error: as of 2023-04-14, neither JDK 20 nor JDK 19 is compatible; JDK 1.8 works best.
   - My archetype list did not include maven-archetype-scala. Two options:
     Option 1: add it manually via Add Archetype.
     Option 2: create the project from any webapp archetype and add the scala source folder yourself afterwards.
2. Add the Scala SDK
   - If you only run locally, any version is fine.
   - If you need to connect to other systems, keep the versions consistent across them.
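To confirm which Scala version actually ends up on the classpath (and therefore which `_2.12`/`_2.13` Spark artifacts it matches), a minimal stdlib-only check:

```scala
// Prints the Scala version on the classpath; it should match the
// scala.binary.version suffix of the Spark artifacts you depend on.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // e.g. "2.12.15" when the 2.12 SDK is active
    println(scala.util.Properties.versionNumberString)
  }
}
```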
3. Create a Scala file
   1. Right-click the project and choose Add Framework Support, then tick Scala.
      Only after adding this framework support can you create Scala files in the project.
4. Settings
   Mark the scala folder as a Sources root; otherwise, running Scala code fails with the error "Could not find or load main class".
5. Configure settings.xml to use the Aliyun mirror
```xml
<settings xmlns="maven.apache.org/SETTINGS/1.…"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <!-- mirror
     | Specifies a repository mirror site to use instead of a given repository. The repository that
     | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
     | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
     |
    <mirror>
      <id>mirrorId</id>
      <mirrorOf>repositoryId</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://my.repository.com/repo/path</url>
    </mirror>
    -->
    <mirror>
      <id>alimaven</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
    </mirror>
    <mirror>
      <id>alimaven-public</id>
      <name>aliyun maven public</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
      <id>central</id>
      <name>Maven Repository Switchboard</name>
      <url>http://repo1.maven.org/maven2/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
      <id>repo2</id>
      <mirrorOf>central</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://repo2.maven.org/maven2/</url>
    </mirror>
    <mirror>
      <id>ibiblio</id>
      <mirrorOf>central</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
    </mirror>
    <mirror>
      <id>jboss-public-repository-group</id>
      <mirrorOf>central</mirrorOf>
      <name>JBoss Public Repository Group</name>
      <url>http://repository.jboss.org/nexus/content/groups/public</url>
    </mirror>
    <mirror>
      <id>google-maven-central</id>
      <name>Google Maven Central</name>
      <url>https://maven-central.storage.googleapis.com</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <!-- A mirror of the central repository in China -->
    <mirror>
      <id>maven.net.cn</id>
      <name>one of the central mirrors in China</name>
      <url>http://maven.net.cn/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```
6. Configure pom.xml
   Make sure every version number lines up; after saving (Ctrl+S), Maven downloads the dependencies automatically.
   - properties:

     ```xml
     <properties>
         <scala.binary.version>2.12</scala.binary.version>
         <scala.version>2.12.15</scala.version>
         <spark.version>3.3.2</spark.version>
         <hadoop.version>2.6.0</hadoop.version>
     </properties>
     ```

   - dependencies:

     ```xml
     <dependencies>
         <dependency>
             <groupId>junit</groupId>
             <artifactId>junit</artifactId>
             <version>3.8.1</version>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-core_${scala.binary.version}</artifactId>
             <version>${spark.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-sql_${scala.binary.version}</artifactId>
             <version>${spark.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-streaming_${scala.binary.version}</artifactId>
             <version>${spark.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-hive_${scala.binary.version}</artifactId>
             <version>${spark.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-streaming-kafka-0-10_${scala.binary.version}</artifactId>
             <version>${spark.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
             <version>${spark.version}</version>
         </dependency>
         <dependency>
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-client</artifactId>
             <version>${hadoop.version}</version>
         </dependency>
     </dependencies>
     ```
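The `_${scala.binary.version}` suffix in the Spark artifactIds above is how Maven selects a binary-compatible build, and it is the reason the versions must line up. A tiny sketch of the substitution Maven performs (pure Scala, names illustrative):

```scala
// Sketch: how the ${scala.binary.version} property expands inside the
// artifactIds above (e.g. spark-core_2.12). Pure Scala, no Maven involved.
object ArtifactName {
  def artifactId(base: String, scalaBinaryVersion: String): String =
    s"${base}_$scalaBinaryVersion"

  def main(args: Array[String]): Unit = {
    println(artifactId("spark-core", "2.12"))          // spark-core_2.12
    println(artifactId("spark-sql-kafka-0-10", "2.12")) // spark-sql-kafka-0-10_2.12
  }
}
```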
7. Develop a Spark application and test it locally
   - WordCount.scala:

     ```scala
     import org.apache.spark.{SparkConf, SparkContext}

     object WordCount {
       def main(args: Array[String]): Unit = {
         val inputFile = "/Users/bml/Documents/spark/Data01.txt"
         val conf = new SparkConf().setAppName("WordCount").setMaster("local")
         val sc = new SparkContext(conf)
         val textFile = sc.textFile(inputFile)
         val wordCount = textFile
           .flatMap(line => line.split(" "))
           .map(word => (word, 1))
           .reduceByKey((a, b) => a + b)
         wordCount.foreach(println)
         sc.stop()
       }
     }
     ```
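The flatMap → map → reduceByKey pipeline above can be sketched with plain Scala collections (no Spark or input file needed), which makes the per-word counting logic easy to verify before running it on an RDD:

```scala
// The same word-count pipeline as WordCount.scala, on plain Scala
// collections: flatMap splits lines into words, and groupBy + sum
// plays the role of reduceByKey (summing the 1s per word).
object WordCountLogic {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("hello spark", "hello scala")))
}
```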
8. Configure Spark to connect to MySQL via JDBC
   - Let Maven download the driver automatically: search for "mysql" on mvnrepository.com/artifact/my… and copy the dependency snippet for the version you need.
   - Add it to pom.xml:

     ```xml
     <dependency>
         <groupId>mysql</groupId>
         <artifactId>mysql-connector-java</artifactId>
         <version>8.0.32</version>
     </dependency>
     ```

   - Maven downloads it automatically on save.
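Before wiring this into Spark, it helps to see the shape of the connection settings that `spark.read.jdbc(url, table, properties)` expects. A minimal sketch using only `java.util.Properties`; the host, database name, table, user, and password below are placeholders:

```scala
import java.util.Properties

// Builds the JDBC connection properties Spark's spark.read.jdbc expects.
// Host, database, user, and password here are placeholder values.
object JdbcProps {
  def mysqlProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    // Driver class name shipped in mysql-connector-java 8.x
    props.setProperty("driver", "com.mysql.cj.jdbc.Driver")
    props
  }

  def main(args: Array[String]): Unit = {
    val url = "jdbc:mysql://localhost:3306/testdb"
    val props = mysqlProps("root", "secret")
    println(url)
    println(props.getProperty("driver")) // com.mysql.cj.jdbc.Driver
    // With a SparkSession in scope you would then read a table with:
    //   val df = spark.read.jdbc(url, "employee", props)
  }
}
```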
   - Scala file:

     ```scala
     import org.apache.spark.sql.SparkSession

     object RDDtoDF {
       def main(args: Array[String]): Unit = {
         val spark = SparkSession.builder
           .master("local")
           .appName("RDDtoDF")
           .getOrCreate()
         val filePath = args(0)
         val rdd = spark.sparkContext.textFile(filePath)

         // Convert the RDD of "id,name,age" lines to a DataFrame
         import spark.implicits._
         val df = rdd.map(_.split(","))
           .map(row => (row(0).toInt, row(1), row(2).toInt))
           .toDF("id", "name", "age")
         df.show()

         // Print every row of the DataFrame
         df.collect().foreach(row =>
           println(s"id:${row.getAs[Int]("id")},name:${row.getAs[String]("name")},age:${row.getAs[Int]("age")}")
         )

         // Stop the SparkSession
         spark.stop()
       }
     }
     ```
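The comma-splitting step in RDDtoDF is the part most likely to fail on malformed input. Isolated as a plain function (the object and method names here are hypothetical), it can be checked without Spark:

```scala
// The per-line parsing step from RDDtoDF, isolated: splits an "id,name,age"
// line into a typed tuple. Throws NumberFormatException on non-numeric
// fields, so malformed rows fail fast.
object RowParse {
  def parse(line: String): (Int, String, Int) = {
    val fields = line.split(",")
    (fields(0).toInt, fields(1), fields(2).toInt)
  }

  def main(args: Array[String]): Unit =
    println(parse("1,Alice,30")) // (1,Alice,30)
}
```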