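- RDD-specific configuration
| Key | Default Value | Comment |
|---|---|---|
| doris.request.auth.user | -- | Username for accessing Doris |
| doris.request.auth.password | -- | Password for accessing Doris |
| doris.read.field | -- | List of column names to read from the Doris table; separate multiple columns with commas |
| doris.filter.query | -- | Filter expression for the data to read. It is passed through to Doris unchanged, and Doris uses it to filter the data at the source. |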
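
As a rough illustration of how the keys in the table fit together, the sketch below assembles them into an option map of the kind the connector's RDD API takes. The helper `rddConfig`, the `doris.fenodes` key with the address `node1:8030`, and the column names `user_id`, `city`, and `age` are assumptions for illustration and do not come from the table. The commented-out `sc.dorisRDD` call further assumes that the Doris Spark connector JAR is on the classpath.

```scala
object DorisRddConfigSketch {

  // Builds the option map from the RDD-specific keys listed in the table.
  // `feNodes` (key "doris.fenodes") is an assumed FE HTTP address, not part of the table.
  def rddConfig(
      feNodes: String,
      user: String,
      password: String,
      fields: Seq[String],
      filter: Option[String]): Map[String, String] = {
    val base = Map(
      "doris.fenodes" -> feNodes,
      "doris.request.auth.user" -> user,
      "doris.request.auth.password" -> password
    )
    // doris.read.field is a comma-separated column list; omit it to read every column.
    val withFields =
      if (fields.nonEmpty) base + ("doris.read.field" -> fields.mkString(",")) else base
    // doris.filter.query is handed to Doris as-is, so it must be valid on the Doris side.
    filter.fold(withFields)(f => withFields + ("doris.filter.query" -> f))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical column names and filter, chosen only for this example.
    val cfg = rddConfig("node1:8030", "root", "123456", Seq("user_id", "city"), Some("age > 18"))

    // Print everything except the password.
    (cfg - "doris.request.auth.password").foreach { case (k, v) => println(s"$k=$v") }

    // With the connector available, the map would be passed along these lines:
    //   import org.apache.doris.spark._
    //   val rdd = sc.dorisRDD(
    //     tableIdentifier = Some("test_db.example_site_visit"),
    //     cfg = Some(cfg))
  }
}
```

Keeping the map construction in a small pure function makes it easy to check which options will be sent before touching a live cluster.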

Column type mapping between Doris and Spark

| Doris Type | Spark Type |
|---|---|
| NULL_TYPE | DataTypes.NullType |
| BOOLEAN | DataTypes.BooleanType |
| TINYINT | DataTypes.ByteType |
| SMALLINT | DataTypes.ShortType |
| INT | DataTypes.IntegerType |
| BIGINT | DataTypes.LongType |
| FLOAT | DataTypes.FloatType |
| DOUBLE | DataTypes.DoubleType |
| DATE | DataTypes.StringType (see note) |
| DATETIME | DataTypes.StringType (see note) |
| BINARY | DataTypes.BinaryType |
| DECIMAL | DecimalType |
| CHAR | DataTypes.StringType |
| LARGEINT | DataTypes.StringType |
| VARCHAR | DataTypes.StringType |
| DECIMALV2 | DecimalType |
| TIME | DataTypes.DoubleType |
| HLL | Unsupported datatype |
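
Note: The connector maps DATE and DATETIME to String. Because of how Doris's underlying storage engine processes these values, using the time types directly cannot cover the required time range, so the connector returns the human-readable time text as a String instead.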
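
Since DATE and DATETIME columns arrive as strings, downstream code that needs real date arithmetic has to convert them. The following is a minimal sketch of one way to do that with Spark's built-in `to_date` and `to_timestamp`. The column names `visit_date` and `last_visit`, the helper `withTypedDates`, and the text formats `yyyy-MM-dd` and `yyyy-MM-dd HH:mm:ss` are assumptions for illustration; a small in-memory DataFrame stands in for data read from Doris.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}

object DorisDateStringSketch {

  // Converts string-typed DATE/DATETIME columns into Spark DateType/TimestampType.
  // Column names and formats are assumptions; adjust them to the actual table.
  def withTypedDates(df: DataFrame): DataFrame =
    df.withColumn("visit_date", to_date(col("visit_date"), "yyyy-MM-dd"))
      .withColumn("last_visit", to_timestamp(col("last_visit"), "yyyy-MM-dd HH:mm:ss"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DorisDateStringSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for a DataFrame returned by the connector: both columns are strings.
    val raw = Seq(("2022-08-14", "2022-08-14 10:30:00")).toDF("visit_date", "last_visit")

    val typed = withTypedDates(raw)
    typed.printSchema()
    typed.show(false)

    spark.stop()
  }
}
```

Values that do not match the given format become null rather than raising an error, so it may be worth checking for unexpected nulls after the conversion.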

Demo
- Create a Maven project
- Import the pom dependencies
<repositories>
    <repository>
        <id>mvnrepository</id>
        <url>https://mvnrepository.com/</url>
        <layout>default</layout>
    </repository>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>elastic.co</id>
        <url>https://artifacts.elastic.co/maven</url>
    </repository>
</repositories>
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- SDK -->
    <java.version>1.8</java.version>
    <scala.version>2.11</scala.version>
    <!-- Junit -->
    <junit.version>4.12</junit.version>
    <!-- HTTP Version -->
    <http.version>4.5.11</http.version>
    <!-- Spark -->
    <spark.version>2.4.0-cdh6.2.1</spark.version>
    <!-- Parquet -->
    <parquet.version>1.9.0-cdh6.2.1</parquet.version>
    <!-- JSON Version -->
    <fastjson.version>1.2.62</fastjson.version>
    <!-- JDBC Drivers Version-->
    <mysql.version>5.1.44</mysql.version>
    <!-- Other -->
    <jtuple.version>1.2</jtuple.version>
    <!-- Maven Plugins Version -->
    <maven-compiler-plugin.version>3.1</maven-compiler-plugin.version>
    <maven-surefire-plugin.version>2.19.1</maven-surefire-plugin.version>
    <maven-shade-plugin.version>3.2.1</maven-shade-plugin.version>
</properties>
<dependencies>
    <!-- Test -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>${junit.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>${mysql.version}</version>
    </dependency>
    <!-- Http -->
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>${http.version}</version>
    </dependency>
    <!-- Spark -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-common</artifactId>
        <version>${parquet.version}</version>
    </dependency>
    <dependency>
        <groupId>net.jpountz.lz4</groupId>
        <artifactId>lz4</artifactId>
        <version>1.3.0</version>
    </dependency>
</dependencies>
- Read data from Doris
package cn.itcast.doris

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
// Only needed for the connector's RDD API; it requires the Doris Spark connector JAR,
// which the pom above does not declare, so it is left commented out here.
// import org.apache.doris.spark._

/**
 * Goal: read data from Doris
 */
object TestReadDoris {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName(this.getClass.getSimpleName).setMaster("local[*]")
    val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()

    // Doris speaks the MySQL protocol, so a plain JDBC read through the FE query port (9030) works.
    val df = sparkSession.read.format("jdbc")
      .option("url", "jdbc:mysql://node1:9030/test_db")
      .option("user", "root")
      .option("password", "123456")
      .option("dbtable", "example_site_visit")
      .load()

    df.show()
    sparkSession.stop()
  }
}