Big Data Analytics: Apache Doris (Part 60)


Creating together, growing together! This is day 14 of my participation in the Juejin "Daily New Plan · August Writing Challenge".

  • RDD-specific configuration
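Key | Default Value | Comment
--- | --- | ---
doris.request.auth.user | -- | Username for accessing Doris
doris.request.auth.password | -- | Password for accessing Doris
doris.read.field | -- | Comma-separated list of column names to read from the Doris table
doris.filter.query | -- | Expression used to filter the data being read. It is passed through to Doris, which uses it to filter the data at the source.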
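To show how the four RDD-specific keys fit together, here is a minimal sketch that collects them into a single `Map[String, String]`. The object and helper names (`DorisRddConfig`, `dorisRddConfig`) and the sample column names are hypothetical. According to the Doris Spark Connector documentation, a map of this shape is passed as the `cfg` argument of `sc.dorisRDD(...)` after `import org.apache.doris.spark._`. Confirm that call against the connector version you actually use.

```scala
// Hypothetical helper: collects the RDD-specific Doris options into one map.
object DorisRddConfig {
  def dorisRddConfig(
      user: String,
      password: String,
      readFields: Seq[String] = Seq.empty,
      filterQuery: Option[String] = None
  ): Map[String, String] = {
    // Credentials the connector uses when talking to Doris
    val auth = Map(
      "doris.request.auth.user" -> user,
      "doris.request.auth.password" -> password
    )
    // doris.read.field takes a comma-separated list of column names
    val fields =
      if (readFields.isEmpty) Map.empty[String, String]
      else Map("doris.read.field" -> readFields.mkString(","))
    // doris.filter.query is passed through to Doris as-is, so it must be valid Doris syntax
    val filter = filterQuery
      .map(q => Map("doris.filter.query" -> q))
      .getOrElse(Map.empty[String, String])
    auth ++ fields ++ filter
  }
}
```

For example, `DorisRddConfig.dorisRddConfig("root", "123456", Seq("id", "name"), Some("id > 10"))` produces a map whose `doris.read.field` entry is `"id,name"`. The optional keys are left out entirely when they are not supplied.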
 Doris-to-Spark column type mapping
Doris Type | Spark Type
--- | ---
NULL_TYPE | DataTypes.NullType
BOOLEAN | DataTypes.BooleanType
TINYINT | DataTypes.ByteType
SMALLINT | DataTypes.ShortType
INT | DataTypes.IntegerType
BIGINT | DataTypes.LongType
FLOAT | DataTypes.FloatType
DOUBLE | DataTypes.DoubleType
DATE | DataTypes.StringType (see note)
DATETIME | DataTypes.StringType (see note)
BINARY | DataTypes.BinaryType
DECIMAL | DecimalType
CHAR | DataTypes.StringType
LARGEINT | DataTypes.StringType
VARCHAR | DataTypes.StringType
DECIMALV2 | DecimalType
TIME | DataTypes.DoubleType
HLL | Unsupported datatype

 

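Note: The connector maps DATE and DATETIME to String. Because of how the Doris storage engine handles time values internally, using a time type directly would not cover the required time range, so the connector returns the time as human-readable text in a String.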
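Since DATE and DATETIME arrive as strings, downstream code often has to parse them back into time values. Below is a minimal sketch using `java.time`. It assumes the text follows the `yyyy-MM-dd` and `yyyy-MM-dd HH:mm:ss` layouts, so check this against the values your Doris version actually returns. The object name `DorisTimeText` is hypothetical.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helpers that turn the connector's DATE/DATETIME text back into java.time values.
// Assumption: DATE text looks like "2017-10-01" and DATETIME text like "2017-10-01 08:00:05".
object DorisTimeText {
  private val DateTimeFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // DATE -> LocalDate (LocalDate.parse uses the ISO yyyy-MM-dd format by default)
  def parseDate(text: String): Option[LocalDate] =
    Option(text).flatMap(t => Try(LocalDate.parse(t.trim)).toOption)

  // DATETIME -> LocalDateTime; None for null or for text that does not match the pattern
  def parseDateTime(text: String): Option[LocalDateTime] =
    Option(text).flatMap(t => Try(LocalDateTime.parse(t.trim, DateTimeFmt)).toOption)
}
```

Inside Spark SQL, the built-in `to_date` and `to_timestamp` functions do the same conversion on whole columns. The helpers above are mainly useful when rows are processed as plain Scala values.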
 Demo
  • Create a Maven project

  • Add the dependencies to pom.xml (the sections below go inside the <project> element)

```xml
<repositories>
	<!-- Maven Central (mvnrepository.com is a search site, not a Maven repository) -->
	<repository>
		<id>central</id>
		<url>https://repo.maven.apache.org/maven2/</url>
		<layout>default</layout>
	</repository>
	<!-- Cloudera repository, needed for the *-cdh6.2.1 Spark and Parquet artifacts -->
	<repository>
		<id>cloudera</id>
		<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
	</repository>
	<repository>
		<id>elastic.co</id>
		<url>https://artifacts.elastic.co/maven</url>
	</repository>
</repositories>

<properties>
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<!-- SDK -->
	<java.version>1.8</java.version>
	<scala.version>2.11</scala.version>
	<!-- Junit -->
	<junit.version>4.12</junit.version>
	<!-- HTTP Version -->
	<http.version>4.5.11</http.version>
	<!-- Spark -->
	<spark.version>2.4.0-cdh6.2.1</spark.version>
	<!-- Parquet -->
	<parquet.version>1.9.0-cdh6.2.1</parquet.version>
	<!-- JSON Version -->
	<fastjson.version>1.2.62</fastjson.version>
	<!-- JDBC Drivers Version-->
	<mysql.version>5.1.44</mysql.version>
	<!-- Other -->
	<jtuple.version>1.2</jtuple.version>
	<!-- Maven Plugins Version -->
	<maven-compiler-plugin.version>3.1</maven-compiler-plugin.version>
	<maven-surefire-plugin.version>2.19.1</maven-surefire-plugin.version>
	<maven-shade-plugin.version>3.2.1</maven-shade-plugin.version>
</properties>

<dependencies>
	<!-- Test -->
	<dependency>
		<groupId>junit</groupId>
		<artifactId>junit</artifactId>
		<version>${junit.version}</version>
		<scope>test</scope>
	</dependency>
	<!-- MySQL JDBC driver, used to reach the Doris FE over the MySQL protocol -->
	<dependency>
		<groupId>mysql</groupId>
		<artifactId>mysql-connector-java</artifactId>
		<version>${mysql.version}</version>
	</dependency>
	<!-- Http -->
	<dependency>
		<groupId>org.apache.httpcomponents</groupId>
		<artifactId>httpclient</artifactId>
		<version>${http.version}</version>
	</dependency>
	<!-- Spark -->
	<dependency>
		<groupId>org.apache.spark</groupId>
		<artifactId>spark-sql_${scala.version}</artifactId>
		<version>${spark.version}</version>
	</dependency>
	<dependency>
		<groupId>org.apache.spark</groupId>
		<artifactId>spark-sql-kafka-0-10_${scala.version}</artifactId>
		<version>${spark.version}</version>
	</dependency>
	<dependency>
		<groupId>org.apache.parquet</groupId>
		<artifactId>parquet-common</artifactId>
		<version>${parquet.version}</version>
	</dependency>
	<dependency>
		<groupId>net.jpountz.lz4</groupId>
		<artifactId>lz4</artifactId>
		<version>1.3.0</version>
	</dependency>
</dependencies>
```
  • Read data from Doris
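```scala
package cn.itcast.doris

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Goal: read data from Doris.
 * This demo uses Spark's JDBC source against the Doris FE's MySQL-protocol
 * query port (9030) via mysql-connector-java, so the Spark Doris Connector
 * (org.apache.doris.spark) is not needed on the classpath.
 */
object TestReadDoris {
  def main(args: Array[String]): Unit = {
    // Run locally on all available cores; strip the trailing "$" Scala adds to object class names
    val sparkConf = new SparkConf()
      .setAppName(this.getClass.getSimpleName.stripSuffix("$"))
      .setMaster("local[*]")
    val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()

    // Load table example_site_visit from database test_db on FE host node1
    val df = sparkSession.read.format("jdbc")
      .option("url", "jdbc:mysql://node1:9030/test_db")
      .option("driver", "com.mysql.jdbc.Driver") // driver class for mysql-connector-java 5.1.x
      .option("user", "root")
      .option("password", "123456")
      .option("dbtable", "example_site_visit")
      .load()

    df.show()
    sparkSession.stop()
  }
}
```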
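The example above hard-codes every JDBC option. Below is a minimal sketch that gathers them into one `Map` so they can be handed to Spark's `DataFrameReader.options(...)` in a single call. The names `DorisJdbc`, `dorisJdbcOptions` and the `where` parameter are hypothetical, and so is the column `id` used in the usage line. The optional filter relies on Spark's documented behavior that `dbtable` accepts anything valid in a FROM clause, including a parenthesized subquery. Wrapping the table in a subquery this way lets Doris do the filtering.

```scala
// Hypothetical helper: builds the option map for spark.read.format("jdbc") against a Doris FE.
object DorisJdbc {
  def dorisJdbcOptions(
      feHost: String,
      database: String,
      table: String,
      user: String,
      password: String,
      queryPort: Int = 9030,       // FE MySQL-protocol port, as in the example above
      where: Option[String] = None // optional filter; inserted verbatim, so pass trusted SQL only
  ): Map[String, String] = {
    // With a filter, read from a subquery; MySQL-compatible SQL requires an alias on derived tables
    val source = where match {
      case Some(cond) => s"(SELECT * FROM $table WHERE $cond) AS t"
      case None       => table
    }
    Map(
      "url"      -> s"jdbc:mysql://$feHost:$queryPort/$database",
      "driver"   -> "com.mysql.jdbc.Driver", // driver class for mysql-connector-java 5.1.x
      "user"     -> user,
      "password" -> password,
      "dbtable"  -> source
    )
  }
}
```

Usage with the demo's settings might look like `sparkSession.read.format("jdbc").options(DorisJdbc.dorisJdbcOptions("node1", "test_db", "example_site_visit", "root", "123456", where = Some("id > 10"))).load()`.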