Spark SQL
Requirements: 1. add a custom SQL command; 2. demonstrate SQL optimization in action; 3. implement a custom rule.
Modify SqlBase.g4 (add a SHOW VERSION statement and register VERSION as a non-reserved keyword):
statement
    | SHOW VERSION                                                     #showVersion

ansiNonReserved
    | VERSION

nonReserved
    | VERSION

//--SPARK-KEYWORD-LIST-START
VERSION: 'VERSION' | 'V';   // 'V' is an extra shorthand, so SHOW V works too
Modify SparkSqlParser.scala (the #showVersion label in the grammar makes ANTLR generate a ShowVersionContext and a matching visit hook):
override def visitShowVersion(ctx: ShowVersionContext): LogicalPlan = withOrigin(ctx) {
  ShowVersionCommand()
}
The ShowVersionCommand class:
package org.apache.spark.sql.execution.command

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.types.StringType

case class ShowVersionCommand() extends LeafRunnableCommand {

  // The command returns a single string column named "version".
  override val output: Seq[Attribute] =
    Seq(AttributeReference("version", StringType)())

  override def run(sparkSession: SparkSession): Seq[Row] = {
    val sparkVersion = sparkSession.version
    val javaVersion = System.getProperty("java.version")
    val scalaVersion = scala.util.Properties.releaseVersion
    val output = "Spark Version: %s, Java Version: %s, Scala Version: %s"
      .format(sparkVersion, javaVersion, scalaVersion.getOrElse(""))
    Seq(Row(output))
  }
}
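The command logic can also be sanity-checked without going through the parser, since run() only needs a SparkSession. A minimal sketch (the object name ShowVersionSmokeTest is made up here; it assumes ShowVersionCommand is already compiled and on the classpath):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.command.ShowVersionCommand

object ShowVersionSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("show-version-smoke-test")
      .getOrCreate()
    // Bypass the parser and invoke the command's run() directly.
    ShowVersionCommand().run(spark).foreach(println)
    spark.stop()
  }
}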
Build (compile Spark with Hive support so the spark-sql CLI is available):
build/mvn clean package -DskipTests -Phive -Phive-thriftserver
After the build finishes, launch bin/spark-sql and run the new command:
22/06/23 17:20:15 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore xuejiameng@192.168.54.71
Spark master: local[*], Application Id: local-1655976010880
spark-sql> show version;
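Given the format string in run(), this prints a single row shaped like the following (the exact versions depend on the build and environment):

Spark Version: 3.2.0, Java Version: 1.8.0_231, Scala Version: 2.12.15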
Part 2: first write a small test class to verify how Java uses Spark SQL to connect to MySQL.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSQLTest3 {
    public static void main(String[] args) {
        // Java version; requires the mysql-connector-java driver on the classpath
        SparkConf conf = new SparkConf();
        conf.setMaster("local"); // run locally, single-threaded
        conf.setAppName("testJob");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkSQLTest3")
                .config("spark.some.config.option", "some-value")
                .getOrCreate();
        // Read a MySQL table (wrapped in a subquery alias) over JDBC.
        Dataset<Row> jdbcDF = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://127.0.0.1:3306/test")
                .option("dbtable", "(SELECT * FROM city) city")
                .option("user", "root")
                .option("password", "root")
                .load();
        jdbcDF.printSchema();
        jdbcDF.show();
        spark.stop();
    }
}
Adding one configuration makes Spark log every optimizer rule that changes the plan:
spark.conf().set("spark.sql.planChangeLog.level","WARN");
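The same conf can also be set per session from the SQL side with SET spark.sql.planChangeLog.level=WARN (assuming Spark 3.1+, where this conf name exists).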
CombineFilters: merges two adjacent Filter operators into one; a form of predicate pushdown
CollapseProject: collapses two adjacent Project operators into one when the projected expressions are deterministic
BooleanSimplification: simplifies boolean expressions, e.g. dropping an always-true 1 = 1 conjunct
A quick example:
jdbcDF.createOrReplaceTempView("city"); // registerTempTable is deprecated
spark.sql("SELECT NAME, CODE FROM (SELECT NAME, CODE FROM city WHERE 1 = 1 AND CODE >= 101010100) tmp WHERE tmp.CODE <= 20000000").explain(true);
With this query all three rules fire: the two filters merge (CombineFilters), the nested projection collapses (CollapseProject), and the always-true 1 = 1 conjunct is removed (BooleanSimplification).
Then register a second table and write a query across both:
jdbcDF.createOrReplaceTempView("city2");
spark.sql("SELECT DISTINCT a.NAME FROM (" +
        "SELECT * FROM city WHERE CODE > 101010100 AND 1 = 1) a WHERE a.id = 2 EXCEPT " +
        "SELECT NAME FROM city2 WHERE CODE > 101280101").explain(true);
The remaining rules are triggered as well, which can be confirmed in the plan-change log.
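For requirement 3, a custom optimizer rule can be appended at runtime through spark.experimental.extraOptimizations, without touching the Spark source. A minimal Scala sketch, e.g. pasted into spark-shell (the name RemoveTrueFilter is made up here; it overlaps with what built-in rules like PruneFilters already do and is purely illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.BooleanType

// Hypothetical rule: drop any Filter whose condition is the literal `true`.
object RemoveTrueFilter extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(Literal(true, BooleanType), child) => child
  }
}

// Registration: append the rule to the session's optimizer.
val spark = SparkSession.builder().master("local").appName("customRule").getOrCreate()
spark.experimental.extraOptimizations = Seq(RemoveTrueFilter)
// Assumes the city temp view from earlier is registered in this session.
spark.sql("SELECT NAME FROM city WHERE true").explain(true)

With spark.sql.planChangeLog.level set as above, RemoveTrueFilter should then show up in the log whenever it rewrites a plan.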