spark

Spark Learning

1 subscriber · 21 articles · Created 2022-05-18

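Hudi Basic Concepts

Timeline: Apache Hudi, an open-source implementation of a data lake framework, provides transactions, efficient updates and deletes, advanced indexing, streaming integration, small-file compaction, log-file merge optimization, and concurrency support, among other capabilities. It supports real-time consumption of incremental data, …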

2 years ago

Spark SQL Query Engine – AQE (Part 2)

In the previous blog post, we looked into how the Adaptive Query Execution (AQE) framework is implem

3 years ago

Spark SQL Query Engine – AQE (Part 1)

Cost-based optimisation (CBO) is not a new thing. It has been widely used in the RDBMS world for man

3 years ago

Spark SQL Query Engine – Partitioning & Bucketing

I was planning to write about the Adaptive Query Execution (AQE) in this and next few blog posts, an

3 years ago

Spark SQL Query Engine – Dynamic Partition Pruning

In this blog post, I will explain the Dynamic Partition Pruning (DPP), which is a performance optimi

3 years ago

Spark SQL Query Engine – ShuffleExchangeExec & UnsafeShuffleWrite

This blog post continues to discuss the partitioning and ordering in Spark. In the last blog post, I

3 years ago

Spark SQL Query Engine – UnsafeExternalSorter & SortExec

In the last blog post, I explained the partitioning and ordering requirements for preparing a physic

3 years ago

Spark SQL Query Engine – Partitioning & Ordering

In the last few blog posts, I introduced the SparkPlanner for generating physical plans from logical

3 years ago

Spark SQL Query Engine – Cache Commands Internal

This blog post looks into Spark SQL Cache Commands under the hood, walking through the execution flo

3 years ago

Spark SQL Query Engine – SessionCatalog & RunnableCommand Interna

In this blog post, I will dig into the execution internals of the runnable commands, which inherit

3 years ago

Spark SQL Query Engine – Join Strategies

In this blog post, I am going to explain the Join strategies applied by the Spark Planner for genera

3 years ago

Spark SQL Query Engine – HashAggregateExec & ObjectHashAggregateExec

This blog post continues to explore the Aggregate strategy and focuses on the two hash-based aggrega

3 years ago

Spark SQL Query Engine – SortAggregateExec

The last blog post explains the Aggregation strategy for generating physical plans for aggregate ope

3 years ago

Spark SQL Query Engine – Aggregation Strategy

In the last blog post, I gave an overview of the SparkPlanner for planning physical execution plans

3 years ago

Spark SQL Query Engine – Spark Planner

After logical plans are optimised by the Catalyst Optimizer rules, SparkPlanner takes an optimized l

3 years ago

Spark SQL Query Engine – Catalyst Optimizer Rules (Part 3)

After two lengthy blog posts on Catalyst Optimizer rules, this blog post will close this topic and c

3 years ago

Spark SQL Query Engine – Catalyst Optimizer Rules (Part 2)

In the previous blog post, I covered the rules included in the “Eliminate Distinct”, “Finish Analysi

3 years ago