「Query Optimizer」Calcite Paper Reading


ABSTRACT

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite’s architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for the new types of data sources, query languages, and approaches to query processing and optimization.

  • Calcite provides the following capabilities to popular open-source data processing systems:

    • Query Processing

    • Query Optimization

    • Query Language Support

  • Calcite's own architecture consists of:

    • A modular, extensible Query Optimizer with hundreds of built-in optimization rules

    • A Query Processor capable of handling multiple query languages

    • An Adapter architecture supporting heterogeneous data models and stores (relational, semi-structured, streaming, geospatial)

1 INTRODUCTION

Following the seminal System R, conventional relational database engines dominated the data processing landscape. Yet, as far back as 2005, Stonebraker and Çetintemel [49] predicted that we would see the rise of a collection of specialized engines such as column stores, stream processing engines, text search engines, and so forth. They argued that specialized engines can offer more cost-effective performance and that they would bring the end of the “one size fits all” paradigm. Their vision seems today more relevant than ever. Indeed, many specialized open-source data systems have since become popular such as Storm [50] and Flink [16] (stream processing), Elasticsearch [15] (text search), Apache Spark [47], Druid [14], etc.

Unlike the past, when relational databases dominated, more and more specialized data processing engines have now taken the stage.

As organizations have invested in data processing systems tailored towards their specific needs, two overarching problems have arisen:

  • The developers of such specialized systems have encountered related problems, such as query optimization [4, 25] or the need to support query languages such as SQL and related extensions (e.g., streaming queries [26]) as well as language-integrated queries inspired by LINQ [33]. Without a unifying framework, having multiple engineers independently develop similar optimization logic and language support wastes engineering effort.

  • Programmers using these specialized systems often have to integrate several of them together. An organization might rely on Elasticsearch, Apache Spark, and Druid. We need to build systems capable of supporting optimized queries across heterogeneous data sources [55].

As these specialized data processing systems have grown, two problems have emerged:

  • Without a unifying framework, each system reinvents the wheel: every data system faces the same problems, such as building a query optimizer and supporting SQL.

  • Integrating multiple data systems: we need the ability to optimize queries across heterogeneous data sources.

Apache Calcite was developed to solve these problems. It is a complete query processing system that provides much of the common functionality—query execution, optimization, and query languages required by any database management system, except for data storage and management, which are left to specialized engines. Calcite was quickly adopted by Hive, Drill [13], Storm, and many other data processing engines, providing them with advanced query optimizations and query languages.1 For example, Hive [24] is a popular data warehouse project built on top of Apache Hadoop. As Hive moved from its batch processing roots towards an interactive SQL query answering platform, it became clear that the project needed a powerful optimizer at its core. Thus, Hive adopted Calcite as its optimizer and their integration has been growing since. Many other projects and products have followed suit, including Flink, MapD [12], etc.

Apache Calcite comes to the rescue.

Calcite is a complete query processing system providing:

  • Query Execution

  • Query Optimization

  • Query Language support

Calcite deliberately stays out of data storage, leaving that to the data processing engines themselves. This is how it provides query optimization and query language support to many open-source systems, such as Hive and Flink.

Furthermore, Calcite enables cross-platform optimization by exposing a common interface to multiple systems. To be efficient, the optimizer needs to reason globally, e.g., make decisions across different systems about materialized view selection.

Building a common framework does not come without challenges. In particular, the framework needs to be extensible and flexible enough to accommodate the different types of systems requiring integration.

Calcite supports cross-platform optimization by exposing a common set of interfaces. Its optimizer can reason globally, for example when selecting materialized views across systems.

Building a common framework has its challenges; in particular, the framework must be extensible and flexible enough to integrate with different kinds of systems.

We believe that the following features have contributed to Calcite’s wide adoption in the open source community and industry:

  • Open source friendliness. Many of the major data processing platforms of the last decade have been either open source or largely based on open source. Calcite is an open-source framework, backed by the Apache Software Foundation (ASF) [5], which provides the means to collaboratively develop the project. Furthermore, the software is written in Java making it easier to interoperate with many of the latest data processing systems [12, 13, 16, 24, 28, 44] that are often written themselves in Java (or in the JVM-based Scala), especially those in the Hadoop ecosystem.

  • Multiple data models. Calcite provides support for query optimization and query languages using both streaming and conventional data processing paradigms. Calcite treats streams as time-ordered sets of records or events that are not persisted to the disk as they would be in conventional data processing systems.

  • Flexible query optimizer. Each component of the optimizer is pluggable and extensible, ranging from rules to cost models. In addition, Calcite includes support for multiple planning engines. Hence, the optimization can be broken down into phases handled by different optimization engines depending on which one is best suited for the stage.

  • Cross-system support. The Calcite framework can run and optimize queries across multiple query processing systems and database backends.

  • Reliability. Calcite is reliable, as its wide adoption over many years has led to exhaustive testing of the platform. Calcite also contains an extensive test suite validating all components of the system including query optimizer rules and integration with backend data sources.

  • Support for SQL and its extensions. Many systems do not provide their own query language, but rather prefer to rely on existing ones such as SQL. For those, Calcite provides support for ANSI standard SQL, as well as various SQL dialects and extensions, e.g., for expressing queries on streaming or nested data. In addition, Calcite includes a driver conforming to the standard Java API (JDBC).

Calcite's popularity in the open-source community and industry stems from the following features:

  • Open source friendliness. Most data processing platforms of the last decade are rooted in open source, and Calcite is backed by the ASF. Calcite is written in Java, which makes it easy to integrate with many recent data processing systems.

  • Multiple data models. Calcite provides query optimization and query languages for both streaming and conventional data processing paradigms.

  • Flexible query optimizer. Every optimizer component, from rules to the cost model, is pluggable and extensible. Calcite also supports multiple planning engines, so optimization can be split into phases, each handled by the engine best suited for it.

  • Cross-system support. Calcite can run on top of multiple data processing systems and optimize queries across them.

  • Reliability. Years of wide adoption have subjected Calcite to rigorous testing, and it also ships with an extensive test suite of its own.

  • Support for SQL and its extensions. For systems without their own query language, Calcite provides SQL support, including ANSI standard SQL plus various dialects and extensions. Calcite also includes a JDBC-compliant driver.

2 RELATED WORK

Though Calcite is currently the most widely adopted optimizer for big-data analytics in the Hadoop ecosystem, many of the ideas that lie behind it are not novel. For instance, the query optimizer builds on ideas from the Volcano [20] and Cascades [19] frameworks, incorporating other widely used optimization techniques such as materialized view rewriting [10, 18, 22]. There are other systems that try to fill a similar role to Calcite.

Orca [45] is a modular query optimizer used in data management products such as Greenplum and HAWQ. Orca decouples the optimizer from the query execution engine by implementing a framework for exchanging information between the two known as Data eXchange Language. Orca also provides tools for verifying the correctness and performance of generated query plans. In contrast to Orca, Calcite can be used as a standalone query execution engine that federates multiple storage and processing backends, including pluggable planners, and optimizers.

The ideas behind Calcite are not new: for instance, the Volcano and Cascades frameworks and materialized view rewriting techniques.

Orca is the modular query optimizer behind Greenplum. It decouples the optimizer from the query execution engine through the Data eXchange Language framework.

Compared with Orca, Calcite can act as a standalone query execution engine that federates multiple storage and processing backends, and it also ships with pluggable planners and optimizers.

Spark SQL [3] extends Apache Spark to support SQL query execution which can also execute queries over multiple data sources as in Calcite. However, although the Catalyst optimizer in Spark SQL also attempts to minimize query execution cost, it lacks the dynamic programming approach used by Calcite and risks falling into local minima.

Algebricks [6] is a query compiler architecture that provides a data model agnostic algebraic layer and compiler framework for big data query processing. High-level languages are compiled to Algebricks logical algebra. Algebricks then generates an optimized job targeting the Hyracks parallel processing backend. While Calcite shares a modular approach with Algebricks, Calcite also includes support for cost-based optimizations. In the current version of Calcite, the query optimizer architecture uses dynamic programming-based planning based on Volcano [20] with extensions for multi-stage optimizations as in Orca [45]. Though in principle Algebricks could support multiple processing backends (e.g., Apache Tez, Spark), Calcite has provided well-tested support for diverse backends for many years.

Spark SQL can also execute queries over multiple data sources. However, its Catalyst optimizer lacks the dynamic programming approach used in Calcite and risks falling into local minima.

Calcite supports cost-based optimization (CBO). In the current version, the query optimizer uses Volcano-style dynamic programming plus multi-stage optimization (as in Orca).

Garlic [7] is a heterogeneous data management system which represents data from multiple systems under a unified object model. However, Garlic does not support query optimization across different systems and relies on each system to optimize its own queries.

FORWARD [17] is a federated query processor that implements a superset of SQL called SQL++ [38]. SQL++ has a semi-structured data model that integrates both JSON and relational data models whereas Calcite supports semi-structured data models by representing them in the relational data model during query planning. FORWARD decomposes federated queries written in SQL++ into subqueries and executes them on the underlying databases according to the query plan. The merging of data happens inside the FORWARD engine.

Garlic is a heterogeneous data management system built on a unified object model. However, Garlic does not support query optimization across different systems and relies on each system to optimize its own queries.

FORWARD is a federated query processor that implements a superset of SQL called SQL++. SQL++ has a semi-structured data model integrating JSON and the relational data model, whereas Calcite supports semi-structured data by representing it in the relational data model during query planning.

Another federated data storage and processing system is BigDAWG, which abstracts a wide spectrum of data models including relational, time-series and streaming. The unit of abstraction in BigDAWG is called an island of information. Each island of information has a query language, data model and connects to one or more storage systems. Cross storage system querying is supported within the boundaries of a single island of information. Calcite instead provides a unifying relational abstraction which allows querying across backends with different data models.

Myria is a general-purpose engine for big data analytics, with advanced support for the Python language [21]. It produces query plans for other backend engines such as Spark and PostgreSQL.

BigDAWG is another federated data storage and processing system, abstracting relational, time-series, and streaming data models. The unit of abstraction in BigDAWG is called an island of information. Each island has a query language and a data model, and connects to one or more storage systems; cross-storage-system querying is supported only within a single island. Calcite instead provides a unifying relational abstraction that allows querying across backends with different data models.

3 ARCHITECTURE

Calcite contains many of the pieces that comprise a typical database management system. However, it omits some key components, e.g., storage of data, algorithms to process data, and a repository for storing metadata. These omissions are deliberate: it makes Calcite an excellent choice for mediating between applications having one or more data storage locations and using multiple data processing engines. It is also a solid foundation for building bespoke data processing systems.

Calcite contains many of the pieces of a typical database management system, but it deliberately omits some key components, such as data storage, data processing algorithms, and metadata storage. This makes Calcite well suited for mediating between applications that already have their own data storage and data processing engines, and it is also a solid foundation for building bespoke data processing systems.

[Figure 1: main components of Calcite's architecture]

Figure 1 outlines the main components of Calcite’s architecture. Calcite’s optimizer uses a tree of relational operators as its internal representation. The optimization engine primarily consists of three components: rules, metadata providers, and planner engines. We discuss these components in more detail in Section 6. In the figure, the dashed lines represent possible external interactions with the framework. There are different ways to interact with Calcite.

Calcite's optimizer uses a tree of relational operators as its internal representation. The optimization engine consists of three main components:

  • Rules

  • Metadata Providers

  • Planner Engines

The dashed lines in the figure represent possible paths of interaction between Calcite and external systems.

First, Calcite contains a query parser and validator that can translate a SQL query to a tree of relational operators. As Calcite does not contain a storage layer, it provides a mechanism to define table schemas and views in external storage engines via adapters (described in Section 5), so it can be used on top of these engines.

First, Calcite contains a query parser and validator that translate a SQL query into a tree of relational operators. Since Calcite has no storage layer, table schemas and views in external storage engines are defined through adapters.
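As a hedged illustration of this path (not code from the paper), the sketch below uses Calcite's Frameworks tools to turn a SQL string into a tree of relational operators; the schema passed in is assumed to already expose tables through an adapter, and the class and method names around the Calcite calls are made up for the example.

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

class SqlToRelSketch {
  /** Parses, validates, and converts a SQL string into a tree of relational operators. */
  static RelNode sqlToRel(SchemaPlus rootSchema, String sql) throws Exception {
    // rootSchema is assumed to already expose tables through an adapter (Section 5).
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)
        .build();
    Planner planner = Frameworks.getPlanner(config);
    SqlNode parsed = planner.parse(sql);          // SQL text -> SqlNode AST
    SqlNode validated = planner.validate(parsed); // resolve names and types against the schema
    return planner.rel(validated).project();      // SqlNode -> tree of relational operators
  }
}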

Second, although Calcite provides optimized SQL support to systems that need such database language support, it also provides optimization support to systems that already have their own language parsing and interpretation:

  • Some systems support SQL queries, but without or with limited query optimization. For example, both Hive and Spark initially offered support for the SQL language, but they did not include an optimizer. For such cases, once the query has been optimized, Calcite can translate the relational expression back to SQL. This feature allows Calcite to work as a stand-alone system on top of any data management system with a SQL interface, but no optimizer.

  • The Calcite architecture is not only tailored towards optimizing SQL queries. It is common that data processing systems choose to use their own parser for their own query language. Calcite can help optimize these queries as well. Indeed, Calcite also allows operator trees to be easily constructed by directly instantiating relational operators. One can use the built-in relational expressions builder interface. For instance, assume that we want to express the following Apache Pig [41] script using the expression builder:

Second, Calcite not only provides optimized SQL support to systems that need such language support, it also offers standalone optimization to systems that already have their own language parsing and interpretation:

  • Some systems support SQL queries but have no or only limited query optimization, e.g. Hive and Spark in their early days. In such cases Calcite can be used as a standalone query optimization service.

  • The Calcite architecture is not designed only for optimizing SQL queries; it can optimize other query languages as well, by constructing operator trees directly from instantiated relational operators.

For example, the following Apache Pig script can be expressed directly with the expression builder:

emp = LOAD 'employee_data' AS (deptno, sal);
emp_by_dept = GROUP emp BY (deptno);
emp_agg = FOREACH emp_by_dept GENERATE group AS deptno,
    COUNT(emp.sal) AS c, SUM(emp.sal) AS s;
DUMP emp_agg;

final RelNode node = builder
    .scan("employee_data")
    .aggregate(builder.groupKey("deptno"),
               builder.count(false, "c"),
               builder.sum(false, "s", builder.field("sal")))
    .build();

This interface exposes the main constructs necessary for building relational expressions. After the optimization phase is finished, the application can retrieve the optimized relational expression which can then be mapped back to the system’s query processing unit.

The expression builder exposes the interfaces needed to construct relational expressions. Once the optimization phase finishes, the application retrieves the optimized relational expression and maps it back into its own query processing units.
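For context, a minimal sketch of how the builder in the snippet above might be obtained; the schema argument and class name are assumptions for illustration, not part of the paper.

import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

class BuilderSetupSketch {
  /** Creates the builder used in the snippet above. */
  static RelBuilder createBuilder(SchemaPlus schemaWithEmployeeData) {
    // The schema is assumed to expose "employee_data" through some adapter.
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(schemaWithEmployeeData)
        .build();
    return RelBuilder.create(config);
  }
}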

4 QUERY ALGEBRA

Operators. Relational algebra [11] lies at the core of Calcite. In addition to the operators that express the most common data manipulation operations, such as filter, project, join etc., Calcite includes additional operators that meet different purposes, e.g., being able to concisely represent complex operations, or recognize optimization opportunities more efficiently.

For instance, it has become common for OLAP, decision making, and streaming applications to use window definitions to express complex analytic functions such as moving average of a quantity over a time period or number of rows. Thus, Calcite introduces a window operator that encapsulates the window definition, i.e., upper and lower bound, partitioning etc., and the aggregate functions to execute on each window.

Operators. Relational algebra is at the core of Calcite. Besides operators for common data manipulation, Calcite includes additional operators for different purposes, such as concisely expressing complex operations or recognizing optimization opportunities more efficiently.

For example, Calcite introduces a window operator to express the window concept used in OLAP and streaming applications.

Traits. Calcite does not use different entities to represent logical and physical operators. Instead, it describes the physical properties associated with an operator using traits. These traits help the optimizer evaluate the cost of different alternative plans. Changing a trait value does not change the logical expression being evaluated, i.e., the rows produced by the given operator will still be the same.

Calcite does not use separate entities for logical and physical operators. Instead it attaches traits to operators to describe their physical properties; these traits help the optimizer evaluate the cost differences between alternative plans.

During optimization, Calcite tries to enforce certain traits on relational expressions, e.g., the sort order of certain columns. Relational operators can implement a converter interface that indicates how to convert traits of an expression from one value to another.

Calcite includes common traits that describe the physical properties of the data produced by a relational expression, such as ordering, grouping, and partitioning. Similar to the SCOPE optimizer [57], the Calcite optimizer can reason about these properties and exploit them to find plans that avoid unnecessary operations. For example, if the input to the sort operator is already correctly ordered—possibly because this is the same order used for rows in the backend system—then the sort operation can be removed.

During optimization, Calcite tries to enforce particular traits on relational expressions, e.g. a sort order on certain columns. Relational operators can implement a converter interface to express how a trait is converted from one value to another.

Calcite includes common traits describing the physical properties of the data produced by a relational expression, such as ordering, grouping, and partitioning. Similar to the SCOPE optimizer, the Calcite optimizer can exploit them to find plans that avoid unnecessary operations. For example, if the input to a sort operator is already correctly ordered, the sort can be removed.

In addition to these properties, one of the main features of Calcite is the calling convention trait. Essentially, the trait represents the data processing system where the expression will be executed. Including the calling convention as a trait allows Calcite to meet its goal of optimizing transparently queries whose execution might span over different engines i.e., the convention will be treated as any other physical property.

A key feature of Calcite is the calling convention trait, which expresses which data processing system an expression will be executed in. Calcite uses it to optimize queries across engines.
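A hedged sketch of what this looks like in code: the convention is read from the node's trait set like any other trait, and the planner that registered the node is asked to enforce Calcite's built-in enumerable convention instead. The class and method names around the core Calcite calls are illustrative.

import org.apache.calcite.adapter.enumerable.EnumerableConvention;
import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.ConventionTraitDef;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;

class ConventionTraitSketch {
  /** Reads the convention trait of a node and asks the planner to enforce another one. */
  static RelNode requestEnumerable(RelNode rel) {
    RelTraitSet traits = rel.getTraitSet();
    // The calling convention is just another trait; purely logical nodes carry Convention.NONE.
    Convention current = traits.getTrait(ConventionTraitDef.INSTANCE);
    System.out.println("current convention: " + current);
    // Ask the planner that registered this expression to enforce the enumerable convention;
    // converters between conventions are inserted as needed during planning.
    RelOptPlanner planner = rel.getCluster().getPlanner();
    return planner.changeTraits(rel, traits.replace(EnumerableConvention.INSTANCE));
  }
}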

[Figure 2: query plan spanning the jdbc-mysql, splunk, and logical conventions]

For example, consider joining a Products table held in MySQL to an Orders table held in Splunk (see Figure 2). Initially, the scan of Orders takes place in the splunk convention and the scan of Products is in the jdbc-mysql convention. The tables have to be scanned inside their respective engines. The join is in the logical convention, meaning that no implementation has been chosen. Moreover, the SQL query in Figure 2 contains a filter (where clause) which is pushed into splunk by an adapter-specific rule (see Section 5). One possible implementation is to use Apache Spark as an external engine: the join is converted to spark convention, and its inputs are converters from jdbc-mysql and splunk to spark convention. But there is a more efficient implementation: exploiting the fact that Splunk can perform lookups into MySQL via ODBC, a planner rule pushes the join through the splunk-to-spark converter, and the join is now in splunk convention, running inside the Splunk engine.

A more efficient implementation than the one shown above: since Splunk can perform lookups into MySQL via ODBC, a planner rule can push the join down into the Splunk engine.

5 ADAPTERS

[Figure 3: components of a Calcite adapter]

An adapter is an architectural pattern that defines how Calcite incorporates diverse data sources for general access. Figure 3 depicts its components. Essentially, an adapter consists of a model, a schema, and a schema factory. The model is a specification of the physical properties of the data source being accessed. A schema is the definition of the data (format and layouts) found in the model. The data itself is physically accessed via tables. Calcite interfaces with the tables defined in the adapter to read the data as the query is being executed. The adapter may define a set of rules that are added to the planner. For instance, it typically includes rules to convert various types of logical relational expressions to the corresponding relational expressions of the adapter’s convention. The schema factory component acquires the metadata information from the model and generates a schema.

An adapter is an architectural pattern that defines how Calcite incorporates different data sources. An adapter consists of three parts:

  • Model - a specification of the physical properties of the data source

  • Schema - the definition of the data (format and layout) in the model. The data itself is physically accessed via tables, and Calcite provides interfaces for accessing them

  • Schema Factory - produces a schema from the metadata defined in the model

As discussed in Section 4, Calcite uses a physical trait known as the calling convention to identify relational operators which correspond to a specific database backend. These physical operators implement the access paths for the underlying tables in each adapter. When a query is parsed and converted to a relational algebra expression, an operator is created for each table representing a scan of the data on that table. It is the minimal interface that an adapter must implement. If an adapter implements the table scan operator, the Calcite optimizer is then able to use client-side operators such as sorting, filtering, and joins to execute arbitrary SQL queries against these tables.

Calcite uses the calling convention to mark which backend database a relational operator belongs to. These physical operators implement the access paths to the underlying tables.

When a query is converted into a relational algebra expression, a table scan operator is created for each table. This is the minimal interface an adapter must implement; once an adapter implements the table scan operator, the Calcite optimizer can use client-side sorting, filtering, and joins to execute arbitrary SQL queries against those tables.

This table scan operator contains the necessary information the adapter requires to issue the scan to the adapter’s backend database. To extend the functionality provided by adapters, Calcite defines an enumerable calling convention. Relational operators with the enumerable calling convention simply operate over tuples via an iterator interface. This calling convention allows Calcite to implement operators which may not be available in each adapter’s backend. For example, the EnumerableJoin operator implements joins by collecting rows from its child nodes and joining on the desired attributes.

The table scan operator carries the information the adapter needs to issue a scan against its backend database. Calcite also defines an enumerable calling convention to extend what adapters provide: relational operators in this convention simply iterate over tuples via an iterator interface, and richer operators such as EnumerableJoin are built on top of it.
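As a hedged, minimal illustration of that contract (the class name and sample rows are invented), a table that only knows how to scan itself might look as follows; Calcite's enumerable operators can then filter, sort, and join its output. This is roughly the same pattern followed by Calcite's example CSV adapter.

import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

/** Hypothetical in-memory table: the minimal surface an adapter must provide. */
class EmployeeTable extends AbstractTable implements ScannableTable {
  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("DEPTNO", SqlTypeName.INTEGER)
        .add("SAL", SqlTypeName.INTEGER)
        .build();
  }

  @Override public Enumerable<Object[]> scan(DataContext root) {
    // A full table scan; all further processing is done by enumerable operators.
    return Linq4j.asEnumerable(new Object[][] {{10, 1000}, {10, 1500}, {20, 2000}});
  }
}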

For queries which only touch a small subset of the data in a table, it is inefficient for Calcite to enumerate all tuples. Fortunately, the same rule-based optimizer can be used to implement adapter-specific rules for optimization. For example, suppose a query involves filtering and sorting on a table. An adapter which can perform filtering on the backend can implement a rule which matches a LogicalFilter and converts it to the adapter’s calling convention. This rule converts the LogicalFilter into another Filter instance. This new Filter node has a lower associated cost that allows Calcite to optimize queries across adapters.

For queries that touch only a small subset of a table's data, it is inefficient for Calcite to enumerate all tuples.

The same rule-based optimizer can be used to implement adapter-specific optimization rules. For example, for a query with a filter and a sort, an adapter can provide a rule that matches a LogicalFilter and converts it into the adapter's calling convention, so the filter is executed in the backend data source at a lower cost (effectively pushing the filter down close to the data).
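A hedged sketch of such a rule follows. FooConvention and FooFilter are invented placeholders for some adapter's convention and physical filter operator; the ConverterRule constructor shown is the classic form (recent Calcite releases configure converter rules through a Config builder instead).

import org.apache.calcite.plan.Convention;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalFilter;

/**
 * Sketch of an adapter rule that converts a LogicalFilter into a filter running in a
 * hypothetical "foo" calling convention, so the predicate is evaluated inside the backend.
 */
class FooFilterRule extends ConverterRule {
  FooFilterRule() {
    // Classic constructor: (matched class, input trait, output trait, description).
    super(LogicalFilter.class, Convention.NONE, FooConvention.INSTANCE, "FooFilterRule");
  }

  @Override public RelNode convert(RelNode rel) {
    LogicalFilter filter = (LogicalFilter) rel;
    // FooFilter is the adapter's physical filter operator (hypothetical).
    return new FooFilter(
        filter.getCluster(),
        filter.getTraitSet().replace(FooConvention.INSTANCE),
        convert(filter.getInput(), FooConvention.INSTANCE),
        filter.getCondition());
  }
}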

The use of adapters is a powerful abstraction that enables not only optimization of queries for a specific backend, but also across multiple backends. Calcite is able to answer queries involving tables across multiple backends by pushing down all possible logic to each backend and then performing joins and aggregations on the resulting data. Implementing an adapter can be as simple as providing a table scan operator or it can involve the design of many advanced optimizations. Any expression represented in the relational algebra can be pushed down to adapters with optimizer rules.

Adapters are a powerful abstraction that brings optimization not only to a single backend but across multiple backends. Calcite pushes as much logic as possible down to each backend and then performs joins and aggregations on the resulting data.

A basic adapter only needs to implement the table scan operator, but it can also be made more sophisticated to enable further optimizations.

Through optimizer rules, any expression in the relational algebra can be pushed down to adapters.

6 QUERY PROCESSING AND OPTIMIZATION

The query optimizer is the main component in the framework. Calcite optimizes queries by repeatedly applying planner rules to a relational expression. A cost model guides the process, and the planner engine tries to generate an alternative expression that has the same semantics as the original but a lower cost.

Every component in the optimizer is extensible. Users can add relational operators, rules, cost models, and statistics.

The query optimizer is the main component of the framework.

Calcite optimizes queries by repeatedly applying planner rules to a relational expression. The process is guided by a cost model; the planner engine tries to generate a relational expression that is semantically equivalent to the original but has a lower cost.

Every component of the optimizer is extensible: users can add relational operators, rules, cost models, and statistics.

Planner rules. Calcite includes a set of planner rules to transform expression trees. In particular, a rule matches a given pattern in the tree and executes a transformation that preserves semantics of that expression. Calcite includes several hundred optimization rules. However, it is rather common for data processing systems relying on Calcite for optimization to include their own rules to allow specific rewritings.

Calcite ships with a set of planner rules for transforming expression trees.

Data processing systems that rely on Calcite commonly add their own rules to enable system-specific rewritings.

For example, Calcite provides an adapter for Apache Cassandra [29], a wide column store which partitions data by a subset of columns in a table and then within each partition, sorts rows based on another subset of columns. As discussed in Section 5, it is beneficial for adapters to push down as much query processing as possible to each backend for efficiency. A rule to push a Sort into Cassandra must check two conditions:

(1) the table has been previously filtered to a single partition (since rows are only sorted within a partition) and

(2) the sorting of partitions in Cassandra has some common prefix with the required sort.

This requires that a LogicalFilter has been rewritten to a CassandraFilter to ensure the partition filter is pushed down to the database. The effect of the rule is simple (convert a LogicalSort into a CassandraSort) but the flexibility in rule matching enables backends to push down operators even in complex scenarios.

This is an example of using Calcite rules to push a sort down into Cassandra, which requires that the LogicalFilter has already been rewritten to a CassandraFilter.

For an example of a rule with more complex effects, consider the following query:

SELECT products.name, COUNT(*)
FROM sales JOIN products USING (productId)
WHERE sales.discount IS NOT NULL
GROUP BY products.name
ORDER BY COUNT(*) DESC;

[Figure 4: query plans before (a) and after (b) pushing the filter below the join]

The query corresponds to the relational algebra expression presented in Figure 4a. Because the WHERE clause only applies to the sales table, we can move the filter before the join as in Figure 4b. This optimization can significantly reduce query execution time since we do not need to perform the join for rows which do not match the predicate. Furthermore, if the sales and products tables were contained in separate backends, moving the filter before the join also potentially enables an adapter to push the filter into the backend. Calcite implements this optimization via FilterIntoJoinRule which matches a filter node with a join node as a parent and checks if the filter can be performed by the join. This optimization illustrates the flexibility of the Calcite approach to optimization.

The left-hand plan (Figure 4a) is the relational algebra expression of the query; the right-hand plan (Figure 4b) is the expression after the filter has been pushed down.

Calcite performs this optimization with FilterIntoJoinRule, which illustrates the flexibility of Calcite's approach to optimization.
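As a hedged sketch of firing this rule (the rule handle and planner choice here are illustrative, not prescribed by the paper): recent Calcite releases expose the rule as CoreRules.FILTER_INTO_JOIN, and the heuristic planner is used below only to keep the example short.

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

class FilterPushDownSketch {
  /** Pushes the filter below the join, turning the Figure 4a plan into the Figure 4b plan. */
  static RelNode pushFilter(RelNode filterOverJoin) {
    HepProgram program = new HepProgramBuilder()
        .addRuleInstance(CoreRules.FILTER_INTO_JOIN)   // name used by recent Calcite releases
        .build();
    HepPlanner planner = new HepPlanner(program);
    planner.setRoot(filterOverJoin);
    return planner.findBestExp();
  }
}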

Metadata providers. Metadata is an important part of Calcite’s optimizer, and it serves two main purposes: (i) guiding the planner towards the goal of reducing the cost of the overall query plan, and (ii) providing information to the rules while they are being applied.

Metadata providers are responsible for supplying that information to the optimizer. In particular, the default metadata providers implementation in Calcite contains functions that return the overall cost of executing a subexpression in the operator tree, the number of rows and the data size of the results of that expression, and the maximum degree of parallelism with which it can be executed. In turn, it can also provide information about the plan structure, e.g., filter conditions that are present below a certain tree node.

Metadata serves two purposes:

  1. Guiding the planner towards reducing the overall cost of the query plan

  2. Providing information to rules while they are applied

Metadata providers supply this information to the optimizer. Calcite's default implementation includes, among others:

  • The cost of executing a subexpression in the operator tree

  • The number of rows and the data size of the expression's results

  • The maximum degree of parallelism with which it can be executed

This metadata is typically accessed through a metadata query object, as sketched below.
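A hedged sketch of how such metadata is usually consulted, through the metadata query object backed by the default providers; the class name and printed format are illustrative.

import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

class MetadataSketch {
  static void printStats(RelNode rel) {
    // The metadata query object is backed by the (pluggable) metadata providers.
    RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
    Double rowCount = mq.getRowCount(rel);          // estimated output rows
    Double avgRowSize = mq.getAverageRowSize(rel);  // estimated bytes per row
    RelOptCost cost = mq.getCumulativeCost(rel);    // cost of the whole subtree
    System.out.println(rowCount + " rows, ~" + avgRowSize + " bytes/row, cost=" + cost);
  }
}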

Calcite provides interfaces that allow data processing systems to plug their metadata information into the framework. These systems may choose to write providers that override the existing functions, or provide their own new metadata functions that might be used during the optimization phase. However, for many of them, it is sufficient to provide statistics about their input data, e.g., number of rows and size of a table, whether values for a given column are unique etc., and Calcite will do the rest of the work by using its default implementation.

Calcite exposes interfaces that let data processing systems plug their metadata into the framework.

Integrating systems can override Calcite's default implementations or provide new metadata functions to be used during optimization.

In most cases, however, providing statistics about the input data is enough, e.g. the number of rows, the size of a table, and whether the values of a given column are unique.

As the metadata providers are pluggable, they are compiled and instantiated at runtime using Janino [27], a Java lightweight compiler. Their implementation includes a cache for metadata results, which yields significant performance improvements, e.g., when we need to compute multiple types of metadata such as cardinality, average row size, and selectivity for a given join, and all these computations rely on the cardinality of their inputs.

Because metadata providers are pluggable, they are compiled and instantiated at runtime with Janino, a lightweight Java compiler.

The implementation also caches metadata results, which yields significant performance gains.

Planner engines. The main goal of a planner engine is to trigger the rules provided to the engine until it reaches a given objective. At the moment, Calcite provides two different engines. New engines are pluggable in the framework.

The first one, a cost-based planner engine, triggers the input rules with the goal of reducing the overall expression cost. The engine uses a dynamic programming algorithm, similar to Volcano [20], to create and track different alternative plans created by firing the rules given to the engine. Initially, each expression is registered with the planner, together with a digest based on the expression attributes and its inputs. When a rule is fired on an expression e1 and the rule produces a new expression e2, the planner will add e2 to the set of equivalence expressions Sa that e1 belongs to. In addition, the planner generates a digest for the new expression, which is compared with those previously registered in the planner. If a similar digest associated with an expression e3 that belongs to a set Sb is found, the planner has found a duplicate and hence will merge Sa and Sb into a new set of equivalences. The process continues until the planner reaches a configurable fix point. In particular, it can (i) exhaustively explore the search space until all rules have been applied on all expressions, or (ii) use a heuristic-based approach to stop the search when the plan cost has not improved by more than a given threshold δ in the last planner iterations. The cost function that allows the optimizer to decide which plan to choose is supplied through metadata providers. The default cost function implementation combines estimations for CPU, IO, and memory resources used by a given expression.

The main goal of a planner engine is to keep triggering rules until a given objective is reached. Calcite currently provides two engines, and new engines can be plugged into the framework.

The first is a cost-based planner engine, which triggers rules with the goal of reducing the overall expression cost. The engine uses a dynamic programming algorithm similar to Volcano. The process works as follows:

  • Initially, every expression is registered with the planner, together with a digest based on its attributes and inputs.

  • When a rule fired on expression e1 produces a new expression e2, the planner adds e2 to the set of equivalent expressions Sa that e1 belongs to.

  • The planner also generates a digest for e2 and compares it against those already registered. If a matching digest is found for an expression e3 belonging to a set Sb, the planner has found a duplicate and merges Sa and Sb.

  • This continues until the planner reaches a configurable fix point.

There are two options for the fix point:

  1. Exhaustively explore the search space until every rule has been applied to every expression

  2. Stop the search heuristically when the plan cost has not improved by more than a threshold δ over the last planner iterations

The cost function the optimizer uses to decide which plan to choose is supplied by the metadata providers. The default cost function combines estimates of the CPU, IO, and memory resources used by an expression.

The second engine is an exhaustive planner, which triggers rules exhaustively until it generates an expression that is no longer modified by any rules. This planner is useful to quickly execute rules without taking into account the cost of each expression.

Users may choose to use one of the existing planner engines depending on their concrete needs, and switching from one to another, when their system requirements change, is straightforward. Alternatively, users may choose to generate multi-stage optimization logic, in which different sets of rules are applied in consecutive phases of the optimization process. Importantly, the existence of two planners allows Calcite users to reduce the overall optimization time by guiding the search for different query plans.

The second is an exhaustive planner, which keeps triggering rules until no rule modifies the expression any further. It is useful for quickly applying rules without considering the cost of each expression.

Users can choose either engine according to their needs, and they can also build multi-stage optimization logic in which consecutive phases apply different sets of rules (a sketch of a single cost-based pass is given below).
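The following is a hedged sketch of one cost-based pass, assuming the expression was created on a VolcanoPlanner-backed cluster (as the Frameworks tools do by default); the particular rules registered are illustrative, and a real setup would register many more.

import org.apache.calcite.adapter.enumerable.EnumerableConvention;
import org.apache.calcite.adapter.enumerable.EnumerableRules;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

class CostBasedPassSketch {
  /** Asks the cost-based planner for the cheapest enumerable plan it can find. */
  static RelNode optimize(RelNode logicalPlan) {
    // The cluster's planner is typically a VolcanoPlanner when using the Frameworks tools.
    RelOptPlanner planner = logicalPlan.getCluster().getPlanner();
    planner.addRule(CoreRules.FILTER_INTO_JOIN);
    planner.addRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
    RelTraitSet desired = logicalPlan.getTraitSet().replace(EnumerableConvention.INSTANCE);
    RelNode target = planner.changeTraits(logicalPlan, desired);
    planner.setRoot(target);
    return planner.findBestExp();   // cheapest registered equivalent, per the cost model
  }
}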

Materialized views. One of the most powerful techniques to accelerate query processing in data warehouses is the precomputation of relevant summaries or materialized views. Multiple Calcite adapters and projects relying on Calcite have their own notion of materialized views. For instance, Cassandra allows the user to define materialized views based on existing tables which are automatically maintained by the system.

These engines expose their materialized views to Calcite. The optimizer then has the opportunity to rewrite incoming queries to use these views instead of the original tables. In particular, Calcite provides an implementation of two different materialized view-based rewriting algorithms.

Precomputing relevant summaries, i.e. materialized views, is one of the most powerful techniques for accelerating query processing in data warehouses. Several Calcite adapters and projects that rely on Calcite have their own notion of materialized views; Cassandra, for instance, lets users define materialized views that the system maintains automatically.

These engines expose their materialized views to Calcite, giving the optimizer the opportunity to rewrite incoming queries to use the views instead of the original tables. Calcite provides two materialized view-based rewriting algorithms.

The first approach is based on view substitution [10, 18]. The aim is to substitute part of the relational algebra tree with an equivalent expression which makes use of a materialized view, and the algorithm proceeds as follows: (i) the scan operator over the materialized view and the materialized view definition plan are registered with the planner, and (ii) transformation rules that try to unify expressions in the plan are triggered. Views do not need to exactly match expressions in the query being replaced, as the rewriting algorithm in Calcite can produce partial rewritings that include additional operators to compute the desired expression, e.g., filters with residual predicate conditions.

The first approach is based on view substitution: part of the relational algebra tree is replaced with an equivalent expression that uses a materialized view.

The second approach is based on lattices [22]. Once the data sources are declared to form a lattice, Calcite represents each of the materializations as a tile which in turn can be used by the optimizer to answer incoming queries. On the one hand, the rewriting algorithm is especially efficient in matching expressions over data sources organized in a star schema, which are common in OLAP applications. On the other hand, it is more restrictive than view substitution, as it imposes restrictions on the underlying schema.

The second approach is based on lattices.

7 EXTENDING CALCITE

As we have mentioned in the previous sections, Calcite is not only tailored towards SQL processing. In fact, Calcite provides extensions to SQL expressing queries over other data abstractions, such as semi-structured, streaming and geospatial data. Its internal operators adapt to these queries. In addition to extensions to SQL, Calcite also includes a language-integrated query language. We describe these extensions throughout this section and provide some examples.

Calcite extends SQL to support semi-structured, streaming, and geospatial data.

Calcite also includes a language-integrated query language (LINQ4J).

This part is not summarized further here, as it is less central to Calcite's core capabilities.

8 INDUSTRY AND ACADEMIA ADOPTION

Calcite enjoys wide adoption, specially among open-source projects used in industry. As Calcite provides certain integration flexibility, these projects have chosen to either (i) embed Calcite within their core, i.e., use it as a library, or (ii) implement an adapter to allow Calcite to federate query processing. In addition, we see a growing interest in the research community to use Calcite as the cornerstone of the development of data management projects. In the following, we describe how different systems are using Calcite.

Calcite is widely adopted, especially among open-source projects used in industry. These projects use Calcite in one of two ways:

  1. Embedding Calcite in their core, i.e. using it as a library

  2. Implementing an adapter so that Calcite can federate query processing

In addition, there is growing interest in the research community in using Calcite as the cornerstone of data management projects.

8.1 Embedded Calcite

[Table 1: systems that embed Calcite as a library]

Table 1 provides a list of software that incorporates Calcite as a library, including (i) the query language interface that they expose to users, (ii) whether they use Calcite’s JDBC driver (called Avatica), (iii) whether they use the SQL parser and validator included in Calcite, (iv) whether they use Calcite’s query algebra to represent their operations over data, and (v) the engine that they rely on for execution, e.g., their own native engine, Calcite’s operators (referred to as enumerable), or any other project.

Table 1 lists software that incorporates Calcite as a library, noting:

  1. The query language interface exposed to users

  2. Whether Calcite's JDBC driver (Avatica) is used

  3. Whether Calcite's SQL parser and validator are used

  4. Whether Calcite's query algebra is used to represent operations over data

  5. The engine relied on for execution, e.g. a native engine or Calcite's own operators (enumerable)

Drill [13] is a flexible data processing engine based on the Dremel system [34] that internally uses a schema-free JSON document data model. Drill uses its own dialect of SQL that includes extensions to express queries on semi-structured data, similar to SQL++ [38].

Drill is a data processing engine based on the Dremel system that internally uses a schema-free JSON document data model. Drill has its own SQL dialect with extensions for querying semi-structured data, similar to SQL++.

Hive [24] first became popular as a SQL interface on top of the MapReduce programming model [52]. It has since moved towards being an interactive SQL query answering engine, adopting Calcite as its rule and cost-based optimizer. Instead of relying on Calcite’s JDBC driver, SQL parser and validator, Hive uses its own implementation of these components. The query is then translated into Calcite operators, which after optimization are translated into Hive’s physical algebra. Hive operators can be executed by multiple engines, the most popular being Apache Tez [43, 51] and Apache Spark [47, 56].

Hive became popular as a SQL interface on top of the MapReduce programming model. It has since moved towards being an interactive SQL query answering engine, adopting Calcite as its rule- and cost-based optimizer. Instead of Calcite's JDBC driver, SQL parser, and validator, Hive uses its own implementations of these components. Queries are translated into Calcite operators, optimized, then translated into Hive's physical algebra and executed by engines such as Apache Tez or Apache Spark.

Apache Solr [46] is a popular full-text distributed search platform built on top of the Apache Lucene library [31]. Solr exposes multiple query interfaces to users, including REST-like HTTP/XML and JSON APIs. In addition, Solr integrates with Calcite to provide SQL compatibility.

Apache Solr is a popular distributed full-text search platform built on top of the Apache Lucene library. Solr exposes multiple query interfaces to users and integrates with Calcite to provide SQL compatibility.

Apache Phoenix [40] and Apache Kylin [28] both work on top of Apache HBase [23], a distributed key-value store modeled after Bigtable [9]. In particular, Phoenix provides a SQL interface and orchestration layer to query HBase. Kylin focuses on OLAP-style SQL queries instead, building cubes that are declared as materialized views and stored in HBase, and hence allowing Calcite’s optimizer to rewrite the input queries to be answered using those cubes. In Kylin, query plans are executed using a combination of Calcite native operators and HBase.

Apache Phoenix and Apache Kylin both work on top of Apache HBase, a distributed key-value store modeled after Bigtable. Phoenix provides a SQL interface and orchestration layer for querying HBase. Kylin focuses on OLAP-style SQL queries, building cubes that are declared as materialized views and stored in HBase, so that Calcite's optimizer can rewrite incoming queries to answer them from those cubes. In Kylin, query plans are executed using a combination of Calcite native operators and HBase.

Recently Calcite has become popular among streaming systems too. Projects such as Apache Apex [1], Flink [16], Apache Samza [44], and Storm [50] have chosen to integrate with Calcite, using its components to provide a streaming SQL interface to their users. Finally, other commercial systems have adopted Calcite, such as MapD [32], Lingual [30], and Qubole Quark [42].

Recently Calcite has also become popular among streaming systems: Apache Apex, Flink, Samza, and Storm all integrate with Calcite, using its components to provide a streaming SQL interface to their users.

Some commercial systems have adopted Calcite as well, such as MapD, Lingual, and Qubole Quark.

8.2 Calcite Adapters

[Table 2: adapters available in Calcite and their target query languages]

Instead of using Calcite as a library, other systems integrate with Calcite via adapters which read their data sources. Table 2 provides the list of available adapters in Calcite. One of the main key components of the implementation of these adapters is the converter responsible for translating the algebra expression to be pushed to the system into the query language supported by that system. Table 2 also shows the languages that Calcite translates into for each of these adapters.

Besides embedding Calcite as a library, other systems integrate with Calcite via adapters that read their data sources.

The key component of an adapter implementation is the converter, which translates the algebra expression to be pushed down into the query language supported by the target system; Table 2 lists the target language for each adapter.

The JDBC adapter supports the generation of multiple SQL dialects, including those supported by popular RDBMSes such as PostgreSQL and MySQL. In turn, the adapter for Cassandra [8] generates its own SQL-like language called CQL whereas the adapter for Apache Pig [41] generates queries expressed in Pig Latin [37]. The adapter for Apache Spark [47] uses the Java RDD API. Finally, Druid [14], Elasticsearch [15] and Splunk [48] are queried through REST HTTP API requests. The queries generated by Calcite for these systems are expressed in JSON or XML.

The JDBC adapter can generate multiple SQL dialects, including those of PostgreSQL and MySQL, while the Cassandra adapter generates its own SQL-like language, CQL, and so on for the other adapters.
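As a hedged sketch of the JDBC case (the method names match recent Calcite releases; older versions use visitChild instead of visitRoot), a relational expression can be rendered in a concrete dialect roughly like this:

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rel2sql.RelToSqlConverter;
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.PostgresqlSqlDialect;

class RelToSqlSketch {
  /** Renders a relational expression as SQL in the PostgreSQL dialect. */
  static String toPostgresSql(RelNode rel) {
    SqlDialect dialect = PostgresqlSqlDialect.DEFAULT;
    RelToSqlConverter converter = new RelToSqlConverter(dialect);
    SqlNode sqlNode = converter.visitRoot(rel).asStatement();
    return sqlNode.toSqlString(dialect).getSql();
  }
}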

8.3 Uses in Research

In a research setting, Calcite has been considered [54] as a polystore-alternative for precision medicine and clinical analysis scenarios. In those scenarios, heterogeneous medical data has to be logically assembled and aligned to assess the best treatments based on the comprehensive medical history and the genomic profile of the patient. The data comes from relational sources representing patients’ electronic medical records, structured and semi-structured sources representing various reports (oncology, psychiatry, laboratory tests, radiology, etc.), imaging, signals, and sequence data, stored in scientific databases. In those circumstances, Calcite represents a good foundation with its uniform query interface, and flexible adapter architecture, but the ongoing research efforts are aimed at (i) the introduction of new adapters for array and textual sources, and (ii) support for efficient joining of heterogeneous data sources.

Calcite offers a uniform query interface and a flexible adapter architecture.

Ongoing research efforts aim to:

  • Introduce new adapters for array and textual data sources

  • Support efficient joins over heterogeneous data sources

9 FUTURE WORK

Future work on Calcite will focus on the development of new features and the expansion of its adapter architecture:

  • Enhancements to the design of Calcite to further support its use as a standalone engine, which would require support for data definition languages (DDL), materialized views, indexes and constraints.

  • Ongoing improvements to the design and flexibility of the planner, including making it more modular, allowing users of Calcite to supply planner programs (collections of rules organized into planning phases) for execution.

  • Incorporation of new parametric approaches [53] into the design of the optimizer.

  • Support for an extended set of SQL commands, functions, and utilities, including full compliance with OpenGIS.

  • New adapters for non-relational data sources such as array databases for scientific computing.

  • Improvements to performance profiling and instrumentation.

Future work on Calcite will continue to add new features and expand the adapter architecture:

  • Strengthening Calcite as a standalone engine, with support for DDL, materialized views, indexes, and constraints

  • Improving the planner's design and flexibility: making it more modular and allowing users to supply planner programs (sets of rules organized into planning phases)

  • Incorporating new parametric approaches into the optimizer's design

  • Extending SQL commands, functions, and utilities, including full OpenGIS compliance

  • New adapters for non-relational data sources, such as array databases for scientific computing

  • Improvements to performance profiling and instrumentation

10 CONCLUSION

Emerging data management practices and associated analytic uses of data continue to evolve towards an increasingly diverse, and heterogeneous spectrum of scenarios. At the same time, relational data sources, accessed through SQL, remain an essential means to how enterprises work with the data. In this somewhat dichotomous space, Calcite plays a unique role with its strong support for both traditional, conventional data processing, and for its support of other data sources including those with semi-structured, streaming and geospatial models. In addition, Calcite’s design philosophy with a focus on flexibility, adaptivity, and extensibility, has been another factor in Calcite becoming the most widely adopted query optimizer, used in a large number of open-source frameworks. Calcite’s dynamic and flexible query optimizer, and its adapter architecture allows it to be embedded selectively by a variety of data management frameworks such as Hive, Drill, MapD, and Flink. Calcite’s support for heterogeneous data processing, as well as for the extended set of relational functions will continue to improve, in both functionality and performance.

Data management practices and the analytic uses of data keep diversifying. At the same time, relational data sources accessed through SQL remain an essential means for enterprises to work with data. Calcite plays a unique role in this space thanks to its support for all of these kinds of data sources, and it will continue to improve its heterogeneous data processing capabilities in both functionality and performance.
