在做大数据的离线开发时,在join后的on关联条件时,2个字段类型不一致,但是自己发现不了,当出现数据倾斜时,仔细排查会发现2个字段的数据类型不一致。此篇文章则是对此类问题作出解释!
同Java语言一样,Hive也包括 隐式转换(implicit conversions)和显式转换(explicitly conversions)。
Hive在需要的时候将会对numeric类型的数据进行隐式转换。比如我们对两个不同数据类型的数字进行比较,假如一个数据类型是INT型,另一个 是SMALLINT类型,那么SMALLINT类型的数据将会被隐式转换地转换为INT类型,这个到底和Java中的一样;但是我们不能隐式地将一个 INT类型的数据转换成SMALLINT或TINYINT类型的数据,这将会返回错误,除非你使用了CAST操作。
任何整数类型都可以隐式地转换成一个范围更大的类型。TINYINT,SMALLINT,INT,BIGINT,FLOAT和STRING都可以隐式 地转换成DOUBLE;是的你没看出,STRING也可以隐式地转换成DOUBLE!但是你要记住,BOOLEAN类型不能转换为其他任何数据类型!
下标列出了Hive内置的数据类型之间是否可以进行隐式的转换操作:(横着看)
注:由于表格比较大,这里对一些比较长的字符串进行缩写,ts是timestamp的缩写,bl是boolean的缩写,sl是smallint的缩写,dm是decimal的缩写,vc是varchar的缩写,ba是binary的缩写。
我们可以用CAST来显式的将一个类型的数据转换成另一个数据类型。
非数字的string转为double后变为NULL,如果不对NULL做处理的话,NULL值会输入到一个Reduce Task中,容易导致数据倾斜!
1 string join int -> double,2个字段都会转化成double(非数字的string值会转化为NULL)
> explain
> select t1.c1,t2.c1
> from (
> select '1' as c1
>
> union all
> select '2' as c1
>
> union all
> select '34' as c1
>
> union all
> select '3ab4' as c1
>
> union all
> select 'ab' as c1
>
> union all
> select '' as c1
>
> union all
> select ' ' as c1
>
> union all
> select null as c1
> ) t1
>
> join
>
> (
> select 1 as c1
>
> union all
> select 2 as c1
>
> ) t2
> on t1.c1 = t2.c1
>
> ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('1') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '1' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('2') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '2' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('34') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '34' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('3ab4') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '3ab4' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('ab') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 'ab' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(' ') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: ' ' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: UDFToString(null) (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(1) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
value expressions: _col0 (type: int)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(2) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
value expressions: _col0 (type: int)
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 UDFToDouble(_col0) (type: double)
1 UDFToDouble(_col0) (type: double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.148 seconds, Fetched: 248 row(s)
hive>
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 UDFToDouble(_col0) (type: double)
1 UDFToDouble(_col0) (type: double)
2 (string + int ) join int -> double ,2个字段都会转化成double(非数字的string值会转化为NULL)
> explain
> select t1.c1,t2.c1
> from (
> select 1 as c1
>
> union all
> select 2 as c1
>
> union all
> select '34' as c1
>
> union all
> select '3ab4' as c1
>
> union all
> select 'ab' as c1
>
> union all
> select '' as c1
>
> union all
> select ' ' as c1
>
> union all
> select null as c1
> ) t1
>
> join
>
> (
> select 1 as c1
>
> union all
> select 2 as c1
>
> ) t2
> on t1.c1 = t2.c1
>
>
> ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(1) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(2) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('34') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '34' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('3ab4') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '3ab4' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('ab') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 'ab' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble('') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(' ') is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: ' ' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: UDFToString(null) (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Union
Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: string)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(1) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
value expressions: _col0 (type: int)
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(2) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: UDFToDouble(_col0) (type: double)
sort order: +
Map-reduce partition columns: UDFToDouble(_col0) (type: double)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
value expressions: _col0 (type: int)
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 UDFToDouble(_col0) (type: double)
1 UDFToDouble(_col0) (type: double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.763 seconds, Fetched: 248 row(s)
hive>
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 UDFToDouble(_col0) (type: double)
1 UDFToDouble(_col0) (type: double)
3 string join string -> string
hive> explain
> select t1.c1,t2.c1
> from (
> select '1' as c1
>
> union all
> select '2' as c1
>
> ) t1
>
> join
>
> (
> select '1' as c1
>
> union all
> select '2' as c1
>
> ) t2
> on t1.c1 = t2.c1
>
> ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: '1' is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '1' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: '2' is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '2' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: '1' is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '1' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: '2' is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: '2' (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.083 seconds, Fetched: 115 row(s)
hive>
4 int join int -> int
> explain
> select t1.c1,t2.c1
> from (
> select 1 as c1
>
> union all
> select 2 as c1
>
> ) t1
>
> join
>
> (
> select 1 as c1
>
> union all
> select 2 as c1
>
> ) t2
> on t1.c1 = t2.c1
> ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: 1 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: 2 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: 1 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: 2 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Union
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.156 seconds, Fetched: 115 row(s)
hive>
以上Hive2.0.0