SUN(A)+SUM(B)!=SUM(A+B)的情况

51 阅读3分钟

SUN(A)+SUM(B)!=SUM(A+B)的情况

结论:在hive、presto、spark中,当字段A或字段B列中存在null值时,SUN(A)+SUM(B)!=SUM(A+B)

1.null + 数值 = null

> select null + 0;
+-------+
|  _c0  |
+-------+
| NULL  |
+-------+

2.在对null进行判断时,null无法和任何值(包括其本身)通过= != < > <= >=进行比较,值均为null。

select null == 1;
select null != 1;
select null == null;
select null >= 1;
-- 结果都为null
+-------+
|  _c0  |
+-------+
| NULL  |
+-------+

使用聚合函数遇到null值,结果分别如下

聚合函数col列值都为nullcol列值存在不为null
sum(col)null非null值的和
avg(col)null非null值的平均值
count(col)0非null值的个数
max(col)null非null值中最大值
min(col)null非null值中最小值

1. 验证

创建测试表,并插入测试数据(default库)

CREATE TABLE table1(customer_no STRING, stock_fee double, advertisement_fee double);

INSERT INTO table1 VALUES 
("C0001", NULL, 1130.52)
,("C0001", 0.1, 1084.72)
,("C0001", NULL, 1738.32)
,("C0001", NULL, 1817.67)
,("C0001", NULL, 3271.45)
,("C0002", NULL, 2587.76)
,("C0002", NULL, 1397.60)
,("C0002", NULL, 2871.75)
,("C0002", NULL, 812.08)
,("C0002", NULL, 1643.39)
,("C0003", 2635.23, 2587.76)
,("C0003", 1253.69, 1397.60)
,("C0003", 253.69, 2871.75)
,("C0003", 1528.12, 812.08)
,("C0003", 412.53, 1643.39);

select customer_no,sum(stock_fee) as sum,avg(stock_fee) as avg,count(stock_fee) as count,MAX(stock_fee) as max,MIN(stock_fee) as min from table1 group by customer_no;
+--------------+--------------------+---------------------+--------+----------+---------+
| customer_no  |        sum         |         avg         | count  |   max    |   min   |
+--------------+--------------------+---------------------+--------+----------+---------+
| C0001        | 0.2                | 0.1                 | 2      | 0.1      | 0.1     |
| C0002        | NULL               | NULL                | 0      | NULL     | NULL    |
| C0003        | 6083.259999999999  | 1216.6519999999998  | 5      | 2635.23  | 253.69  |
+--------------+--------------------+---------------------+--------+----------+---------+

1.1. 对单独的一列进行SUM

SELECT  customer_no
       ,SUM(stock_fee)
FROM table1
GROUP BY  customer_no;

+--------------+--------------------+
| customer_no  |        _c1         |
+--------------+--------------------+
| C0001        | 0.1                |
| C0002        | NULL               |
| C0003        | 6083.259999999999  |
+--------------+--------------------+
  • 当列值中存在NULL时(C0001),sum结果不会报错,且计算结果正确
  • 当列值中全部为NULL时(C0002),sum结果为null
  • 当列值中不存在NULL值时(C0003),对数值进行sum,无异议

1.2. SUM(A)+SUM(B)

SELECT  customer_no
       ,SUM(stock_fee)+SUM(advertisement_fee)
FROM table1
GROUP BY  customer_no;

+--------------+-----------+
| customer_no  |    _c1    |
+--------------+-----------+
| C0001        | 9042.78   |
| C0002        | NULL      |
| C0003        | 15395.84  |
+--------------+-----------+
  • 对于c0001,SUM(stock_fee)=0.1,SUM(advertisement_fee)=9042.68,因此SUM(stock_fee)+SUM(advertisement_fee)=9042.78
  • 对于c0002,SUM(stock_fee)=null,SUM(advertisement_fee)=9312.58,因此SUM(stock_fee)+SUM(advertisement_fee)=null
  • 对于c0003,SUM(stock_fee)=6083.259999999999,SUM(advertisement_fee)=9312.58,因此SUM(stock_fee)+SUM(advertisement_fee)=15395.84

1.3. SUM(A+B)

SELECT  customer_no
       ,SUM(stock_fee + advertisement_fee)
FROM table1
GROUP BY  customer_no;

+--------------+---------------------+
| customer_no  |         _c1         |
+--------------+---------------------+
| C0001        | 1084.82             |
| C0002        | NULL                |
| C0003        | 15395.839999999998  |
+--------------+---------------------+
  • 对于c0001,只有0.1 + 1084.72 不为null,因此SUM(stock_fee + advertisement_fee)结果为1084.82
  • 对于c0002,只有stock_fee + advertisement_fee都为null,因此SUM(stock_fee + advertisement_fee)结果为null
  • 对于c0003,只有stock_fee + advertisement_fee都不为为null,因此SUM(stock_fee + advertisement_fee)结果为15395.839999999998

2. 解决

想要sum时得到正确结果,在sum时对每个字段值都判断是否为null,当为null时置为0

以下二者结果相同

SELECT  customer_no
       ,SUM(if(stock_fee is null,0,stock_fee) + if(advertisement_fee is null,0,advertisement_fee))
FROM table1
GROUP BY  customer_no;

SELECT  customer_no
       ,SUM(if(stock_fee is null,0,stock_fee)) + SUM(if(advertisement_fee is null,0,advertisement_fee))
FROM table1
GROUP BY  customer_no;

+--------------+-----------+
| customer_no  |    _c1    |
+--------------+-----------+
| C0001        | 9042.78   |
| C0002        | 9312.58   |
| C0003        | 15395.84  |
+--------------+-----------+