跟Mosh老师学SQL的第四天 - 聚合函数

118 阅读6分钟

第一部分:Aggregate Functions - 聚合函数

MAX()
MIN()
AVG()
SUM()
COUNT()
USE sql_invoicing;

SELECT 
    MAX(invoice_total) AS highest,
    MIN(invoice_total) AS lowest,
    AVG(invoice_total) AS average,
    SUM(invoice_total * 1.1) AS total
    COUNT(DISTINCT client_id) AS total_records
FROM invoices
WHERE invoice_date > '2019-07-01'

--- 查询结果的列名为MAX(invoice_total)

--- 聚合函数的查询也可以用于日期和字符串
--- 例如:
MAX(payment_date) AS highest
查询结果是最近收到付款的日期

--- 聚合函数只运行计算非空值,如果列中有空值,它就不会被算在函数中
例如:
COUNT(invoice_total) AS number_of_invoices,
--- 查询结果是17
COUNT(payment_date) AS count_of_payments
--- 查询结果是15

--- 如果你想得到表格中所有记录条目,不管是不是空值
COUNT(*) AS total_records

--- 默认状态下,所有这些值会取重复值
--- 如果想要排除重复条目,则必须使用distinct关键词

EXERCISE:

Part1 - Aggregate Functions - 聚合函数
-- 对invoices表编写一个查询来得到以下结果

-- 结果有四列
-- date_range, total_sales, total_payments, what_we_expect
-- 第四列是二三列的差值

SOLUTION:

USE sql_invoicing;

SELECT
    'First half of 2019' AS date_range,
    SUM(invoice_total) AS total_sales,
    SUM(payment_total) AS total_payments,
    SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices
WHERE invoice_date 
    BETWEEN '2019-01-01' AND '2019-06-30'
UNION
SELECT
    'Second half of 2019' AS date_range,
    SUM(invoice_total) AS total_sales,
    SUM(payment_total) AS total_payments,
    SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices
WHERE invoice_date 
    BETWEEN '2019-07-01' AND '2019-12-31'
UNION
SELECT
    'Total' AS date_range,
    SUM(invoice_total) AS total_sales,
    SUM(payment_total) AS total_payments,
    SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices
WHERE invoice_date 
    BETWEEN '2019-01-01' AND '2019-12-31'

第二部分:The GROUP BY Clause - GROUP BY 子句

SELECT
    SUM(invoice_total) AS total_sales
FROM invoices
--- 这个简单的查询,用SUM函数计算了总销售
--- 但是 如果你想知道每个客户的总销售额的话应该怎么办?

SELECT
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
--- 当我们执行查询,查询的结果是四个值
--- 这些值就是每个客户的总销售额
下例为如何利用一列来数据分组 - Clause的顺序十分重要
SELECT
    client_id,
    SUM(invoice_total) AS total_sales
FROM invoices 
WHERE invoice_date >= '2019-07-01'
GROUP BY client_id
--- 默认状态下,数据是按照group by子句中指定的列排序的
--- 但是我们可以用order by子句来调整排序顺序
ORDER BY total_sales DESC
下例为如何利用多列来数据分组,查看每个state和city的总销售
SELECT
    state,
    city,
    SUM(invoice_total) AS total_sales
FROM invoices i
JOIN clients USING (client_id)
WHERE invoice_date >= '2019-07-01'
GROUP BY state, city

EXERCISE:

Part2 - The GROUP BY Clause - GROUP BY 子句
-- 结果有三列date,payment_menthod, total_payments
-- 可以看到每个date和payment method组合的total payments
-- 例如,在18日,收到一个用credit card支付的32.77美金 和 一个用Cash支付的10美金

SOLUTION:

SELECT
    date,
    pm.name AS payment_method,
    SUM(amount) AS total_payments
FROM payments p
JOIN payment_methods pm
    ON p.payment_method = pm.payment_method_id
GROUP BY date, payment_method
ORDER BY date

第三部分:The HAVING Clause - HAVING 子句

SELECT
    client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id

--- 现在,如果只想包含total_sales高于500美金的客户怎么办
--- 也就是说我们不想要查询结果中的client_id 为2的客户
--- 在这种情况下我们无法使用WHERE Clause
--- 下面是错误示范:

SELECT
    client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
WHERE total_sales > 500
--- 因为这个时候我们还没有对数据分组
--- 所以在WHERE这行SQL的时候,我们还不知道每个客户的总销售额
--- 这就是我们无法使用WHERE子句来解决这个问题的原因
--- WHERE是在分组行之前 筛选数据
GROUP BY client_id

--- 下面是正确示范:
--- 使用HAVING Clause得以在分组行之后筛选数据
SELECT
    client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
HAVING total_sales > 500
--- 对于每个客户,可以看到总销售额和发票数量
SELECT
    client_id,
    SUM(invoice_total) AS total_sales,
    COUNT(*) AS number_of_invoices
FROM invoices
GROUP BY client_id
HAVING total_sales > 500
--- 只返回总销售额达到或超过500美金
--- 且发票数量超过5点记录
--- 所以我们想排除前两条记录,因为它们没有超过5张发票
--- 做法是在HAVING Clause,可以写一个复合搜索条件
SELECT
    client_id,
    SUM(invoice_total) AS total_sales,
    COUNT(*) AS number_of_invoices
FROM invoices
GROUP BY client_id
HAVING total_sales > 500 AND number_of_invoices > 5
--- HAVING 同 WHERE Clause一样,我们可以打上1个及多个条件
--- 但是在 HAVING Clause中用到的列
--- 必须得是我们select Clause子句中存在的
--- 换句话说 此处我们不能提到payment_date列,因为这一列不存在于我们的select子句中
--- 相比之下
--- 在FROM之后的WHERE子句,我们可以使用任何列,不管我们有没有select它们

EXERCISE:

Part3 - The HAVING Clause - HAVING 子句
-- Get the customers
-- located in Virginia 
-- 这是第一步
-- who have spent more than $100

SOLUTION:

USE sql_store;

SELECT *
FROM customers
WHERE state = 'VA'
-- 目前的查询结果中的列为first_name, last_name, birth_date, phone, address
-- 我们可以看到查询的结果中没有地方提到销售信息,我们不知道每个人付款的信息

-- 在order_items table 中,我们知道每个订单卖了多少
-- 所以对每个订单、每个产品、我们知道对应的数量和单价
-- 然而 order_items table 中没有顾客信息
-- 所以我们不知道谁买了这些产品
-- 这些信息在orders表中,在我们知道谁订购了什么
-- 所以我们现在要做的就说连接customers表和orders表
-- 所以我们可以找到每个顾客的订单

-- 然后我们要连接orders表和orders_items表
-- 来计算每位顾客的总消费额

SELECT *
FROM customers c
JOIN orders o USING (customer_id)
JOIN order_items oi USING (order_id)
WHERE state = 'VA'
-- 查询的结果中 首先看到的是连接列 分别是order_id customer_id
-- first_name, last_name, birth_date, phone, address, city, state -> 顾客信息
-- points, order_date, status, comments, shipped_date, shipper_id,
-- product_id, quantity, unit_price -> 每个order item的信息
-- 现在从这张表格我们可以计算每位顾客的消费总量

SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    SUM(oi.quantity * oi.unit_price) AS total_sales
FROM customers c
JOIN orders o USING (customer_id)
JOIN order_items oi USING (order_id)
WHERE state = 'VA'
-- 在这里我们要根据3列来分组数据,分别是 customer_id, first_name, last_name
-- 一般来说,每当你在 select clause 中有聚合函数,而且在对数据分组
-- 那么你可以直接根据 select clause 里的所有列来进行分组
GROUP BY 
    c.customer_id,
    c.first_name,
    c.last_name
HAVING total_sales > 100

第四部分:The ROLLUP Operator - ROLLUP运算符

SELECT
    client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
--- 再次使用同一个例子,为每个客户计算了总销售
--- 在MySQL中 with rollup可以用于汇总数据
SELECT
    client_id,
    SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id WITH ROLLUP
--- 查询的结果会额外得到一行,该行汇总了整个结果集
--- 2590是所有客户的总销售额
--- 注意这行的client_id是空值,因为把这些值加起来没有意义
--- 所以 rollup运算符只能应用于聚合值aggregate values的列

--- 如果我们用多列来分组怎么办
SELECT
    state,
    city,
    SUM(invoice_total) AS total_sales
FROM invoices i
JOIN clients c USING (client_id)
GROUP BY state, city WITH ROLLUP
--- 在这下面我们可以看到所有加州城市的总销售
--- rollup operator对每一组的总计做了计算
--- 当你进行多列分组,并运用rollup运算符
--- 你会看到每个组及整个结果集的汇总值

EXERCISE:

Part4 - The ROLLUP Operator - ROLLUP运算符
-- 结果有两行 Cash、Credit Card
-- 两列 payment_method、total
-- 查询的结果可以看到每种付款方式的总和以及收到的款项
-- 需要利用payments表来编写这段查询

SOLUTION:

SELECT
    payment_method,
    SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
-- 因为两表中的列名不一致 所以无法用USING
    ON p.payment_method = pm.payment_method_id
GROUP BY payment_method WITH ROLLUP

SELECT
    pm.name AS payment_method,
    SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
-- 因为两表中的列名不一致 所以无法用USING
    ON p.payment_method = pm.payment_method_id
GROUP BY payment_method WITH ROLLUP

-- 现在我们需要把payment_method_id换成payment_method的实际名称
-- 但是上面会报错
-- 因为我们在使用rollup运算符的时候,我们不能在groupby子句中使用列别名
-- 所以下面的才是对的

SELECT
    pm.name AS payment_method,
    SUM(amount) AS total
FROM payments p
JOIN payment_methods pm
-- 因为两表中的列名不一致 所以无法用USING
    ON p.payment_method = pm.payment_method_id
GROUP BY pm.name WITH ROLLUP