PostgreSQL中的分组用法

128 阅读5分钟

POSTGRESQL EVALUATE CLAUSE ORDER IS:

FROM->WHERE->GROUP BY->HAVING->SELECT->DISTINCT->ORDER BY->LIMIT

PostgreSQL GROUP BY

  • GROUP BY

      1. Using PostgreSQL GROUP BY without an aggregate function example

        Group by == DISTINCT 去除重复项

      2. Using PostgreSQL GROUP BY with SUM() function example

      3. Using PostgreSQL GROUP BY clause with the JOIN clause

      4. Using PostgreSQL GROUP BY clause with a date column

        SELECT
          payment_date::date payment_date,
          SUM(amount) sum
        FROM
          payment
        GROUP BY
          payment_date::date
        ORDER BY
          payment_date DESC;
        
  • HAVING

    • to specify a search condition for a group or an aggregate
    • 因为having子句出现在select之前,having子句中不能有表达式别名,order by中可以有别名

    The WHERE clause filters the rows based on a specified condition whereas the HAVING clause filter groups of rows according to a specified condition.

    In other words, you apply the condition in the WHERE clause to the rows while you apply the condition in the HAVING clause to the groups of rows.

    SELECT
      customer_id,
      SUM (amount) amount
    FROM
      payment
    GROUP BY
      customer_id
    HAVING
      SUM (amount) > 200
    ORDER BY
      amount DESC;
    
  • GROUPING SETS

    • The GROUPING SETS allows you to define multiple grouping sets in the same query.

      The general syntax of the GROUPING SETS is as follows:

      SELECT
          c1,
          c2,
          aggregate_function(c3)
      FROM
          table_name
      GROUP BY
          GROUPING SETS (
              (c1, c2),
              (c1),
              (c2),
              ()
      );
      
    • The GROUPING() function accepts an argument which can be a column name or an expression:

      GROUPING( column_name | expression)
      

      The column_name or expression must match with the one specified in the GROUP BY clause.The GROUPING() function returns bit 0 if the argument is a member of the current grouping set and 1 otherwise.

      DROP TABLE IF EXISTS sales;
      ​
      CREATE TABLE sales (
          brand VARCHAR NOT NULL,
          segment VARCHAR NOT NULL,
          quantity INT NOT NULL,
          PRIMARY KEY (brand, segment)
      );
      ​
      INSERT INTO sales (brand, segment, quantity)
      VALUES
          ('ABC', 'Premium', 100),
          ('ABC', 'Basic', 200),
          ('XYZ', 'Premium', 100),
          ('XYZ', 'Basic', 300)
      RETURNING *;
      ​
      SELECT
          GROUPING(brand) grouping_brand,
          GROUPING(segment) grouping_segment,
          brand,
          segment,
          SUM (quantity)
      FROM
          sales
      GROUP BY
          GROUPING SETS (
              (brand),
              (segment),
              ()
          )
      ORDER BY
          brand,
          segment;
      
      #use the GROUPING() function in the HAVING clause to find the subtotal of each brand
      SELECT
          GROUPING(brand) grouping_brand,
          GROUPING(segment) grouping_segment,
          brand,
          segment,
          SUM (quantity)
      FROM
          sales
      GROUP BY
          GROUPING SETS (
              (brand),
              (segment),
              ()
          )
      HAVING GROUPING(brand) = 0
      ORDER BY
          brand,
          segment;
      
  • CUBE

    相比与grouping sets,对多维数据统计更加简洁

    In general, if the number of columns specified in the CUBE is n, then you will have 2n2^n combinations.

    CUBE(c1,c2,c3)
    <=>
    GROUPING SETS (
        (c1,c2,c3),
        (c1,c2),
        (c1,c3),
        (c2,c3),
        (c1),
        (c2),
        (c3),
        ()
     )
    
    SELECT
        brand,
        segment,
        SUM (quantity)
    FROM
        sales
    GROUP BY
        CUBE (brand, segment)
    ORDER BY
        brand,
        segment;
    ​
    brand | segment | sum 
    -------+---------+-----
     ABC   | Basic   | 200
     ABC   | Premium | 100
     ABC   | null    | 300
     XYZ   | Basic   | 300
     XYZ   | Premium | 100
     XYZ   | null    | 400
     null  | Basic   | 500
     null  | Premium | 200
     null  | null    | 700
    (9 rows)
    ​
    SELECT
        brand,
        segment,
        SUM (quantity)
    FROM
        sales
    GROUP BY
        brand,
        CUBE (segment)
    ORDER BY
        brand,
        segment;
      
    brand | segment | sum 
    -------+---------+-----
     ABC   | Basic   | 200
     ABC   | Premium | 100
     ABC   | null    | 300
     XYZ   | Basic   | 300
     XYZ   | Premium | 100
     XYZ   | null    | 400
    (6 rows)
    ​
    
  • ROLLUP

    • Different from the CUBE subclause, ROLLUP does not generate all possible grouping sets based on the specified columns. It just makes a subset of those.

      The ROLLUP assumes a hierarchy among the input columns and generates all grouping sets that make sense considering the hierarchy. This is the reason why ROLLUP is often used to generate the subtotals and the grand total for reports.

      For example, the CUBE (c1,c2,c3) makes all eight possible grouping sets:

      (c1, c2, c3)(c1, c2)(c2, c3)(c1,c3)(c1)(c2)(c3)()
      

      However, the ROLLUP(c1,c2,c3) generates only four grouping sets, assuming the hierarchy c1 > c2 > c3 as follows:

      (c1, c2, c3)(c1, c2)(c1)()
      

      A common use of ROLLUP is to calculate the aggregations of data by year, month, and date, considering the hierarchy year > month > date

      dvdrental=# DROP TABLE IF EXISTS sales;
      ('ABC', 'Basic', 200),
          ('XYZ', 'Premium', 100),
          ('XYZ', 'Basic', 300);DROP TABLE
      dvdrental=# CREATE TABLE sales (
      dvdrental(#     brand VARCHAR NOT NULL,
      dvdrental(#     segment VARCHAR NOT NULL,
      dvdrental(#     quantity INT NOT NULL,
      dvdrental(#     PRIMARY KEY (brand, segment)
      dvdrental(# );
      CREATE TABLE
      dvdrental=# 
      dvdrental=# INSERT INTO sales (brand, segment, quantity)
      dvdrental-# VALUES
      dvdrental-#     ('ABC', 'Premium', 100),
      dvdrental-#     ('ABC', 'Basic', 200),
      dvdrental-#     ('XYZ', 'Premium', 100),
      dvdrental-#     ('XYZ', 'Basic', 300);
      INSERT 0 4
      dvdrental=# SELECT
      dvdrental-#     brand,
      dvdrental-#     segment,
      dvdrental-#     SUM (quantity)
      dvdrental-# FROM
      dvdrental-#     sales
      dvdrental-# GROUP BY
      dvdrental-#     ROLLUP (brand, segment)
      dvdrental-# ORDER BY
      dvdrental-#     brand,
      dvdrental-#     segment;
       brand | segment | sum 
      -------+---------+-----
       ABC   | Basic   | 200
       ABC   | Premium | 100
       ABC   | null    | 300
       XYZ   | Basic   | 300
       XYZ   | Premium | 100
       XYZ   | null    | 400
       null  | null    | 700
      (7 rows)
      ​
      dvdrental=# SELECT
      dvdrental-#     segment,
      dvdrental-#     brand,
      dvdrental-#     SUM (quantity)
      dvdrental-# FROM
      dvdrental-#     sales
      dvdrental-# GROUP BY
      dvdrental-#     ROLLUP (segment, brand)
      dvdrental-# ORDER BY
      dvdrental-#     segment,
      dvdrental-#     brand;
       segment | brand | sum 
      ---------+-------+-----
       Basic   | ABC   | 200
       Basic   | XYZ   | 300
       Basic   | null  | 500
       Premium | ABC   | 100
       Premium | XYZ   | 100
       Premium | null  | 200
       null    | null  | 700
      (7 rows)
      ​
      dvdrental=# SELECT
      dvdrental-#     segment,
      dvdrental-#     brand,
      dvdrental-#     SUM (quantity)
      dvdrental-# FROM
      dvdrental-#     sales
      dvdrental-# GROUP BY
      dvdrental-#     segment,
      dvdrental-#     ROLLUP (brand)
      dvdrental-# ORDER BY
      dvdrental-#     segment,
      dvdrental-#     brand;
       segment | brand | sum 
      ---------+-------+-----
       Basic   | ABC   | 200
       Basic   | XYZ   | 300
       Basic   | null  | 500
       Premium | ABC   | 100
       Premium | XYZ   | 100
       Premium | null  | 200
      (6 rows)
      ​
      dvdrental=# SELECT
      dvdrental-#     EXTRACT (YEAR FROM rental_date) y,
      dvdrental-#     EXTRACT (MONTH FROM rental_date) M,
      dvdrental-#     EXTRACT (DAY FROM rental_date) d,
      dvdrental-#     COUNT (rental_id)
      dvdrental-# FROM
      dvdrental-#     rental
      dvdrental-# GROUP BY
      dvdrental-#     ROLLUP (
      dvdrental(#         EXTRACT (YEAR FROM rental_date),
      dvdrental(#         EXTRACT (MONTH FROM rental_date),
      dvdrental(#         EXTRACT (DAY FROM rental_date)
      dvdrental(#     );
        y   |  m   |  d   | count 
      ------+------+------+-------
       2005 |    5 |   24 |     8
       2005 |    5 |   25 |   137
       2005 |    5 |   26 |   174
       2005 |    5 |   27 |   166
       2005 |    5 |   28 |   196
       2005 |    5 |   29 |   154
       2005 |    5 |   30 |   158
       2005 |    5 |   31 |   163
       2005 |    5 | null |  1156
       2005 |    6 |   14 |    16
       2005 |    6 |   15 |   348
       2005 |    6 |   16 |   324
       2005 |    6 |   17 |   325
       2005 |    6 |   18 |   344
       2005 |    6 |   19 |   348
       2005 |    6 |   20 |   331
       2005 |    6 |   21 |   275
       2005 |    6 | null |  2311
       2005 |    7 |    5 |    27
       2005 |    7 |    6 |   504
       2005 |    7 |    7 |   461
       2005 |    7 |    8 |   512
       2005 |    7 |    9 |   513
       2005 |    7 |   10 |   480
       2005 |    7 |   11 |   461
       2005 |    7 |   12 |   495
       2005 |    7 |   26 |    33
       2005 |    7 |   27 |   649
       2005 |    7 |   28 |   620
       2005 |    7 |   29 |   641
       2005 |    7 |   30 |   634
       2005 |    7 |   31 |   679
       2005 |    7 | null |  6709
       2005 |    8 |    1 |   671
       2005 |    8 |    2 |   643
       2005 |    8 |   16 |    23
       2005 |    8 |   17 |   593
       2005 |    8 |   18 |   621
       2005 |    8 |   19 |   628
       2005 |    8 |   20 |   624
       2005 |    8 |   21 |   659
       2005 |    8 |   22 |   626
       2005 |    8 |   23 |   598
       2005 |    8 | null |  5686
       2005 | null | null | 15862
       2006 |    2 |   14 |   182
       2006 |    2 | null |   182
       2006 | null | null |   182
       null | null | null | 16044
      (49 rows)
      ​