聚合查询 #
介绍 #
聚合函数作用于一组值。它们通常与GROUP BY子句一起使用,将值分组为子集。
GROUP BY 子句 #
GROUP BY 表达式可以是:
- 标识符:Identifier
- 序数:Ordinal
- 表达式:Expression
标识符 #
group by 表达式可以是标识符:
os> SELECT gender, sum(age) FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+------------+
| gender | sum(age) |
|----------+------------|
| F | 28 |
| M | 101 |
+----------+------------+
序数 #
group by 表达式可以是序数:
os> SELECT gender, sum(age) FROM accounts GROUP BY 1;
fetched rows / total rows = 2/2
+----------+------------+
| gender | sum(age) |
|----------+------------|
| F | 28 |
| M | 101 |
+----------+------------+
group by 表达式可以是一个表达式。
os> SELECT abs(account_number), sum(age) FROM accounts GROUP BY abs(account_number);
fetched rows / total rows = 4/4
+-----------------------+------------+
| abs(account_number) | sum(age) |
|-----------------------+------------|
| 1 | 32 |
| 13 | 28 |
| 18 | 33 |
| 6 | 36 |
+-----------------------+------------+
聚合 #
- 聚合可以用于select。
- 聚合可以作为表达式的参数。
- 聚合可以包含表达式作为参数。
查询中的聚合 #
聚合可以用于select。
os> SELECT gender, sum(age) FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+------------+
| gender | sum(age) |
|----------+------------|
| F | 28 |
| M | 101 |
+----------+------------+
聚合表达式 #
聚合可以用作表达式的参数:
os> SELECT gender, sum(age) * 2 as sum2 FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender | sum2 |
|----------+--------|
| F | 56 |
| M | 202 |
+----------+--------+
表达式作为聚合参数 #
聚合可以将表达式作为参数:
os> SELECT gender, sum(age * 2) as sum2 FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender | sum2 |
|----------+--------|
| F | 56 |
| M | 202 |
+----------+--------+
COUNT 聚合 #
除了常规标识符,COUNT
聚合函数还接受诸如 *
或字面量如 1
的参数。这些不同形式的含义如下:
COUNT(field)
只有当给定字段(或表达式)在输入行中不为 null 或缺失时才会计数。COUNT(*)
将计算其所有输入行的数量。COUNT(1)
与COUNT(*)
相同,因为任何非 null 的字面量都会被计数。
聚合函数 #
COUNT #
用法:返回 SELECT 语句检索到的行中 expr 的计数。
Example:
os> SELECT gender, count(*) as countV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+----------+
| gender | countV |
|----------+----------|
| F | 1 |
| M | 3 |
+----------+----------+
SUM #
用法:SUM(expr). 返回 expr 的总和。
Example:
os> SELECT gender, sum(age) as sumV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------+
| gender | sumV |
|----------+--------|
| F | 28 |
| M | 101 |
+----------+--------+
AVG #
用法: AVG(expr). 返回 expr 的平均值。
Example:
os> SELECT gender, avg(age) as avgV FROM accounts GROUP BY gender;
fetched rows / total rows = 2/2
+----------+--------------------+
| gender | avgV |
|----------+--------------------|
| F | 28.0 |
| M | 33.666666666666664 |
+----------+--------------------+
MAX #
用法: MAX(expr). 返回 expr 的最大值。
Example:
os> SELECT max(age) as maxV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| maxV |
|--------|
| 36 |
+--------+
MIN #
用法: MIN(expr). 返回 expr 的最小值。
Example:
os> SELECT min(age) as minV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| minV |
|--------|
| 28 |
+--------+
VAR_POP #
用法:VAR_POP(expr). 返回 expr 的总体标准方差。
Example:
os> SELECT var_pop(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| varV |
|--------|
| 8.1875 |
+--------+
VAR_SAMP #
用法:VAR_SAMP(expr). 返回 expr 的样本方差。
Example:
os> SELECT var_samp(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| varV |
|--------------------|
| 10.916666666666666 |
+--------------------+
VARIANCE #
用法:VARIANCE(expr). 返回 expr 的总体标准方差。VARIANCE() 是 VAR_POP() 函数的同义词。
Example:
os> SELECT variance(age) as varV FROM accounts;
fetched rows / total rows = 1/1
+--------+
| varV |
|--------|
| 8.1875 |
+--------+
STDDEV_POP #
用法:STDDEV_POP(expr). 返回 expr 的总体标准差。
Example:
os> SELECT stddev_pop(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV |
|--------------------|
| 2.8613807855648994 |
+--------------------+
STDDEV_SAMP #
用法:STDDEV_SAMP(expr). 返回 expr 的样本标准差。
Example:
os> SELECT stddev_samp(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+-------------------+
| stddevV |
|-------------------|
| 3.304037933599835 |
+-------------------+
STD #
用法:STD(expr). 返回 expr 的总体标准差。STD() 是STDDEV_POP() 函数的同义词。
Example:
os> SELECT stddev_pop(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV |
|--------------------|
| 2.8613807855648994 |
+--------------------+
STDDEV #
用法:STDDEV(expr). 返回 expr 的总体标准差。STDDEV() 是 STDDEV_POP() 函数的同义词。
Example:
os> SELECT stddev(age) as stddevV FROM accounts;
fetched rows / total rows = 1/1
+--------------------+
| stddevV |
|--------------------|
| 2.8613807855648994 |
+--------------------+
DISTINCT COUNT 聚合 #
要获取一个字段不同值的计数,可以在count聚合中的字段前添加关键字 DISTINCT
。
Example:
os> SELECT COUNT(DISTINCT gender), COUNT(gender) FROM accounts;
fetched rows / total rows = 1/1
+--------------------------+-----------------+
| COUNT(DISTINCT gender) | COUNT(gender) |
|--------------------------+-----------------|
| 2 | 4 |
+--------------------------+-----------------+
HAVING 子句 #
HAVING子句可以作为聚合过滤器,过滤掉不满足给定条件表达式的聚合值。
带 GROUP BY 的 HAVING #
在 SELECT
子句中定义的聚合表达式或其别名可以在 HAVING
条件中使用。
- 尽管在
HAVING
子句中允许使用非聚合表达式,但建议在WHERE
中使用非聚合表达式。 HAVING
子句中的聚合并不一定与选择列表中的聚合相同。作为对 SQL 标准的扩展,它也不限于只涉及 group by 列表上的标识符。
以下是一个典型使用 HAVING
子句的示例:
os> SELECT
... gender, sum(age)
... FROM accounts
... GROUP BY gender
... HAVING sum(age) > 100;
fetched rows / total rows = 1/1
+----------+------------+
| gender | sum(age) |
|----------+------------|
| M | 101 |
+----------+------------+
以下是另一个在 HAVING
条件中使用别名的例子。请注意,如果一个标识符是模糊的,例如既作为选择别名又作为索引字段出现,那么优先考虑别名。这意味着标识符将被 SELECT
子句中的别名表达式替换:
os> SELECT
... gender, sum(age) AS s
... FROM accounts
... GROUP BY gender
... HAVING s > 100;
fetched rows / total rows = 1/1
+----------+-----+
| gender | s |
|----------+-----|
| M | 101 |
+----------+-----+
不带 GROUP BY 的 HAVING #
此外,HAVING
子句也可以在没有GROUP BY
子句的情况下使用。这很有用,因为聚合表达式不能出现在 WHERE
子句中。
os> SELECT
... 'Total of age > 100'
... FROM accounts
... HAVING sum(age) > 100;
fetched rows / total rows = 1/1
+------------------------+
| 'Total of age > 100' |
|------------------------|
| Total of age > 100 |
+------------------------+
FILTER 子句 #
FILTER
子句可以按照语法 aggregation_function(expr) FILTER(WHERE condition_expr)
为当前聚合存储桶设置特定条件。如果指定了过滤器,则只有过滤器子句中的条件计算结果为 true 的输入行才会馈送到聚合函数;其他行将被丢弃。
带筛选器子句的聚合只能在 SELECT
子句中使用。
带 GROUP BY 的 FILTER #
带有 FILTER
子句的 group by 聚合可以为每个聚合桶设置不同的条件。以下是在 group by 聚合中使用 FILTER
的一个例子:
os> SELECT avg(age) FILTER(WHERE balance > 10000) AS filtered, gender FROM accounts GROUP BY gender
fetched rows / total rows = 2/2
+------------+----------+
| filtered | gender |
|------------+----------|
| 28.0 | F |
| 32.0 | M |
+------------+----------+
不带 GROUP BY 的 FILTER #
FILTER
子句也可以在没有 GROUP BY 的情况下用于聚合函数。例如:
os> SELECT
... count(*) AS unfiltered,
... count(*) FILTER(WHERE age > 34) AS filtered
... FROM accounts
fetched rows / total rows = 1/1
+--------------+------------+
| unfiltered | filtered |
|--------------+------------|
| 4 | 1 |
+--------------+------------+
带有 FILTER 的 Distinct count 聚合
FILTER
子句也用在 distinct count 中,在计算特定字段的不同值之前进行过滤。例如:
os> SELECT COUNT(DISTINCT firstname) FILTER(WHERE age > 30) AS distinct_count FROM accounts
fetched rows / total rows = 1/1
+------------------+
| distinct_count |
|------------------|
| 3 |
+------------------+