The
group by clause divides the output of a query into groups. You can group by one or more column names, or by the results of computed columns using numeric datatypes in an expression. When used with aggregates,
group by retrieves the calculations in each subgroup, and may return multiple rows. The maximum number of columns or expressions you can use in a
group by clause is 16.
You cannot
group by columns of text or image datatypes.
While you can use
group by without aggregates, such a construction has limited functionality and may produce confusing results. The following example groups the results by title type:
select type, advance
from titles
group by type
type advance
------------ ---------
business 5,000.00
business 5,000.00
business 10,125.00
business 5,000.00
mod_cook 0.00
mod_cook 15,000.00
UNDECIDED NULL
popular_comp 7,000.00
popular_comp 8,000.00
popular_comp NULL
psychology 7,000.00
psychology 2,275.00
psychology 6,000.00
psychology 2,000.00
psychology 4,000.00
trad_cook 7,000.00
trad_cook 4,000.00
trad_cook 8,000.00
(18 rows affected)
With an aggregate for the
advance column, the query returns the sum for each group:
select type, sum(advance)
from titles
group by type
type
------------ ------------------------
UNDECIDED NULL
business 25,125.00
mod_cook 15,000.00
popular_comp 15,000.00
psychology 21,275.00
trad_cook 19,000.00
(6 rows affected)
The summary values in a
group by clause using aggregates are called vector aggregates, as opposed to scalar aggregates, which result when only one row is returned (see
"Using aggregate functions").
group by syntax
The complete syntax of the
select statement is repeated here so that you can see the
group by clause in context:
select [all | distinct] select_list
[into [[database.]owner.]table_name]
[from [[database.]owner.]{view_name | table_name [(index {index_name | table_name}
[parallel [degree_of_parallelism]]
[prefetch size ] [lru | mru])]}
[holdlock | noholdlock] [shared]
[,[[database.]owner.]{view_name | table_name [(index {index_name | table_name}
[parallel [degree_of_parallelism]]
[prefetch size] [lru | mru])]}
[holdlock | noholdlock] [shared]]...]
[where search_conditions]
[group by [all] aggregate_free_expression [, aggregate_free_expression]...]
[having search_conditions]
[order by
{[[[database.]owner.] {table_name. | view_name.}]
column_name | select_list_number | expression} [asc | desc]
[, {[[[database.]owner.]{table_name|view_name.}] column_name | select_list_number | expression}
[asc | desc]]...]
[compute row_aggregate (column_name)
[, row_aggregate(column_name)]...
[by column_name [, column_name]...]]
[for {read only | update [of column_name_list]}]
[at isolation {read uncommitted | read committed | serializable}]
[for browse]
Remember that the order of the clauses in the
select statement is significant. You can omit any of the optional clauses, but when you use them, they must appear in the order shown above.
group by and SQL standardsThe SQL standards for
group by are more restrictive than Sybase's standard. The SQL standard requires that:
- The columns in a select list must be in the group by expression or they must be arguments of aggregate functions.
- A group by expression can only contain column names in the select list, but not those used only as arguments for vector aggregates.
Several Transact-SQL extensions (described in the following sections) relax these restrictions, however the more complex result sets may be more difficult to understand. If you set the
fipsflagger option as follows, you will receive a warning message stating that Transact-SQL extensions are used:
set fipsflagger on
For more information about the
fipsflagger option, see the
set command in the
Reference Manual.
Nesting groups with
group byYou can list more than one column in the
group by clause to nest groups. Once the sets are established with
group by, the aggregates are applied. This statement finds the average price and the sum of book sales, grouped first by publisher identification number and then by type:
select pub_id, type, avg(price), sum(total_sales)
from titles
group by pub_id, type
pub_id type
------ ------------ ------ -------
0736 business 2.99 18,722
0736 psychology 11.48 9,564
0877 UNDECIDED NULL NULL
0877 mod_cook 11.49 24,278
0877 psychology 21.59 375
0877 trad_cook 15.96 19,566
1389 business 17.31 12,066
1389 popular_comp 21.48 12,875
(8 rows affected)
You can nest many groups within groups, up to the maximum of 16 columns or expressions specified with
group by.
Referencing other columns in queries using group by
SQL standards state that the
group by must contain items from the select list. However, Transact-SQL allows you to specify any valid column name in either the
group by or select list, whether they employ aggregates or not.
Through the following extensions, Sybase lifts restrictions on what you can include or omit in the select list of a query that includes
group by.
- The columns in the select list are not limited to the grouping columns and columns used with the vector aggregates.
- The columns specified by group by are not limited to those non-aggregate columns in the select list.
A vector aggregate must be accompanied by a
group by clause. The SQL standards require that the non-aggregate columns in the select list match the
group by columns. However, the first bulleted item described above allows you to specify additional, extended columns in the select list of the query.
For example, many versions of SQL do not allow the inclusion of the extended title_id column in the select list, but it is legal in Transact-SQL:
select type, title_id, avg(price), avg(advance)
from titles
group by type
type title_id
------------ -------- ----- -------
business BU1032 13.73 6,281.25
business BU1111 13.73 6,281.25
business BU2075 13.73 6,281.25
business BU7832 13.73 6,281.25
mod_cook MC2222 11.49 7,500.00
mod_cook MC3021 11.49 7,500.00
UNDECIDED MC3026 NULL NULL
popular_comp PC1035 21.48 7,500.00
popular_comp PC8888 21.48 7,500.00
popular_comp PC9999 21.48 7,500.00
psychology PS1372 13.50 4,255.00
psychology PS2091 13.50 4,255.00
psychology PS2106 13.50 4,255.00
psychology PS3333 13.50 4,255.00
psychology PS7777 13.50 4,255.00
trad_cook TC3218 15.96 6,333.33
trad_cook TC4203 15.96 6,333.33
trad_cook TC7777 15.96 6,333.33
(18 rows affected)
The above example still aggregates the
price and
advance columns based on the
type column, but its results also display the
title_id for the books included in each group.
The second extension described above allows you to group columns that are not specified as columns in the select list of the query. These columns do not appear in the results, but the vector aggregates still compute their summary values. For example:
select state, count(au_id)
from authors
group by state, city
state
----- --------
CA 2
CA 1
CA 5
CA 5
CA 2
CA 1
CA 1
CA 1
CA 1
IN 1
KS 1
MD 1
MI 1
OR 1
TN 1
UT 2
(16 rows affected)
This example groups the vector aggregate results by both
state and
city, even though it does not display which city belongs to each group. Therefore, results are potentially misleading.
You may think the following query should produce similar results to the previous query, since only the vector aggregate seems to tally the number of each city for each row:
select state, count(au_id)
from authors
group by city
However, its results are much different. By not using
group by with both the
state and
city columns, the query tallies the number of each city, but it displays the tally for each row of that city in
authors rather than grouping them into one result row per city.
state
----- -----------
CA 1
CA 5
CA 2
CA 1
CA 5
KS 1
CA 2
CA 2
CA 1
CA 1
TN 1
OR 1
CA 1
MI 1
IN 1
CA 5
CA 5
CA 5
MD 1
CA 2
CA 1
UT 2
UT 2
(23 rows affected)
When you use the Transact-SQL extensions in complex queries that include the
where clause or joins, the results may become even more difficult to understand. To avoid confusing or misleading results with
group by, Sybase suggests that you use the
fipsflagger option to identify queries that use Transact-SQL extensions. See
"group by and SQL standards" for details.
For more information about Transact-SQL extensions to
group by and how they work, see the
Reference Manual.
Expressions and group by
Another Transact-SQL extension allows you to group by an expression that does not include aggregate functions. For example:
select avg(total_sales), total_sales * price
from titles
group by total_sales * price
--------- -------------
NULL NULL
111 777.00
375 7,856.25
375 8,096.25
2045 22,392.75
3336 26,654.64
2032 40,619.68
3876 46,318.20
18722 55,978.78
4095 61,384.05
22246 66,515.54
4072 81,399.28
4095 81,859.05
4095 81,900.00
15096 180,397.20
8780 201,501.00
(16 rows affected)
The expression "
total_sales * price" is allowed.
You cannot
group by a column heading, also known as an alias, although you can still use one in your select list. This statement produces an error message:
select Category = type, title_id, avg(price), avg(advance)
from titles
group by Category
The
group by clause should be "
group by type", not "
group by Category".
select Category = type, title_id, avg(price), avg(advance)
from titles
group by type
Nesting aggregates with group by
Another Transact-SQL extension allows you to nest a vector aggregate inside a scalar aggregate. For example, to find the average price of all types of books using a non-nested aggregate, enter:
select avg(price)
from titles
group by type
---------------
NULL
13.73
11.49
21.48
13.50
15.96
(6 rows affected)
Nesting the average price inside the
max function produces the highest average price of a group of books, grouped by type:
select max(avg(price))
from titles
group by type
-------------
21.48
(1 row affected)
By definition, the
group by clause applies to the innermost aggregate--in this case,
avg.
Null values and group by
If the grouping column contains a null value, that row becomes its own group in the results. If the grouping column contains more than one null value, the null values form a single group. Here is an example that uses
group by and the
advance column, which contains some null values:
select advance, avg(price * 2)
from titles
group by advance
advance
------------------ -----------------
NULL NULL
0.00 39.98
2000.00 39.98
2275.00 21.90
4000.00 19.94
5000.00 34.62
6000.00 14.00
7000.00 43.66
8000.00 34.99
10125.00 5.98
15000.00 5.98
(11 rows affected)
If you are using the
count(column_name) aggregate function, grouping by a column that contains null values will return a count of zero for the grouping row, since
count(column_name) does not count null values. In most cases, you should use
count(*) instead. This example groups and counts on the
price column from the
titles table, which contains null values, and shows
count(*) for comparison:
select price, count(price), count(*)
from titles
group by price
price
------------- ----- -----
NULL 0 2
2.99 2 2
7.00 1 1
7.99 1 1
10.95 1 1
11.95 2 2
14.99 1 1
19.99 4 4
20.00 1 1
20.95 1 1
21.59 1 1
22.95 1 1
(12 rows affected)
where clause and group by
You can use a
where clause in a statement with
group by. Rows that do not satisfy the conditions in the
where clause are eliminated before any grouping is done. Here is an example:
select type, avg(price)
from titles
where advance > 5000
group by type
type
------------- --------
business 2.99
mod_cook 2.99
popular_comp 21.48
psychology 14.30
trad_cook 17.97
(5 rows affected)
Only the rows with advances of more than $5000 are included in the groups that are used to produce the query results.
However, the way that Adaptive Server handles extra columns in the select list and the
where clause may seem contradictory. For example:
select type, advance, avg(price)
from titles
where advance > 5000
group by type
type advance
------------- --------- --------
business 5,000.00 2.99
business 5,000.00 2.99
business 10,125.00 2.99
business 5,000.00 2.99
mod_cook 0.00 2.99
mod_cook 15,000.00 2.99
popular_comp 7,000.00 21.48
popular_comp 8,000.00 21.48
popular_comp NULL 21.48
psychology 7,000.00 14.30
psychology 2,275.00 14.30
psychology 6,000.00 14.30
psychology 2,000.00 14.30
psychology 4,000.00 14.30
trad_cook 7,000.00 17.97
trad_cook 4,000.00 17.97
trad_cook 8,000.00 17.97
(17 rows affected)
It only seems as if the query is ignoring the
where clause when you look at the results for the
advance (extended) column. Adaptive Server still computes the vector aggregate using only those rows that satisfy the
where clause, but it also displays all rows for any extended columns that you include in the select list. To further restrict these rows from the results, you must use a
having clause (described later in this chapter).
group by and all
The keyword
all in the
group by clause is a Transact-SQL enhancement. It is meaningful only if the
select statement in which it is used also includes a
where clause.
If you use
all, the query results include all the groups produced by the
group by clause, even if some groups do not have any rows that meet the search conditions. Without
all, a
select statement that includes
group by does not show groups for which no rows qualify.
Here is an example:
select type, avg(advance)
from titles
where advance > 1000 and advance < 10000
group by type
type
------------ ------------------------
business 5,000.00
popular_comp 7,500.00
psychology 4,255.00
trad_cook 6,333.33
(4 rows affected)
select type, avg(advance)
from titles
where advance > 1000 and advance < 10000
group by all type
type
------------ ------------------------
UNDECIDED NULL
business 5,000.00
mod_cook NULL
popular_comp 7,500.00
psychology 4,255.00
trad_cook 6,333.33
(6 rows affected)
The first statement produces groups only for those books that commanded advances of more than $1000 but less than $10,000. Since no modern cooking books have an advance within that range, there is no group in the results for the mod_cook type.
The second statement produces groups for all types, including modern cooking and "UNDECIDED," even though the modern cooking group does not include any rows that meet the qualification specified in the
where clause. Adaptive Server returns a NULL result for all groups that lack qualifying rows.
Using aggregates without group by
By definition, scalar aggregates apply to all rows in a table, producing a single value for the whole table for each function. The Transact-SQL extension that allows you to include extended columns with vector aggregates also allows you to include extended columns with scalar aggregates. For example, look at the
publishers table:
pub_id pub_name city state
------ ------------------ -------------- -----
0736 New Age Books Boston MA
0877 Binnet & Hardley Washington DC
1389 Algodata Infosystems Berkeley CA
It contains three rows. The following query produces a three-row scalar aggregate based on each row of the table:
select pub_id, count(pub_id)
from publishers
pub_id
---------- ---------
0736 3
0877 3
1389 3
(3 rows affected)
Because Adaptive Server treats
publishers as a single group, the scalar aggregate applies to the (single-group) table. The results display every row of the table for each column you include in the select list, in addition to the scalar aggregate.
The
where clause behaves the same way for scalar aggregates as with vector aggregates. The
where clause restricts the columns included in the aggregate summary values, but it does not affect the rows that appear in the results for each extended column you specify in the select list. For example:
select pub_id, count(pub_id)
from publishers
where pub_id < "1000"
pub_id
-------------- -----------
0736 2
0877 2
1389 2
(3 rows affected)
Like the other Transact-SQL extensions to
group by, this extension provides results that may be difficult to understand, especially for queries on large tables or queries with multitable joins.