DB diablo

Tuesday, September 14, 2010

Combining queries: the union operator

The union operator combines the results of two or more queries into a single result set. The Transact-SQL extension to union allows you to:

Use union in the select clause of an insert statement.
Specify new column headings in the order by clause of a select statement when union is present in the select statement.

The syntax of the union operator is as follows:

query1
     [union [all] queryN ] ... 
     [order by clause] 
     [compute
clause]

where query1 is:

select select_list[into clause]
[from clause]
[where clause]
[group by clause]
[having clause]

and queryN is:

select select_list[from clause]
[where clause]
[group by clause]
[having clause]

Figure 3-1: Union combining queries raster

For example, suppose you have the following two tables containing the data shown:
The following query creates a union between the two tables:

create table T1 (a char(4), b int)

insert T1 values ("abc", 1)

insert T1 values ("def", 2)

insert T1 values ("ghi", 3)

create table T2 (a char(4), b int)

insert T2 values ("ghi", 3)

insert T2 values ("jkl", 4)

insert T2 values ("mno", 5)

select * from T1 
union 
select * from T2

a     b
----  --------- 
abc           1 
def           2 
ghi           3 
jkl           4 
mno           5 
 
(5 rows affected)

By default, the union operator removes duplicate rows from the result set. Use the all option to include duplicate rows. Notice also that the columns in the result set have the same names as the columns in T1. You can use any number of union operators in a Transact-SQL statement. For example:

x union y union z

By default, Adaptive Server evaluates a statement containing union operators from left to right. You can use parentheses to specify a different evaluation order.
For example, the following two expressions are not equivalent:

x union all (y union z)

(x union all y) union z

In the first expression, duplicates are eliminated in the union between y and z. Then, in the union between that set and x, duplicates are not eliminated. In the second expression, duplicates are included in the union between x and y, but are then eliminated in the subsequent union with z; all does not affect the final result of this statement.

Guidelines for union queries

When you use union statements:

All select lists in the union statement must have the same number of expressions (such as column names, arithmetic expressions, and aggregate functions). The following statement is invalid because the first select list is longer than the second:

create table stores_east
(stor_id char(4) not null,
stor_name varchar(40) null,
stor_address varchar(40) null,
city varchar(20) null,
state char(2) null,
country varchar(12) null,
postalcode char(10) null,
payterms varchar(12) null)

select stor_id, city, state from stores 
union 
select stor_id, city from stores_east

drop table stores_east

Corresponding columns in all tables, or any subset of columns used in the individual queries, must be of the same datatype, or an implicit data conversion must be possible between the two datatypes, or an explicit conversion should be supplied. For example, a union is not possible between a column of the char datatype and one of the int datatype, unless an explicit conversion is supplied. However, a union is possible between a column of the money datatype and one of the int datatype. See union and "Datatype Conversion Functions" in the Reference Manual for more information about comparing datatypes in a union statement.
You must place corresponding columns in the individual queries of a union statement in the same order, because union compares the columns one to one in the order given in the query. For example, suppose you have the following tables:
Figure 3-2: Union comparing columnsThis query:
```
select a, b from T3 
union 
select b, a from T4 
```
produces:
```
a          b
---------  --- 
        1  abc 
        2  def 
        3  ghi 
 
(3 rows affected)
```
The following query results in an error message, because the datatypes of corresponding columns are not compatible:
```
select a, b from T3 
union 
select a, b from T4
```
```
drop table T3
```
```
drop table T4
```
When you combine different (but compatible) datatypes such as float and int in a union statement, Adaptive Server converts them to the datatype with the most precision.
Adaptive Server takes the column names in the table resulting from a union from the first individual query in the union statement. Therefore, to define a new column heading for the result set, do so in the first query. In addition, to refer to a column in the result set by a new name, for example, in an order by statement, refer to it in that way in the first select statement.
The following query is correct:
```
select Cities = city from stores 
union 
select city from authors 
order by Cities 
```

Using union with other Transact-SQL commands

When you use union statements with other Transact-SQL commands:

The first query in the union statement may contain an into clause that creates a table to hold the final result set. For example, the following statement creates a table called results that contains the union of tables publishers, stores, and salesdetail:
```
use master
```
```
sp_dboption pubs2, "select into", true
```
```
use pubs2
```
```
checkpoint
```
```
select pub_id, pub_name, city into results 
from publishers 
union 
select stor_id, stor_name, city from stores 
union 
select stor_id, title_id, ord_num from salesdetail 
```
You can use the into clause only in the first query; if it appears anywhere else, you get an error message.
You can use order by and compute clauses only at the end of the union statement to define the order of the final results or to compute summary values. You cannot use them within the individual queries that make up the union statement.
You can use group by and having clauses within individual queries only; you cannot use them to affect the final result set.

You can also use the union operator within an insert statement. For example:

create table tour (city varchar(20), state char(2))

insert into tour 
    select city, state from stores 
    union 
    select city, state from authors

drop table tour

You cannot use the union operator within a create view statement.
You cannot use the union operator on text and image columns.
You cannot use the for browse clause in statements involving the union operator.

The compute clause : Summarizing groups of data

The compute clause is a Transact-SQL extension. Use it with row aggregates to produce reports that show subtotals of grouped summaries. Such reports, usually produced by a report generator, are called control-break reports, since summary values appear in the report under the control of the groupings ("breaks") you specify in the compute clause.
These summary values appear as additional rows in the query results, unlike the aggregate results of a group by clause, which appear as new columns.
A compute clause allows you to see detail and summary rows with one select statement. You can calculate summary values for subgroups and you can calculate more than one row aggregate (see "Row aggregates and compute") for the same group.
The general syntax for compute is:

compute row_aggregate(column_name) 
     [, row_aggregate(column_name)]... 
     [by column_name [, column_name]...]

The row aggregates you can use with compute are sum, avg, min, max, and count. You can use sum and avg only with numeric columns. Unlike the order by clause, you cannot use the positional number of a column from the select list instead of the column name.
You cannot use text or image columns in a compute clause.
A system test may fail because there are too many aggregates in the compute clause of a query. The number of aggregates that each compute clause can accommodate is limited to 127, and if a compute clause contains more than 127 aggregates, the system generates an error message when you try to execute the query.
Each avg() aggregate counts as two aggregates when you are counting toward the limit of 127, because an avg() aggregate is actually a combination of a sum() aggregate and a count() aggregate.
Following are two queries and their results. The first one uses group by and aggregates. The second uses compute and row aggregates. Note the difference in the results.

select type, sum(price), sum(advance) 
from titles 
group by type

type 
------------   -------    ----------
UNDECIDED         NULL         NULL 
business         54.92    25,125.00 
mod_cook         22.98    15,000.00 
popular_comp     42.95    15,000.00 
psychology       67.52    21,275.00 
trad_cook        47.89    19,000.00 
 
(6 rows affected)

select type, price, advance 
from titles 
order by type 
compute sum(price), sum(advance) by type

type         price                    advance
------------ ------------------------ --------
UNDECIDED    NULL                     NULL
 
Compute Result:
------------------------ ------------------------
                   NULL                      NULL
 
type         price                advance 
------------ -------------------- ----------
business     2.99                 10,125.00
business    11.95                  5,000.00
business    19.99                  5,000.00
business    19.99                  5,000.00

Compute Result:
------------------------ ------------------------
                   54.92                25,125.00
type         price                   advance
------------ ----------------------- ---------
mod_cook      2.99                   15,000.00
mod_cook     19.99                        0.00

Compute Result:
------------------------ ------------------------
                   22.98                15,000.00
 
type          price               advance
------------- ------------------- ------------
popular_comp  NULL                        NULL
popular_comp  20.00                   8,000.00
popular_comp  22.95                   7,000.00

Compute Result:
------------------------ ------------------------
                   42.95                15,000.00
 
type         price                    advance 
------------ ------------------------ --------
psychology    7.00                    6,000.00
psychology    7.99                    4,000.00
psychology   10.95                    2,275.00
psychology   19.99                    2,000.00
psychology   21.59                    7,000.00

Compute Result:
------------------------ ------------------------
                    67.52               21,275.00
 
type         price                   advance
------------ ----------------------- --------
trad_cook    11.95                   4,000.00
trad_cook   14.99                   8,000.00
trad_cook   20.95                   7,000.00

Compute Result:
------------------------ ------------------------
                    47.89               19,000.00
 
(24 rows affected)

Each summary value is treated as a row.

Row aggregates and compute

The row aggregates used with compute are listed in Table 3-2:

How aggregates are used with a compute statement
Row aggregates	Result
sum	Total of the values in the expression
avg	Average of the values in the expression
max	Highest value in the expression
min	Lowest value in the expression
count	Number of selected rows

These row aggregates are the same aggregates that can be used with group by, except there is no row aggregate function that is the equivalent of count(*). To find the summary information produced by group by and count(*), use a compute clause without the by keyword.
Rules for compute clauses

Adaptive Server does not allow the distinct keyword with the row aggregates.
The columns in a compute clause must appear in the select list.
You cannot use select into (see Chapter 7, "Creating Databases and Tables") in the same statement as a compute clause because statements that include compute do not generate normal rows.
If you use compute with the by keyword, you must also use an order by clause. The columns listed after by must be identical to, or a subset of, those listed after order by, and must be in the same left-to-right order, start with the same expression, and not skip any expressions.
For example, suppose the order by clause is:
```
order by a, b, c 
```
The compute clause can be any or all of these:
```
compute row_aggregate (column_name) by a, b, c 
```
```
compute row_aggregate (column_name) by a, b 
```
```
compute row_aggregate (column_name) by a
```
The compute clause cannot be any of these:
```
compute row_aggregate (column_name) by b, c 
```
```
compute row_aggregate (column_name) by a, c
```
```
compute row_aggregate (column_name) by c
```
You must use a column name or an expression in the order by clause; you cannot sort by a column heading.
The compute keyword can be used without by to generate grand totals, grand counts, and so on. order by is optional if you use the compute keyword without by. The compute keyword without by is discussed under "Grand values: compute without by".

Specifying more than one column after compute

Listing more than one column after the by keyword affects the query by breaking a group into subgroups and applying the specified row aggregate to each level of grouping. For example, this query finds the sum of the prices of psychology books from each publisher:

select type, pub_id, price 
from titles 
where type = "psychology" 
order by type, pub_id, price 
compute sum(price) by type, pub_id

type        pub_id  price 
----------- ------- -------------
psychology    0736           7.00 
psychology    0736           7.99 
psychology    0736          10.95 
psychology    0736          19.99 
 
Compute Result:
---------------
          45.93

type        pub_id  price 
----------- ------- -------------
psychology    0877          21.59 
 
Compute Result:
---------------
          21.59
 
(7 rows affected)

Using more than one compute clause

You can use different aggregates in the same statement by including more than one compute clause. The following query is similar to the preceding one but adds the sum of the prices of psychology books by publisher:

select type, pub_id, price 
from titles 
where type = "psychology" 
order by type, pub_id, price 
compute sum(price) by type, pub_id 
compute sum(price) by type

type        pub_id  price 
----------- ------- --------------
psychology    0736           7.00 
psychology    0736           7.99 
psychology    0736          10.95 
psychology    0736          19.99 
 
Compute Result:
---------------
          45.93
 
type        pub_id  price 
 ---------- ------- --------------
 psychology    0877         21.59 
 
Compute Result:
---------------
          21.59
 
Compute Result:
---------------
           67.52 
 
(8 rows affected)

Applying an aggregate to more than one column

One compute clause can apply the same aggregate to several columns. This query finds the sum of the prices and advances for each type of cookbook:

select type, price, advance 
from titles 
where type like "%cook" 
order by type 
compute sum(price), sum(advance) by type

type      price            advance 
--------- ---------------- ---------------
mod_cook              2.99       15,000.00 
mod_cook             19.99            0.00 
 
Compute Result:
--------------- ---------------
          22.98       15,000.00 
 
type      price            advance 
--------- ---------------- ---------------
trad_cook            11.95        4,000.00 
trad_cook            14.99        8,000.00 
trad_cook            20.95        7,000.00 
 
Compute Result:
--------------- ---------------
          47.89       19,000.00 
 
(7 rows affected)

Remember, the columns to which the aggregates apply must also be in the select list.

Using different aggregates in the same compute clause

You can use different aggregates in the same compute clause:

select type, pub_id, price 
from titles 
where type like "%cook" 
order by type, pub_id 
compute sum(price), max(pub_id) by type

type        pub_id  price 
----------- ------- --------------
mod_cook    0877             2.99 
mod_cook    0877            19.99 
 
Compute Result:
--------------- ----
          22.98 0877 
 
type        pub_id  price 
----------- ------- --------------
trad_cook   0877            11.95 
trad_cook   0877            14.99 
trad_cook   0877            20.95 
 
Compute Result:
--------------- ----
          47.89 0877
 
(7 rows affected)

Grand values: compute without by

You can use the compute keyword without by to generate grand totals, grand counts, and so on.
This statement finds the grand total of the prices and advances of all types of books that cost more than $20:

select type, price, advance 
from titles 
where price > $20 
compute sum(price), sum(advance)

type         price            advance 
------------ ----------------  ------------- 
popular_comp            22.95      7,000.00 
psychology              21.59      7,000.00 
trad_cook               20.95      7,000.00 
 
Compute Result:
--------------- ---------
          65.49 21,000.00 
 
(4 rows affected)

You can use a compute with by and a compute without by in the same query. The following query finds the sum of prices and advances by type and then computes the grand total of prices and advances for all types of books.

select type, price, advance 
from titles 
where type like "%cook" 
order by type 
compute sum(price), sum(advance) by type 
compute sum(price), sum(advance)

type         price                advance
-----------  -----------------    ------------
mod_ cook                 2.99       15,000.00
mod_cook                 19.99            0.00
 
Compute Result:
--------------- ---------
          22.98 15,000.00
 
type         price                advance
-----------  -----------------    ------------
trad_cook                11.95        4,000.00
trad_cook                14.99        8,000.00
trad_cook                20.95        7,000.00
 
Compute Result:
--------------- ---------
          47.89 19,000.00
  
Compute Result:
--------------- ---------
          70.87 34,000.00
 
(8 rows affected)

Order by clause : Sorting query results

The order by clause allows you to sort query results by one or more columns, up to 31. Each sort is either ascending (asc) or descending (desc). If neither is specified, asc is the default. The following query orders results by pub_id:

select pub_id, type, title_id 
from titles 
order by pub_id

pub_id  type             title_id 
------  ------------     --------
0736    business         BU2075 
0736    psychology       PS2091 
0736    psychology       PS2106 
0736    psychology       PS3333 
0736    psychology       PS7777 
0877    UNDECIDED        MC3026 
0877    mod_cook         MC2222 
0877    mod_cook         MC3021 
0877    psychology       PS1372 
0877    trad_cook        TC3218 
0877    trad_cook        TC4203 
0877    trad_cook        TC7777 
1389    business         BU1032 
1389    business         BU1111 
1389    business         BU7832 
1389    popular_comp     PC1035 
1389    popular_comp     PC8888 
1389    popular_comp     PC9999 
 
(18 rows affected)

If you name more than one column in the order by clause, Adaptive Server nests the sorts. The following statement sorts the rows in the stores table first by stor_id in descending order, then by payterms (in ascending order, since desc is not specified), and finally by country (also ascending). Adaptive Server sorts null values first within any group.

select stor_id, payterms, country
from stores
order by stor_id desc, payterms

stor_id payterms     country 
------- ------------ ------------ 
8042    Net 30       USA          
7896    Net 60       USA          
7131    Net 60       USA          
7067    Net 30       USA          
7066    Net 30       USA          
6380    Net 60       USA          
5023    Net 60       USA          

(7 rows affected)

You can use the position number of a column in a select list instead of the column name. Column names and select list numbers can be mixed. Both of the following statements produce the same results as the preceding one.

select pub_id, type, title_id 
from titles 
order by 1 desc, 2, 3

select pub_id, type, title_id 
from titles 
order by 1 desc, type, 3

Most versions of SQL require that order by items appear in the select list, but Transact-SQL has no such restriction. You could order the results of the preceding query by title, although that column does not appear in the select list.
You cannot use order by on text or image columns.
Adaptive Server does not allow subqueries, aggregates, variables and constant expressions in the order by list.
With order by, null values come before all others.
The effects of an order by clause on mixed-case data depend on the sort order installed on your Adaptive Server. The basic choices are binary, dictionary order, and case-insensitive. The system procedure sp_helpsort displays the sort order for your server. See the Reference Manual for more information on sort orders.

order by and group by

You can use an order by clause to order the results of a group by in a particular way.
Put the order by clause after the group by clause. For example, to find the average price of each type of book and order the results by average price, the statement is:

select type, avg(price) 
from titles 
group by type 
order by avg(price)

type 
---------- ------------ 
UNDECIDED           NULL 
mod_cook          11.49 
psychology        13.50 
business          13.73 
trad_cook         15.96 
popular_comp      21.48 
 
(6 rows affected)

order by and group by used with select distinct

A select distinct query with order by or group by can return duplicate values if the order by or group by column is not in the select list. For example:

select distinct pub_id
from titles
order by type

pub_id
------
0877
0736
1389
0877
1389
0736
0877
0877
 
(8 rows affected)

If a query has an order by or group by clause that includes columns not in the select list, Adaptive Server adds those columns as hidden columns in the columns being processed. The columns listed in the order by or group by clause are included in the test for distinct rows. To comply with ANSI standards, include the order by or group by column in the select list. For example:

select distinct pub_id, type
from titles
order by type

pub_id type 
------ ------------
0877   UNDECIDED 
0736   business 
1389   business 
0877   mod_cook 
1389   popular_comp
0736   psychology
0877   psychology
0877   trad_cook
 
(8 rows affected)

The having clause : Selecting groups of data

Use the having clause to display or reject rows defined by the group by clause. The having clause sets conditions for the group by clause in the same way where sets conditions for the select clause, except where cannot include aggregates, while having often does. This example is legal:

select title_id 
from titles 
where title_id like "PS%" 
having avg(price) > $2.0

But this example is not:

select title_id
from titles 
where avg(price) > $20

having clauses can reference any of the items that appear in the select list.
This statement is an example of a having clause with an aggregate function. It groups the rows in the titles table by type, but eliminates the groups that include only one book:

select type 
from titles 
group by type 
having count(*) > 1

type 
---------------- 
business 
mod_cook 
popular_comp 
psychology 
trad_cook 
 
(5 rows affected)

Here is an example of a having clause without aggregates. It groups the titles table by type and displays only those types that start with the letter "p":

select type 
from titles 
group by type 
having type like "p%"

type 
------------ 
popular_comp 
psychology 
 
(2 rows affected)

When you include more than one condition in the having clause, combine the conditions with and, or, or not. For example, to group the titles table by publisher, and to include only those publishers who have paid more than $15,000 in total advances, whose books average less than $18 in price, and whose identification numbers (pub_id) are greater than 0800, the statement is:

select pub_id, sum(advance), avg(price)
from titles
group by pub_id
having sum(advance) > 15000
  and avg(price) < 18
  and pub_id > "0800"

pub_id 
------  ---------------- ---------------- 
0877          41,000.00            15.41 
 
(1 row affected)

How the having, group by, and where clauses interact

When you include the having, group by, and where clauses in a query, the sequence in which each clause affects the rows determines the final results:

The where clause excludes rows that do not meet its search conditions.
The group by clause collects the remaining rows into one group for each unique value in the group by expression.
Aggregate functions specified in the select list calculate summary values for each group.
The having clause excludes rows from the final results that do not meet its search conditions.

The following query illustrates the use of where, group by, and having clauses in one select statement containing aggregates:

select stor_id, title_id, sum(qty)
from salesdetail
where title_id like "PS%"
group by stor_id, title_id
having sum(qty) > 200

stor_id  title_id
-------  --------  -----------
5023     PS1372            375
5023     PS2091          1,845
5023     PS3333          3,437
5023     PS7777          2,206
6380     PS7777            500
7067     PS3333            345
7067     PS7777            250
 
(7 rows affected)

The query functioned in this order:

The where clause identified only rows with title_id beginning with "PS" (psychology books),
group by collected the rows by common stor_id and title_id,
The sum aggregate calculated the total number of books sold for each group, and
The having clause excluded the groups whose totals do not exceed 200 books from the final results.

All of the previous having examples adhere to the SQL standards, which specify that columns in a having expression must have a single value, and must be in the select list or group by clause. However, the Transact-SQL extensions to having allow columns or expressions not in the select list and not in the group by clause.
The following example determines the average price for each title type, but it excludes those types that do not have more than $10,000 in total sales, even though the sum aggregate does not appear in the results.

select type, avg(price)
from titles
group by type
having sum(total_sales) > 10000

type
------------  ----------
business           13.73
mod_cook           11.49
popular_comp       21.48
trad_cook          15.96
 
(4 rows affected)

The extension behaves as if the column or expression were part of the select list but not part of the displayed results. If you include an non-aggregated column with having, but it is not part of the select list or the group by clause, the query produces results similar to the "extended" column extension described earlier in this chapter. For example:

select type, avg(price)
from titles
group by type
having total_sales > 4000

type
------------  ----------
business           13.73
business           13.73
business           13.73
mod_cook           11.49
popular_comp       21.48
popular_comp       21.48
psychology         13.50
trad_cook          15.96
trad_cook          15.96
 
(9 rows affected)

Unlike an extended column, the total_sales column does not appear in the final results, yet the number of displayed rows for each type depends on the total_sales for each title. The query indicates that three business, one mod_cook, two popular_comp, one psychology, and two trad_cook, titles exceed $4000 in total sales.
As mentioned earlier, the way Adaptive Server handles extended columns may seem as if the query is ignoring the where clause in the final results. To make the where conditions affect the results for the extended column, you should repeat the conditions in the having clause. For example:

select type, advance, avg(price) 
from titles 
where advance > 5000
group by type
having advance > 5000

type           advance
-------------  ---------  --------
business       10,125.00      2.99
mod_cook       15,000.00      2.99
popular_comp    7,000.00     21.48
popular_comp    8,000.00     21.48
psychology      7,000.00     14.30
psychology      6,000.00     14.30
trad_cook       7,000.00     17.97
trad_cook       8,000.00     17.97
 
(8 rows affected)

Using having without group by

A query with a having clause should also have a group by clause. If you omit group by, all the rows not excluded by the where clause return as a single group.
Because no grouping is done between the where and having clauses, they cannot act independently of each other. having acts like where because it affects the rows in a single group rather than groups, except the having clause can still use aggregates.
This example uses the having clause in the following way: it averages the price, excludes from the results titles with advances greater than $4,000, and produces results where price is less than the average price:

select title_id, advance, price
from titles
where advance < 4000
having price > avg(price)

title_id        advance    price
-------------  ---------  --------
BU1032          5,000.00     19.99
BU7832          5,000.00     19.99
MC2222              0.00     19.99
PC1035          7,000.00     22.95
PC8888          8,000.00     20.00
PS1372          7,000.00     21.59
PS3333          2,000.00     19.99
TC3218          7,000.00     20.95
 
(8 rows affected)

You can also use the having clause with the Transact-SQL extension that allows you to omit the group by clause from a query that includes an aggregate in its select list. These scalar aggregate functions calculate values for the table as a single group, not for groups within the table.
In this example, the group by clause is omitted, which makes the aggregate function calculate a value for the whole table. The having clause excludes non-matching rows from the result group.

select pub_id, count(pub_id)
from publishers
having pub_id < "1000"

pub_id
------ ----------------
0736                  3
0877                  3
 
(2 rows affected)

Group by (SQL)

The group by clause divides the output of a query into groups. You can group by one or more column names, or by the results of computed columns using numeric datatypes in an expression. When used with aggregates, group by retrieves the calculations in each subgroup, and may return multiple rows. The maximum number of columns or expressions you can use in a group by clause is 16.
You cannot group by columns of text or image datatypes.
While you can use group by without aggregates, such a construction has limited functionality and may produce confusing results. The following example groups the results by title type:

select type, advance
from titles
group by type

type             advance
------------     ---------
business          5,000.00
business          5,000.00
business         10,125.00
business          5,000.00
mod_cook              0.00
mod_cook         15,000.00
UNDECIDED             NULL
popular_comp      7,000.00
popular_comp      8,000.00
popular_comp          NULL
psychology        7,000.00
psychology        2,275.00
psychology        6,000.00
psychology        2,000.00
psychology        4,000.00
trad_cook         7,000.00
trad_cook         4,000.00
trad_cook         8,000.00
 
(18 rows affected)

With an aggregate for the advance column, the query returns the sum for each group:

select type, sum(advance)
from titles
group by type

type                                  
------------ ------------------------ 
UNDECIDED                        NULL 
business                    25,125.00 
mod_cook                    15,000.00 
popular_comp                15,000.00 
psychology                  21,275.00 
trad_cook                   19,000.00 

(6 rows affected)

The summary values in a group by clause using aggregates are called vector aggregates, as opposed to scalar aggregates, which result when only one row is returned (see "Using aggregate functions").

group by syntax

The complete syntax of the select statement is repeated here so that you can see the group by clause in context:

select [all | distinct] select_list 
     [into [[database.]owner.]table_name] 
     [from [[database.]owner.]{view_name | table_name          [(index {index_name | table_name} 
               [parallel [degree_of_parallelism]]
               [prefetch size ] [lru | mru])]}
          [holdlock | noholdlock] [shared]
     [,[[database.]owner.]{view_name | table_name          [(index {index_name | table_name} 
               [parallel [degree_of_parallelism]]
               [prefetch size] [lru | mru])]}
          [holdlock | noholdlock] [shared]]...] 
 
     [where search_conditions] 
 
     [group by [all] aggregate_free_expression          [, aggregate_free_expression]...]
     [having search_conditions]
 
     [order by 
     {[[[database.]owner.] {table_name. | view_name.}]
          column_name | select_list_number | expression} [asc | desc]
     [, {[[[database.]owner.]{table_name|view_name.}]          column_name | select_list_number | expression} 
               [asc | desc]]...]
 
     [compute row_aggregate (column_name)
               [, row_aggregate(column_name)]...
          [by column_name [, column_name]...]] 
 
     [for {read only | update [of column_name_list]}]
 
     [at isolation {read uncommitted | read committed | serializable}]
 
     [for browse]

Remember that the order of the clauses in the select statement is significant. You can omit any of the optional clauses, but when you use them, they must appear in the order shown above.
group by and SQL standardsThe SQL standards for group by are more restrictive than Sybase's standard. The SQL standard requires that:

The columns in a select list must be in the group by expression or they must be arguments of aggregate functions.
A group by expression can only contain column names in the select list, but not those used only as arguments for vector aggregates.

Several Transact-SQL extensions (described in the following sections) relax these restrictions, however the more complex result sets may be more difficult to understand. If you set the fipsflagger option as follows, you will receive a warning message stating that Transact-SQL extensions are used:

set fipsflagger on

For more information about the fipsflagger option, see the set command in the Reference Manual.
Nesting groups with group byYou can list more than one column in the group by clause to nest groups. Once the sets are established with group by, the aggregates are applied. This statement finds the average price and the sum of book sales, grouped first by publisher identification number and then by type:

select pub_id, type, avg(price), sum(total_sales)
from titles
group by pub_id, type

pub_id  type
------  ------------  ------  ------- 
0736    business       2.99  18,722 
0736    psychology    11.48   9,564 
0877    UNDECIDED      NULL    NULL 
0877    mod_cook      11.49  24,278 
0877    psychology    21.59     375 
0877    trad_cook     15.96  19,566 
1389    business      17.31  12,066 
1389    popular_comp  21.48  12,875 
 
(8 rows affected)

You can nest many groups within groups, up to the maximum of 16 columns or expressions specified with group by.

Referencing other columns in queries using group by

SQL standards state that the group by must contain items from the select list. However, Transact-SQL allows you to specify any valid column name in either the group by or select list, whether they employ aggregates or not.
Through the following extensions, Sybase lifts restrictions on what you can include or omit in the select list of a query that includes group by.

The columns in the select list are not limited to the grouping columns and columns used with the vector aggregates.
The columns specified by group by are not limited to those non-aggregate columns in the select list.

A vector aggregate must be accompanied by a group by clause. The SQL standards require that the non-aggregate columns in the select list match the group by columns. However, the first bulleted item described above allows you to specify additional, extended columns in the select list of the query.
For example, many versions of SQL do not allow the inclusion of the extended title_id column in the select list, but it is legal in Transact-SQL:

select type, title_id, avg(price), avg(advance) 
from titles 
group by type

type           title_id 
------------   --------    -----   ------- 
business       BU1032      13.73  6,281.25 
business       BU1111      13.73  6,281.25 
business       BU2075      13.73  6,281.25 
business       BU7832      13.73  6,281.25 
mod_cook       MC2222      11.49  7,500.00 
mod_cook       MC3021      11.49  7,500.00 
UNDECIDED      MC3026      NULL       NULL 
popular_comp   PC1035      21.48  7,500.00 
popular_comp   PC8888      21.48  7,500.00 
popular_comp   PC9999      21.48  7,500.00 
psychology     PS1372      13.50  4,255.00 
psychology     PS2091      13.50  4,255.00 
psychology     PS2106      13.50  4,255.00 
psychology     PS3333      13.50  4,255.00 
psychology     PS7777      13.50  4,255.00 
trad_cook      TC3218      15.96  6,333.33 
trad_cook      TC4203      15.96  6,333.33 
trad_cook      TC7777      15.96  6,333.33 
 
(18 rows affected)

The above example still aggregates the price and advance columns based on the type column, but its results also display the title_id for the books included in each group.
The second extension described above allows you to group columns that are not specified as columns in the select list of the query. These columns do not appear in the results, but the vector aggregates still compute their summary values. For example:

select state, count(au_id)
from authors
group by state, city

state
-----  --------
CA            2
CA            1
CA            5
CA            5
CA            2
CA            1
CA            1
CA            1
CA            1
IN            1
KS            1
MD            1
MI            1
OR            1
TN            1
UT            2
 
(16 rows affected)

This example groups the vector aggregate results by both state and city, even though it does not display which city belongs to each group. Therefore, results are potentially misleading.
You may think the following query should produce similar results to the previous query, since only the vector aggregate seems to tally the number of each city for each row:

select state, count(au_id)
from authors
group by city

However, its results are much different. By not using group by with both the state and city columns, the query tallies the number of each city, but it displays the tally for each row of that city in authors rather than grouping them into one result row per city.

state
 ----- -----------

CA              1 
 CA              5 
 CA              2 
 CA              1 
 CA              5 
 KS              1 
 CA              2 
 CA              2 
 CA              1 
 CA              1 
 TN              1 
 OR              1 
 CA              1 
 MI              1 
 IN              1 
 CA              5 
 CA              5 
 CA              5 
 MD              1 
 CA              2 
 CA              1 
 UT              2 
 UT              2 

(23 rows affected)

When you use the Transact-SQL extensions in complex queries that include the where clause or joins, the results may become even more difficult to understand. To avoid confusing or misleading results with group by, Sybase suggests that you use the fipsflagger option to identify queries that use Transact-SQL extensions. See "group by and SQL standards" for details.
For more information about Transact-SQL extensions to group by and how they work, see the Reference Manual.

Expressions and group by

Another Transact-SQL extension allows you to group by an expression that does not include aggregate functions. For example:

select avg(total_sales), total_sales * price 
from titles 
group by total_sales * price

---------      -------------
     NULL               NULL
      111             777.00  
      375           7,856.25  
      375           8,096.25 
     2045          22,392.75  
     3336          26,654.64  
     2032          40,619.68  
     3876          46,318.20  
    18722          55,978.78  
     4095          61,384.05  
    22246          66,515.54  
     4072          81,399.28  
     4095          81,859.05  
     4095          81,900.00  
    15096         180,397.20  
     8780         201,501.00 
 
(16 rows affected)

The expression "total_sales * price" is allowed.
You cannot group by a column heading, also known as an alias, although you can still use one in your select list. This statement produces an error message:

select Category = type, title_id, avg(price), avg(advance) 
from titles 
group by Category

The group by clause should be "group by type", not "group by Category".

select Category = type, title_id, avg(price), avg(advance) 
from titles 
group by type

Nesting aggregates with group by

Another Transact-SQL extension allows you to nest a vector aggregate inside a scalar aggregate. For example, to find the average price of all types of books using a non-nested aggregate, enter:

select avg(price) 
from titles 
group by type

--------------- 
 NULL 
13.73 
11.49 
21.48 
13.50 
15.96 
 
(6 rows affected)

Nesting the average price inside the max function produces the highest average price of a group of books, grouped by type:

select max(avg(price)) 
from titles 
group by type

------------- 
        21.48 
 
(1 row affected)

By definition, the group by clause applies to the innermost aggregate--in this case, avg.

Null values and group by

If the grouping column contains a null value, that row becomes its own group in the results. If the grouping column contains more than one null value, the null values form a single group. Here is an example that uses group by and the advance column, which contains some null values:

select advance, avg(price * 2) 
from titles 
group by advance

advance                                            
------------------  -----------------  
             NULL               NULL  
             0.00              39.98  
          2000.00              39.98  
          2275.00              21.90  
          4000.00              19.94  
          5000.00              34.62  
          6000.00              14.00  
          7000.00              43.66  
          8000.00              34.99  
         10125.00               5.98  
         15000.00               5.98  
 
(11 rows affected)

If you are using the count(column_name) aggregate function, grouping by a column that contains null values will return a count of zero for the grouping row, since count(column_name) does not count null values. In most cases, you should use count(*) instead. This example groups and counts on the price column from the titles table, which contains null values, and shows count(*) for comparison:

select price, count(price), count(*) 
from titles 
group by price

price   
------------- ----- ----- 
        NULL     0     2 
        2.99     2     2 
        7.00     1     1 
        7.99     1     1 
       10.95     1     1 
       11.95     2     2 
       14.99     1     1 
       19.99     4     4 
       20.00     1     1 
       20.95     1     1 
       21.59     1     1 
       22.95     1     1 
 
(12 rows affected)

where clause and group by

You can use a where clause in a statement with group by. Rows that do not satisfy the conditions in the where clause are eliminated before any grouping is done. Here is an example:

select type, avg(price)
from titles
where advance > 5000
group by type

type 
-------------    -------- 
business            2.99 
mod_cook            2.99 
popular_comp       21.48 
psychology         14.30 
trad_cook          17.97 
 
(5 rows affected)

Only the rows with advances of more than $5000 are included in the groups that are used to produce the query results.
However, the way that Adaptive Server handles extra columns in the select list and the where clause may seem contradictory. For example:

select type, advance, avg(price) 
from titles 
where advance > 5000
group by type

type           advance
-------------  ---------  --------
business        5,000.00      2.99
business        5,000.00      2.99
business       10,125.00      2.99
business        5,000.00      2.99
mod_cook            0.00      2.99
mod_cook       15,000.00      2.99
popular_comp    7,000.00     21.48
popular_comp    8,000.00     21.48
popular_comp        NULL     21.48
psychology      7,000.00     14.30
psychology      2,275.00     14.30
psychology      6,000.00     14.30
psychology      2,000.00     14.30
psychology      4,000.00     14.30
trad_cook       7,000.00     17.97
trad_cook       4,000.00     17.97
trad_cook       8,000.00     17.97
 
(17 rows affected)

It only seems as if the query is ignoring the where clause when you look at the results for the advance (extended) column. Adaptive Server still computes the vector aggregate using only those rows that satisfy the where clause, but it also displays all rows for any extended columns that you include in the select list. To further restrict these rows from the results, you must use a having clause (described later in this chapter).

group by and all

The keyword all in the group by clause is a Transact-SQL enhancement. It is meaningful only if the select statement in which it is used also includes a where clause.
If you use all, the query results include all the groups produced by the group by clause, even if some groups do not have any rows that meet the search conditions. Without all, a select statement that includes group by does not show groups for which no rows qualify.
Here is an example:

select type, avg(advance)  
from titles  
where advance > 1000 and advance < 10000 
group by type

type                                    
------------  ------------------------   
business                      5,000.00   
popular_comp                  7,500.00   
psychology                    4,255.00   
trad_cook                     6,333.33   
 
(4 rows affected)

select type, avg(advance)  
from titles  
where advance > 1000 and advance < 10000
group by all type

type                                    
------------  ------------------------   
UNDECIDED                         NULL   
business                      5,000.00   
mod_cook                          NULL   
popular_comp                  7,500.00   
psychology                    4,255.00   
trad_cook                     6,333.33   
 
(6 rows affected)

The first statement produces groups only for those books that commanded advances of more than $1000 but less than $10,000. Since no modern cooking books have an advance within that range, there is no group in the results for the mod_cook type.
The second statement produces groups for all types, including modern cooking and "UNDECIDED," even though the modern cooking group does not include any rows that meet the qualification specified in the where clause. Adaptive Server returns a NULL result for all groups that lack qualifying rows.

Using aggregates without group by

By definition, scalar aggregates apply to all rows in a table, producing a single value for the whole table for each function. The Transact-SQL extension that allows you to include extended columns with vector aggregates also allows you to include extended columns with scalar aggregates. For example, look at the publishers table:

pub_id pub_name             city           state

------ ------------------   -------------- ----- 
0736   New Age Books        Boston         MA    
0877   Binnet & Hardley     Washington     DC    
1389   Algodata Infosystems Berkeley       CA

It contains three rows. The following query produces a three-row scalar aggregate based on each row of the table:

select pub_id, count(pub_id)
from publishers

pub_id 
---------- ---------
0736               3
0877               3
1389               3
 
(3 rows affected)

Because Adaptive Server treats publishers as a single group, the scalar aggregate applies to the (single-group) table. The results display every row of the table for each column you include in the select list, in addition to the scalar aggregate.
The where clause behaves the same way for scalar aggregates as with vector aggregates. The where clause restricts the columns included in the aggregate summary values, but it does not affect the rows that appear in the results for each extended column you specify in the select list. For example:

select pub_id, count(pub_id) 
from publishers 
where pub_id < "1000"

pub_id 
-------------- ----------- 
0736                    2 
0877                    2 
1389                    2 
 
(3 rows affected)

Like the other Transact-SQL extensions to group by, this extension provides results that may be difficult to understand, especially for queries on large tables or queries with multitable joins.

Operators in t-sql

Operators have the following precedence levels, where 1 is the highest level and 6 is the lowest:

unary (single argument) - + ~
* /%
binary (two argument) + - & | ^
not
and
or

Bitwise operators
So we have bitwise operators like we have in c .. these are & , | , ^ and ~ operators.

The string operator + can concatenate two or more character or binary expressions. For example:

```
select Name = (au_lname + ",
" + au_fname) 
from authors
```
Displays author names under the column heading "Name" in last-name, first-name order, with a comma after the last name; for example, "Bennett, Abraham."
```
select "abc" + "" + "def" 
```
Returns the string "abc def". The empty string is interpreted as a single space in all char, varchar, nchar, nvarchar, and text concatenation, and in varchar insert and assignment statements.

When concatenating non-character, non-binary expressions, use convert:

select "The date is " + 
    convert(varchar(12), getdate())

The comparison operators
Adaptive Server uses these comparison operators:

Comparison operators
Operator	Meaning
=	Equal to
>	Greater than
<	Less than
>=	Greater than or equal to
<=	Less than or equal to
<>	Not equal to
!=	Not equal to (Transact-SQL extension)
!>	Not greater than (Transact-SQL extension)
!<	Not less than (Transact-SQL extension)

In comparing character data, < means closer to the beginning of the server's sort order and > means closer to the end of the sort order. Uppercase and lowercase letters are equal in a sort order that is not case sensitive. Use sp_helpsort to see the sort order for your Adaptive Server. Trailing blanks are ignored for comparison purposes.
In comparing dates, < means earlier than and > means later than.
Put single or double quotes around all character and datetime data used with a comparison operator:

= "Bennet" 
> "May 22 1947"

Nonstandard operatorsThe following operators are Transact-SQL extensions:

Modulo operator: %
Negative comparison operators: !>, !<, !=
Bitwise operators: ~, ^, |, &
Join operators: *= and =*

Comparing character expressionsAdaptive Server treats character constant expressions as varchar. If they are compared with non-varchar variables or column data, the datatype precedence rules are used in the comparison (that is, the datatype with lower precedence is converted to the datatype with higher precedence). If implicit datatype conversion is not supported, you must use the convert function (see "Supported conversions").
Comparison of a char expression to a varchar expression follows the datatype precedence rule; the "lower" datatype is converted to the "higher" datatype. All varchar expressions are converted to char (that is, trailing blanks are appended) for the comparison.
Using the empty stringThe empty string ("") or ('') is interpreted as a single blank in insert or assignment statements on varchar data. When varchar, char, nchar, or nvarchar data is concatenated, the empty string is interpreted as a single space. For example, this is stored as "abc def":

"abc" + "" + "def"

The empty string is never evaluated as NULL.
Including quotation marks in character expressionsThere are two ways to specify literal quotes within a char or varchar entry. The first method is to use an additional quote with a quote of the same type. This is called "escaping" the quote. For example, if you begin a character entry with a single quote, but you want to include a single quote as part of the entry, use two single quotes:

'I don't understand.'

Here is an example containing internal double and single quotes. The single quote does not have to be escaped, but the double quote does:

"He said, ""It's not really confusing."""

The second method is to enclose a quote in the opposite kind of quotation mark. In other words, surround an entry containing a double quote with single quotes (or vice versa). Here are some examples:

'George said, "There must be a better way."' 
"Isn't there a better way?" 
'George asked, "Isn"t there a better way?"'

To continue a character string that would go off the end of one line on your screen, enter a backslash (\) before going to the following line.
If the quoted_identifier option is set to on, do not use double quotes around character or date data. You must use single quotes, or Adaptive Server treats the data as an identifier. For more information about quoted identifiers, see "Delimited identifiers".

Relational and logical expressions

A logical expression or relational expression returns TRUE, FALSE, or UNKNOWN. The general patterns are:

expression comparison_operator [any | all] expression

expression [not] in expression

[not] exists expression

expression [not] between expression and expression

expression [not] like "match_string" [escape "escape_character"]

not expression like "match_string" [escape "escape_character"]

expression is [not] null

not logical_expression

logical_expression {and | or} logical_expression

Using any, all, and inany is used with <, >, or = and a subquery. It returns results when any value retrieved in the subquery matches the value in the where or having clause of the outer statement. all is used with < or > and a subquery. It returns results when all values retrieved in the subquery are less than (<) or greater than (>) the value in the where or having clause of the outer statement. See Chapter 5, "Subqueries: Using Queries Within Other Queries," for more information.
in returns results when any value returned by the second expression matches the value in the first expression. The second expression must be a subquery or a list of values enclosed in parentheses. in is equivalent to = any.
Connecting expressions with and and orand connects two expressions and returns results when both are true. or connects two or more conditions and returns results when either of the conditions is true.
When more than one logical operator is used in a statement, and is evaluated before or. Use parentheses to change the order of execution.
Table 1-6 shows the results of logical operations, including those that involve null values:

Truth tables for logical expressions
and	TRUE	FALSE	NULL
TRUE	TRUE	FALSE	UNKNOWN
FALSE	FALSE	FALSE	FALSE
NULL	UNKNOWN	FALSE	UNKNOWN

or	TRUE	FALSE	NULL
TRUE	TRUE	TRUE	TRUE
FALSE	TRUE	FALSE	UNKNOWN
NULL	TRUE	UNKNOWN	UNKNOWN

not
TRUE	FALSE
FALSE	TRUE
NULL	UNKNOWN

The result UNKNOWN indicates that one or more of the expressions evaluates to NULL, and that the result of the operation cannot be determined to be either TRUE or FALSE.

Getting information about datatypes and tables

Use sp_help to display information about the properties of a system datatype or a user-defined datatype. The report indicates the base type from which the datatype was created, whether it allows nulls, the names of any rules and defaults bound to the datatype, and whether it has the IDENTITY property.
The following examples display the information about the money system datatype and the tid user-defined datatype:

sp_help money

Type_name  Storage_type Length Prec  Scale  
---------- ------------ ------ ----- -----
money      money             8 NULL  NULL

Nulls   Default_name  Rule_name   Identity
-----   ------------  ---------   -------- 
    1   NULL          NULL            NULL

(return status = 0)

sp_help tid

Type_name  Storage_type Length Prec  Scale  
---------- ------------ ------ ----- -----
tid        varchar           6 NULL  NULL

Nulls   Default_name  Rule_name   Identity
-----   ------------  ---------   -------- 
    0   NULL          NULL               0