Monday, May 21, 2007

How do I return row numbers with my query?

Often, people want to "invent" an identity, or rank, on the fly. So their original result set would look like this:

Lastname Firstname
-------- ---------
Evans    Bob
Smith    Frank

And they would want this:

Rownum Lastname Firstname
------ -------- ---------
1      Evans    Bob
2      Smith    Frank

This would act like Oracle's ROWNUM, which isn't supported in SQL Server.

Of course, once you've retrieved this resultset into your ASP page, you could use a counter to increment as you're processing. This is by far the easiest way, e.g.

<%
    ' ...
    set rs = conn.execute(sql)
    counter = 0
    do while not rs.eof
        ' number each row as it is written out
        counter = counter + 1
        response.write counter & " "
        response.write rs(0) & "<br>"
        rs.movenext
    loop
    ' ...
%>

However, some people really, really, really want the row number to come back from the database. It's a little less efficient, but let's examine a few methods. Given this sample data:

SET NOCOUNT ON

CREATE TABLE people
(
    firstName VARCHAR(32),
    lastName VARCHAR(32)
)
GO

INSERT people VALUES('Aaron', 'Bertrand')
INSERT people VALUES('Andy', 'Roddick')
INSERT people VALUES('Steve', 'Yzerman')
INSERT people VALUES('Steve', 'Vai')
INSERT people VALUES('Joe', 'Schmoe')

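The first method we'll try is a COUNT with a GROUP BY. The table is joined to itself so that each row is paired with every row that sorts at or before it (by last name, then first name); counting those pairs gives the row's rank: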

SELECT
    rank = COUNT(*),
    a.firstName,
    a.lastName
FROM
    people a
    INNER JOIN people b
    ON
        a.lastName > b.lastName
        OR
        (
            a.lastName = b.lastName
            AND
            a.firstName >= b.firstName
        )
GROUP BY
    a.firstName,
    a.lastName
ORDER BY
    rank

We can also use a COUNT in a correlated subquery. This doesn't require GROUP BY, which means you can include other columns in the outer query.

SELECT
    rank = (
        SELECT COUNT(*)
        FROM people b
        WHERE
            a.lastName > b.lastName
            OR
            (
                a.lastName = b.lastName
                AND a.firstName >= b.firstName
            )
    ),
    a.firstName,
    a.lastName
FROM
    people a
ORDER BY
    a.lastName,
    a.firstName

Results in both cases:

rank firstName lastName
---- --------- --------
1    Aaron     Bertrand
2    Andy      Roddick
3    Joe       Schmoe
4    Steve     Vai
5    Steve     Yzerman
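
To see why the count works as a rank, it can help to look at the pairs the join produces for a single row. The sketch below assumes the five sample rows above and lists every row that sorts at or before Joe Schmoe. It returns Aaron Bertrand, Andy Roddick and Joe Schmoe himself: three rows, hence rank 3.

```sql
-- List the rows that are counted toward Joe Schmoe's rank.
SELECT
    b.firstName,
    b.lastName
FROM
    people a
    INNER JOIN people b
    ON
        a.lastName > b.lastName
        OR
        (
            a.lastName = b.lastName
            AND a.firstName >= b.firstName
        )
WHERE
    a.firstName = 'Joe'
    AND a.lastName = 'Schmoe'
ORDER BY
    b.lastName,
    b.firstName
```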

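Note that if you have duplicates in your table, the duplicate rows tie and the numbering skips values. With the subquery method, a table holding one Aaron Bertrand and two Joe Schmoe rows returns the following (the GROUP BY method instead collapses the duplicates into a single row with an inflated count):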

1    Aaron     Bertrand
3    Joe       Schmoe
3    Joe       Schmoe
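
Before deciding how to handle ties, it may be worth checking whether the table contains duplicates at all. A minimal sketch, assuming that "duplicate" means the same firstName and lastName pair (the alias copies is only illustrative):

```sql
-- Return each name pair that occurs more than once, with its row count.
SELECT
    firstName,
    lastName,
    copies = COUNT(*)
FROM
    people
GROUP BY
    firstName,
    lastName
HAVING
    COUNT(*) > 1
```

Any name pair returned here will produce tied ranks like the ones shown above.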

So, to avoid this, you might want to make sure that either (a) you avoid and remove duplicates (see Article #2431); or (b) if duplicates are allowed and make sense for your data model, that you have some other primary key or unique identifier. Then, you can make it a part of the query; for example:

SET NOCOUNT ON

CREATE TABLE people
(
    peopleID INT IDENTITY(1,1) PRIMARY KEY,
    firstName VARCHAR(32),
    lastName VARCHAR(32)
)
GO

INSERT people VALUES('Aaron', 'Bertrand')
INSERT people VALUES('Andy', 'Roddick')
INSERT people VALUES('Steve', 'Yzerman')
INSERT people VALUES('Steve', 'Yzerman')
INSERT people VALUES('Steve', 'Vai')
INSERT people VALUES('Joe', 'Schmoe')

SELECT
    rank = (
        -- rows that sort at or before this one (duplicates included)
        SELECT COUNT(*)
        FROM people b
        WHERE a.lastName > b.lastName
        OR
        (
            a.lastName = b.lastName
            AND a.firstName >= b.firstName
        )
    ) - (
        -- minus the duplicates of this name that have a higher peopleID
        SELECT COUNT(*) FROM
        people b
        WHERE a.lastName = b.lastName
        AND a.firstName = b.firstName
        AND a.peopleID < b.peopleID
    ),
    a.firstName,
    a.lastName
FROM
    people a
ORDER BY
    a.lastName,
    a.firstName

Results:

rank firstName lastName
---- --------- --------
1    Aaron     Bertrand
2    Andy      Roddick
3    Joe       Schmoe
4    Steve     Vai
5    Steve     Yzerman
6    Steve     Yzerman
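
If you are running SQL Server 2005 or later (an assumption; the COUNT-based queries above do not need it), the built-in ROW_NUMBER() function should give the same numbering without the counting subqueries. A minimal sketch against the people table above, using peopleID as the tie-breaker and rowNum as an illustrative alias:

```sql
-- Number rows by last name, then first name; peopleID breaks ties
-- between duplicate names so every row gets a distinct number.
SELECT
    rowNum = ROW_NUMBER() OVER (ORDER BY lastName, firstName, peopleID),
    firstName,
    lastName
FROM
    people
ORDER BY
    rowNum
```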

Ranking within groups

Often, you'll want a more complex row number scheme; for example, you might want to rank within groups of a hierarchy. Let's say we want to list sports teams and assign "ranks" alphabetically within each city:

CREATE TABLE #teams
(
    city VARCHAR(20),
    team VARCHAR(20)
)

SET NOCOUNT ON

INSERT #teams SELECT 'Boston', 'Celtics'
INSERT #teams SELECT 'Boston', 'Bruins'
INSERT #teams SELECT 'Boston', 'Red Sox'
INSERT #teams SELECT 'New York', 'Yankees'
INSERT #teams SELECT 'New York', 'Mets'
INSERT #teams SELECT 'New York', 'Knicks'
INSERT #teams SELECT 'New York', 'Rangers'
INSERT #teams SELECT 'New York', 'Islanders'
INSERT #teams SELECT 'New York', 'Jets'
INSERT #teams SELECT 'New York', 'Giants'
INSERT #teams SELECT 'Chicago', 'Black Hawks'
INSERT #teams SELECT 'Chicago', 'Cubs'
INSERT #teams SELECT 'Chicago', 'White Sox'
INSERT #teams SELECT 'Chicago', 'Bears'
INSERT #teams SELECT 'New England', 'Patriots'

SELECT city, team, rank =
    (
        SELECT COUNT(*)
        FROM #teams t2
        WHERE t2.city = t1.city
        AND t2.team <= t1.team
    )
FROM #teams t1
ORDER BY city, team

DROP TABLE #teams

Results:

city         team         rank
------------ ------------ ----
Boston       Bruins       1
Boston       Celtics      2
Boston       Red Sox      3
Chicago      Bears        1
Chicago      Black Hawks  2
Chicago      Cubs         3
Chicago      White Sox    4
New England  Patriots     1
New York     Giants       1
New York     Islanders    2
New York     Jets         3
New York     Knicks       4
New York     Mets         5
New York     Rangers      6
New York     Yankees      7
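
On SQL Server 2005 or later, the same per-city numbering can likely be expressed with ROW_NUMBER() and PARTITION BY. This sketch assumes the #teams table still exists, i.e. it runs before the DROP TABLE above; teamRank is an illustrative alias:

```sql
-- Restart the numbering at 1 for each city, ordering teams alphabetically.
SELECT
    city,
    team,
    teamRank = ROW_NUMBER() OVER (PARTITION BY city ORDER BY team)
FROM
    #teams
ORDER BY
    city,
    team
```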


Keep in mind that, since your presentation tool (Crystal Reports, ASP, PHP, what have you) is going to have to treat every row separately anyway, it makes sense to just retrieve the rows in the correct order and let the application compare each row to the previous one: if the city has changed, start the count over; otherwise, increment it. This will greatly reduce the strain you're putting on the database, which otherwise has to run a counting subquery for every row.
