Saturday, February 25, 2012

"Max row" in each group

I'm having trouble developing a query that will group on a given set of
columns and, for each group, will also display the other columns in the
"max row" for that group. Let me illustrate what I mean:
CREATE TABLE [dbo].[GroupTest] (
[testID] [int] IDENTITY (1, 1) NOT NULL ,
[office] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[startDate] [datetime] NOT NULL ,
[status] [int] NOT NULL ,
[endDate] [datetime] NULL ,
[reportDate] [datetime] NOT NULL ,
[amount] [int] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[GroupTest] ADD
CONSTRAINT [PK_GroupTest] PRIMARY KEY CLUSTERED
(
[testID]
) ON [PRIMARY]
GO
Each row in this table represents a status update in an issue tracking
system. (The actual table has more columns. This is a subset for
illustrative purposes.) None of the columns is unique other than testID.
I am being asked to produce a query for each combination of {office,
startDate} will include the row matching that combination that has the
greatest value in the status column. So if office='ABC' and
startDate='2005-11-30' and there are two rows matching these values, one
with status=3 and one with status=4, the query should include the row
having status=4.
If the combination {office, startDate, status} were unique in this
table, the following query would do the job:
SELECT
G0.office, G0.startDate, G0.status,
G0.endDate, G0.reportDate, G0.amount
FROM GroupTest G0
INNER JOIN
(SELECT office, startDate, max(status) maxStatus
FROM GroupTest GROUP BY office, startDate) G1
ON G0.office = G1.office
AND G0.startDate = G1.startDate
AND G0.status = G1.maxStatus
order by G0.office, G0.startDate
But that's not the case. I asked the customer what to do. For their
purposes, the report will be fine if it can show them an "illustrative"
row--in other words, if more than one row in the table matches a given
combination {office, startDate, status}, the query should return one of
them arbitrarily. But I'm not figuring out how to do that. I guess I'd
like to figure out how to retrieve the "maximum row" for each {office,
startDate} combination, or something similar. Can anyone help?
I could punt and go with just displaying the maximum over each of the
columns on the right-hand side (status, endDate, reportDate, amount),
each taken independently:
SELECT
G0.office, G0.startDate, G0.status,
G0.endDate, G0.reportDate, G0.amount
FROM GroupTest G0
INNER JOIN
(SELECT office, startDate, max(status) maxStatus,
max(endDate) maxEndDate,
max(reportDate) maxReportDate,
max(amount) maxAmount
FROM GroupTest GROUP BY office, startDate) G1
ON G0.office = G1.office
AND G0.startDate = G1.startDate
AND G0.status = G1.maxStatus
AND G0.endDate = G1.maxEndDate
AND G0.reportDate = G1.maxReportDate
AND G0.amount = G1.maxAmount
order by G0.office, G0.startDate
But the rows produced by that query aren't "coherent": For a given
{office, startDate} combination, the status could be from one row, the
startDate could be from another, the endDate from another, etc. Each row
in my query should match an actual row in the table.I've moved on from my previous question, having figured out an approach.
However, I'm stuck on that approach because I'm getting unexpected
results. I'm asking about that in a new thread.

No comments:

Post a Comment