Thursday, March 8, 2012

"second newest" record?

I've seen a number of solution to get the "newest record" from a time series
-- and extensively benchmarked them on our 2 million+ row prices database. In
a similar task I'm trying to get the "right" record from a series of
transactions that include cancel/corrections.
For instance, let's say the operator fat-fingers an order for IBM and gets
ten times too much. We'll notice the error, and they'll issue a
cancel/correct, like this...
DAY 1: buy 50000 IBM #1028A
DAY 2: buy -50000 IBM #1028A
buy 5000 IBM #1028A
So in this case they cancel the original mistake, then send out the
correction. When I import these into our DB I give them row numbers and
timestamp them. In order to get the "right record" I subselect the maximum id
for all orders with the same ID...
SELECT *
FROM import
INNER JOIN
(SELECT MAX(id) AS MAXID
FROM import
GROUP BY OrderNumber) o
ON id= o.MAXID
Ok, so now what if they actually report it this way instead...
DAY 1: buy 50000 IBM #1028A
DAY 2: buy 5000 IBM #1028A
buy -50000 IBM #1028A
Yes, that's right, they don't report cancel/correct, but correct/cancel. Grrr.
So how do I adjust my SQL to get the "second most maximum ID"? I can't
figure this out.
Maury
Maury,
Aren't you saying that for some orders the order you want is the last,
and for others it's the next-to-last? Selecting the second-largest ID
for every order doesn't sound likely to solve your problem. While you
can do that, is there another way you can describe the row you want,
such as the most recent row with a positive number of shares, or the
result of adding up all shares in an order?
Anyway, you can get the second largest id for each OrderNumber like this
(untested):
select * from import as I1
where id = (
select top 1 T.id
from (
select top 2 I2.id
from import as I2
where I2.OrderNumber = I1.OrderNumber
order by id desc
) T
order by id
)
Steve Kass
Drew University
Maury Markowitz wrote:

>I've seen a number of solution to get the "newest record" from a time series
>-- and extensively benchmarked them on our 2 million+ row prices database. In
>a similar task I'm trying to get the "right" record from a series of
>transactions that include cancel/corrections.
>For instance, let's say the operator fat-fingers an order for IBM and gets
>ten times too much. We'll notice the error, and they'll issue a
>cancel/correct, like this...
>DAY 1: buy 50000 IBM #1028A
>DAY 2: buy -50000 IBM #1028A
> buy 5000 IBM #1028A
>So in this case they cancel the original mistake, then send out the
>correction. When I import these into our DB I give them row numbers and
>timestamp them. In order to get the "right record" I subselect the maximum id
>for all orders with the same ID...
>SELECT *
>FROM import
>INNER JOIN
> (SELECT MAX(id) AS MAXID
> FROM import
> GROUP BY OrderNumber) o
>ON id= o.MAXID
>Ok, so now what if they actually report it this way instead...
>DAY 1: buy 50000 IBM #1028A
>DAY 2: buy 5000 IBM #1028A
> buy -50000 IBM #1028A
>Yes, that's right, they don't report cancel/correct, but correct/cancel. Grrr.
>So how do I adjust my SQL to get the "second most maximum ID"? I can't
>figure this out.
>Maury
>
|||"Steve Kass" wrote:
> Aren't you saying that for some orders the order you want is the last,
> and for others it's the next-to-last?
No, for some brokers (ie, the smart ones) it's the last record, but in this
case the "correct" record will ALWAYS be the second-to-last.

> can do that, is there another way you can describe the row you want,
> such as the most recent row with a positive number of shares, or the
> result of adding up all shares in an order?
I considered the last idea, but it only works for quantity. Other changes,
like the security name or price, can't be added up.
I'm going to try your SQL suggestion now!
Maury
|||"Steve Kass" wrote:
Your SQL worked great Steve. Sadly your other comment turned out to be true:
they DO sometimes put the correction as the second record, and sometimes the
third. I've looked through the data for some sort of determinant, but I can't
seem to find it. It might be possible to compare the side (buy/sell) with the
quantity or something, but that seems pretty nasty too.
|||On Thu, 6 Jan 2005 11:21:02 -0800, Maury Markowitz wrote:

>"Steve Kass" wrote:
>Your SQL worked great Steve. Sadly your other comment turned out to be true:
>they DO sometimes put the correction as the second record, and sometimes the
>third. I've looked through the data for some sort of determinant, but I can't
>seem to find it. It might be possible to compare the side (buy/sell) with the
>quantity or something, but that seems pretty nasty too.
Hi Maury,
Sorry to hear about the mess you're finding yourself in. I don't think I
can help you sort this out (at least not based on the info you've posted
so far), but once you have this nder control, I suggest you prevent this
from happenning again by adding one column:
ALTER TABLE import
ADD COLUMN correction_to INT
DEFAULT NULL
REFERENCES import(ID)
(Change the datatype from INT to the datatype of your import.ID column)
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment