Monday, March 19, 2012

"Users shopping for A were also interested in B"

How should I set up a database to be able to efficiently maintain
associations between related items?

Example: Users shopping for Lord Of The Rings trilogy were also
interested in The Hobbit.

There will be many (20000+) items for sale and I need to do this
efficiently, but I don't have an idea how could I do it.

TIAhttp://www.google.co.uk/groups?selm...ooglegroups.com

--
David Portas
SQL Server MVP
--|||Yeah but how to create these associations efficiently in the first
place?|||This called a relational division, and you can Google it. You want one
with a remainder for doing "shopping baskets".

Having said this, you are better off looking for a data mining tool
that will have the ability to set up the baskets on the fly for
somethign like Amazon.

Assuming you only want to have pairs in the baskets and do this only
once and not on the fly, you can set up a sparse matrix in SQL:

CREATE TABLE Pairs
(item_1_upc CHAR(13) NOT NULL,
item_2_upc CHAR(13) NOT NULL,
tally INTEGER DEFAULT 0 NOT NULL,
CHECK (item_1_upc < item_2_upc) ,
PRIMARY KEY (item_1_upc, item_2_upc));

Now go thru every order, pull out all possible pairs that occur more
than some threshold:

INSERT INTO Pairs(item_1_upc, item_2_upc, tally)
SELECT O1.upc, O2.upc, COUNT(*)
FROM Orders AS O1, Orders AS O2
WHERE O1.order_nbr = O2.order_nbr
AND O1.upc < O2.upc
GROUP BY O1.upc, O2.upc
HAVING COUNT(*) > @.some_limit;

>From the Pairs matrix, you can then build Triplets with whatever rules
you wish based on the tally. For example, given (a, b, 123) and
(b,c, 100) you know that (a,b,c, ??) cannot have more than 100
occurences, but if I have 1000 b's, the number could be as low as zero.|||The first query in the reply I posted does that ("efficiently" is moot
I suppose). If you need something different then please post DDL and
sample data so that we can understand what you need.

In reality, I think most e-tailers would not want to automate the
link-selling process. I suspect the results would be unpredictable and
I doubt that the major sites ever do it that way. Careful review and
selection of the linked products by the sales and marketing team will
probably give a better return.

--
David Portas
SQL Server MVP
--

No comments:

Post a Comment