Query help on a self join for finding similar items

devdude Peon

Messages:: 2

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#1

Hello everyone.. this is my first post.. I hope I don't do/say something stupid.

I have a database which contains tables: item, tag, and item2tag, like so:

item
- id
- description (varchar)
- popularity (integer)

tag
- id
- name (varchar)

item2tag
- id
- item_id
- tag_id
- weight (float)

The weights of an item's tags will add up to 1. For example, if an item has 5 entries in item2tag, the sum of their weights will be 1.

Given an item.id, I'd like to find the 50 most similar items to it based on tags and their weights. I've gotten a query to work, but as the database grows, it's getting slower and slower:

SELECT matcheditem.*,
SUM(LEAST(matchedtags.weight, mytags.weight)) as similarity
FROM
item2tag as mytags,
item as matcheditem,
item2tag as matchedtags
WHERE
mytags.item_id=12345
AND mytags.tag_id=matchedtags.tag_id
AND matchedtags.item_id=matcheditem.id
GROUP BY matchedtags.item_id
ORDER BY similarity DESC
LIMIT 50

This query is taking a few seconds now, and is getting worse and worse as more items are added. All id and _id fields are indexed. There are about 40,000 items, and item2tag contains about 1,000,000 rows.

I'm willing to run a larger query(s) or procedure that will create a table with 50 results for each entry, but would prefer a faster query.

Any thoughts?

devdude, Jul 29, 2008 IP

jestep Prominent Member

Messages:: 3,659

Likes Received:: 215

Best Answers:: 19

Trophy Points:: 330

#2

Make sure you have indexes on these fields:
item2tag
- id
- item_id (... index)
- tag_id (... index)
- weight (float)

It will take a while to add an index to a million row table, but this should speed up the query you provided.

jestep, Jul 30, 2008 IP

devdude Peon

Messages:: 2

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#3

I have indexes on those fields, queries are still taking 100 seconds!

Would doing a sub-select help at all?

devdude, Jul 30, 2008 IP

Log in or Sign up

Query help on a self join for finding similar items

devdude Peon

jestep Prominent Member

devdude Peon

Useful Searches