It seems like I should be able to do this in one query, but I can't figure out the syntax. My site is selling books. In this section, I'm setting up pages which list books sorted by keywords. For each of the keywords, I need to make sure that there are at least two active titles tagged with that keyword before that keyword is included in a linked list. So, each of the keywords with at least two active titles are listed. I've got three relevant tables: `books` is the table which has the bulk of the info about each title. Primary key is `isbn`. `remove` keeps track of which titles are no longer available. Fields are `id`(key), `isbn`, and `day2go`. If an entry in `books` does not have an entry in `remove`, that title is available. `topicLU` is a lookup table connecting the `topics` table to `books`. Fields are `id` (key), `isbn`, and `topicid`. For this query, we have a value for topicLU.topicid. Call it $topicid. I want a list of books which share the isbn field with `topicLU` but are not in `remove`. I could do this with two queries and a loop, but it seems like I should be able to do it with a left join. thanks in advance...
select books.title, books.isbn, topicLU.topicid, topic.topic from `books` inner join topicLU on books.isbn = topicLU.isbn inner join topic on topicLU.topicid = topicLU.topicid where not exists (select remove.id from `remove` where remove.isbn = books.isbn) and exists (select tlu2.topicid, count(b2.isbn) as counter from topicLU as tlu2 inner join books as b2 on tlu2.isbn = b2.isbn group by tlu2.topicid where topicLU.topicid = tlu2.topicid order by tlu2.topicid having counter > 2) Code (markup): untested, obviously, and I'm racing out the door. If you want to give us a sql fiddle we'll be able to help you write the query properly and test it. http://sqlfiddle.com/ It takes a bit to set it up but well worth going through the process.
Wow. That's impressive. And sqlfiddle looks like a powerful tool. I'll see if I can get this working. Thanks.
Is the last exists statement necessary? It seems like you could adjust the join statement and then use a where clause in the main query to avoid the second nested query all together. The fewer nested queries, the better. They create a ton of overhead and should be avoided when possible.
I was rushing a bit... I created a fiddle and then the site crashed but this is what I had for the schema CREATE TABLE `books`( `isbn` INT(11) NOT NULL AUTO_INCREMENT, `title` VARCHAR(150), PRIMARY KEY (`isbn`) ); CREATE TABLE `remove`( `id` INT(11) NOT NULL AUTO_INCREMENT, `isbn` int(11), day2go DATETIME, PRIMARY KEY (`id`) ); CREATE TABLE `topicLU`( `id` INT(11) NOT NULL AUTO_INCREMENT, `isbn` int(11), topicid int(11), PRIMARY KEY (`id`) ); CREATE TABLE `topic`( `id` INT(11) NOT NULL AUTO_INCREMENT, `topic` VARCHAR(150), PRIMARY KEY (`id`) ); insert into `books` values (1000, 'Clean & Lean') , (1001, 'Where the crawdads sing') , (1002, 'Life will be the death of me') , (1003,'Becoming Michelle Obama') , (1004, 'Diary of an Awesome Friendly Kid') , (1005, 'Wolfpack') , (1006, 'Girl Stop Apologising'); insert into `remove` values (1, 1004, now()); insert into `topic` values (1, 'Youth'), (2,'Biography'), (3,'Fiction'), (4, 'Self Help'), (5,'Health'); insert into `topicLU` values (1, 1000, 5) , (2, 1000, 4) , (3, 1006, 4) , (4, 1001, 3) , (5, 1003, 2) , (6, 1004, 1) , (7, 1005, 4); Code (markup): and I created a query that went something like this - but this query doesn't check that the 2 titles that have a specific topic aren't also in the remove list. select books.title, books.isbn, topicLU.topicid, topic.topic, count(tlu2.id) as `check` from `books` inner join topicLU on books.isbn = topicLU.isbn inner join topic on topicLU.topicid = topic.id inner join topicLU as tlu2 on topic.id = tlu2.topicid where not exists (select remove.id from `remove` where remove.isbn = books.isbn) group by books.isbn, topic.topic order by books.isbn having `check` >= 2 Code (markup):
Groovy. One thing I've opted to do to improve performance is to add an `active` field to the `topics` table. That way, the complicated query can be in a hidden updating script which sets the value of `active` and which I can run when there have been changes to the available books or tags. As I was conceiving of it originally, the complicated query had to check each topic to see if it's active on the public page which lists all of the [active] topics. Since books are added or removed infrequently, it seems far more efficient to add a simple WHERE `active` = 'yes' to the query on the public page.