These 2 queries look the same to me logically yet they return different numbers. The first query (which is *very* slow) produces the correct number. The query I use will be used repeatedly in a report where these values will change per iteration: TABLE2ID = 2 or tblRec1.RecordID = 2 and lnk_fuel_type = 1 or tblRec2.RecordID = 1 Why do these 2 queries return different numbers? SELECT COUNT(DISTINCT surveys.RecordID) FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID WHERE surveys.RecordID IN (SELECT SurveyID FROM answers WHERE AttributionID IN (SELECT AttributionID FROM attrib_organization3_rl WHERE FieldID IN (SELECT TABLE1ID FROM organization3_division3 WHERE TABLE2ID = 2))) AND surveys.RecordID IN (SELECT SurveyID FROM answers WHERE AttributionID IN (SELECT AttributionID FROM attrib_forklift3_rl WHERE FieldID IN (SELECT RecordID FROM forklift3 WHERE lnk_fuel_type = 1))) Code (markup): SELECT COUNT(DISTINCT surveys.RecordID) AS Cnt FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID INNER JOIN attrib_organization3_rl AS tblAtr INNER JOIN organization3_division3 AS tblMlt INNER JOIN division3 AS tblRec1 ON tblMlt.Table2ID = tblRec1.RecordID ON tblAtr.FieldID = tblMlt.Table1ID ON answers.AttributionID = tblAtr.AttributionID WHERE tblRec1.RecordID = 2 AND surveys.RecordID IN ( SELECT DISTINCT SurveyID FROM answers INNER JOIN attrib_forklift3_rl AS tblAtr INNER JOIN forklift3 AS tblLnkd INNER JOIN fuel_type3 AS tblRec2 ON tblLnkd.lnk_fuel_type = tblRec2.RecordID ON tblAtr.FieldID = tblRec2.RecordID ON answers.AttributionID = tblAtr.AttributionID WHERE tblRec2.RecordID = 1) Code (markup):
If you can't break the queries down and evaluate what they do bit by bit it may help to create some mock data on a sql fiddle/sandbox type of site and share that data with us so we can see it a bit more clearly. https://www.google.com/search?q=mysql+fiddle
I know you hate grok and you think I'm promoting it (I'm not), but it said this: Your first query is Subquery-based and your second one is JOIN-based. That's why you have the discrepancies. Problem: The first query’s subquery structure may lead to duplicate SurveyID values or miss records due to non-distinct subquery results or differing join behavior. The second query’s explicit joins and DISTINCT in the subquery may handle duplicates differently or include additional records due to the join order and conditions. To ensure consistent results, rewrite the first query to match the second query’s logic using explicit joins and DISTINCT where needed, ensuring both queries apply the same filtering logic: SELECT COUNT(DISTINCT surveys.RecordID) AS Cnt FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID INNER JOIN attrib_organization3_rl AS tblAtr INNER JOIN organization3_division3 AS tblMlt INNER JOIN division3 AS tblRec1 ON tblMlt.Table2ID = tblRec1.RecordID ON tblAtr.FieldID = tblMlt.Table1ID ON answers.AttributionID = tblAtr.AttributionID WHERE tblRec1.RecordID = 2 AND surveys.RecordID IN ( SELECT DISTINCT SurveyID FROM answers INNER JOIN attrib_forklift3_rl AS tblAtr INNER JOIN forklift3 AS tblLnkd INNER JOIN fuel_type3 AS tblRec2 ON tblLnkd.lnk_fuel_type = tblRec2.RecordID ON tblAtr.FieldID = tblRec2.RecordID ON answers.AttributionID = tblAtr.AttributionID WHERE tblRec2.RecordID = 1 ) Code (markup): Hope it can, at least, point you in the right direction.
sarahk: Thanks for your response. "evaluate what they do bit by bit" I'm not sure what you mean. Mostly, I'm trying to learn about mysql and INNER JOIN vs IN(SELECT ...) I thought doing it either way should return the same thing. I was wrong and I don't know why. I really need to be able to see the logic because I use INNER JOIN (I'm thinking wrongly) a lot. It's basically just dealing with foreign keys. The JOIN query is in 2 parts, each pinned to answers.AttributionID. answers.AttributionID = attrib_organization3_rl.AttributionID attrib_organization3_rl.FieldID = organization3.RecordID (primary key) organization3.RecordID = organization3_division3.Table1ID. I skipped organization3 to shorten the INNER JOIN query (FieldID = RecordID = Table1ID). organization3_division3.Table2ID = division3.RecordID. answers.AttributionID = attrib_forklift3_rl.AttributionID attrib_forklift3_rl.FieldID = forklift3.RecordID forklift3.lnk_fuel_type = fuel_type3.RecordID. The query is used repeatedly with both WHERE cases changing. tblRec1.RecordID = 2, tblRec1.RecordID = 3, tblRec1.RecordID = 4 ... tblRec2.RecordID = 1, tblRec2.RecordID = 2, tblRec2.RecordID = 3 ... So speed is critical. It turns out that this much: WHERE surveys.RecordID IN (SELECT SurveyID FROM answers WHERE AttributionID IN Code (markup): was slop. I changed it to this which made it much faster: SELECT COUNT(DISTINCT surveys.RecordID) FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID WHERE answers.AttributionID IN (SELECT AttributionID FROM attrib_organization3_rl WHERE FieldID IN (SELECT Table1ID FROM organization3_division3 WHERE Table2ID = 2)) AND answers.AttributionID IN (SELECT AttributionID FROM attrib_forklift3_rl WHERE FieldID IN (SELECT RecordID FROM forklift3 WHERE lnk_fuel_type = 1)) Code (markup): Any help in learning why the 2 queries don't return the same result would be greatly appreciated.
quikad: I believe you and thanks for the response. You may have nailed it for me. surveys.RecordID IN ( SELECT DISTINCT SurveyID ... Code (markup): That makes perfect sense and I'll test it. If that's it then for me - WHEW! - it was just a small bug that I couldn't see. My understanding of INNER JOIN may after all be intact. That's a better response than chatgpt gave me. Maybe I'll have to reconsider that icon matching game they insist on (why I can't figure). Thanks much.
I haven't gone through it but joins will always be more efficient - but sometimes outer joins are going to give you better results. All those "in" statements really add to the workload. I have had significant gains in speed and readability by using views to create important subsets of data and then my queries join with the view. Back in the old days we'd write a query like this select * from tableA, tableB where tableA.id = tableB.foreign_id Code (markup): and now we'd write it like this select * from tableA outer join tableB on tableA.id = tableB.foreign_id Code (markup): The beauty of the outer join is that it excludes records where there isn't a match in both tables. It may be the solution to all those "in" queries.