Why do these 2 queries return different numbers?

Discussion in 'MySQL' started by SoftLink, Jun 19, 2025 at 4:16 AM.

  1. #1
    These 2 queries look the same to me logically yet they return different numbers.
    The first query (which is *very* slow) produces the correct number.

    The query I use will be used repeatedly in a report where these values will change per iteration:
    TABLE2ID = 2 or tblRec1.RecordID = 2
    and
    lnk_fuel_type = 1 or tblRec2.RecordID = 1

    Why do these 2 queries return different numbers?

    SELECT COUNT(DISTINCT surveys.RecordID)
    FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID
    WHERE surveys.RecordID IN
        (SELECT SurveyID FROM answers WHERE AttributionID IN
            (SELECT AttributionID FROM attrib_organization3_rl WHERE FieldID
                IN (SELECT TABLE1ID FROM organization3_division3 WHERE TABLE2ID = 2)))
    AND surveys.RecordID IN
        (SELECT SurveyID FROM answers WHERE AttributionID IN
            (SELECT AttributionID FROM attrib_forklift3_rl WHERE FieldID
                IN (SELECT RecordID FROM forklift3 WHERE lnk_fuel_type = 1)))
    Code (markup):


    SELECT COUNT(DISTINCT surveys.RecordID) AS Cnt
    FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID
        INNER JOIN attrib_organization3_rl AS tblAtr        
            INNER JOIN organization3_division3 AS tblMlt
                INNER JOIN division3 AS tblRec1
                ON tblMlt.Table2ID = tblRec1.RecordID        
            ON tblAtr.FieldID = tblMlt.Table1ID
        ON answers.AttributionID = tblAtr.AttributionID
    WHERE tblRec1.RecordID = 2 AND surveys.RecordID IN (
        SELECT DISTINCT SurveyID FROM answers
            INNER JOIN attrib_forklift3_rl AS tblAtr
                INNER JOIN forklift3 AS tblLnkd
                    INNER JOIN fuel_type3 AS tblRec2
                    ON tblLnkd.lnk_fuel_type = tblRec2.RecordID
                ON tblAtr.FieldID = tblRec2.RecordID
            ON answers.AttributionID = tblAtr.AttributionID
        WHERE tblRec2.RecordID = 1)
    Code (markup):
     
    Last edited by a moderator: Jun 19, 2025 at 9:15 PM
    SoftLink, Jun 19, 2025 at 4:16 AM IP
  2. sarahk

    sarahk iTamer Staff

    Messages:
    28,796
    Likes Received:
    4,531
    Best Answers:
    123
    Trophy Points:
    665
    #2
    If you can't break the queries down and evaluate what they do bit by bit it may help to create some mock data on a sql fiddle/sandbox type of site and share that data with us so we can see it a bit more clearly.

    https://www.google.com/search?q=mysql+fiddle
     
    sarahk, Jun 19, 2025 at 9:19 PM IP
  3. qwikad.com

    qwikad.com Illustrious Member Affiliate Manager

    Messages:
    7,271
    Likes Received:
    1,696
    Best Answers:
    31
    Trophy Points:
    475
    #3
    I know you hate grok and you think I'm promoting it (I'm not), but it said this:

    Your first query is Subquery-based and your second one is JOIN-based. That's why you have the discrepancies.

    Problem: The first query’s subquery structure may lead to duplicate SurveyID values or miss records due to non-distinct subquery results or differing join behavior. The second query’s explicit joins and DISTINCT in the subquery may handle duplicates differently or include additional records due to the join order and conditions.

    To ensure consistent results, rewrite the first query to match the second query’s logic using explicit joins and DISTINCT where needed, ensuring both queries apply the same filtering logic:

    SELECT COUNT(DISTINCT surveys.RecordID) AS Cnt
    FROM answers
    INNER JOIN surveys ON answers.SurveyID = surveys.RecordID
    INNER JOIN attrib_organization3_rl AS tblAtr
        INNER JOIN organization3_division3 AS tblMlt
            INNER JOIN division3 AS tblRec1
            ON tblMlt.Table2ID = tblRec1.RecordID
        ON tblAtr.FieldID = tblMlt.Table1ID
    ON answers.AttributionID = tblAtr.AttributionID
    WHERE tblRec1.RecordID = 2
    AND surveys.RecordID IN (
        SELECT DISTINCT SurveyID
        FROM answers
        INNER JOIN attrib_forklift3_rl AS tblAtr
            INNER JOIN forklift3 AS tblLnkd
                INNER JOIN fuel_type3 AS tblRec2
                ON tblLnkd.lnk_fuel_type = tblRec2.RecordID
            ON tblAtr.FieldID = tblRec2.RecordID
        ON answers.AttributionID = tblAtr.AttributionID
        WHERE tblRec2.RecordID = 1
    )
    Code (markup):
    Hope it can, at least, point you in the right direction.
     
    qwikad.com, Jun 20, 2025 at 3:41 AM IP
    sarahk likes this.
  4. SoftLink

    SoftLink Active Member

    Messages:
    138
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    60
    #4
    sarahk: Thanks for your response.
    "evaluate what they do bit by bit" I'm not sure what you mean.

    Mostly, I'm trying to learn about mysql and INNER JOIN vs IN(SELECT ...)
    I thought doing it either way should return the same thing. I was wrong and I don't know why.
    I really need to be able to see the logic because I use INNER JOIN (I'm thinking wrongly) a lot.


    It's basically just dealing with foreign keys.
    The JOIN query is in 2 parts, each pinned to answers.AttributionID.

    answers.AttributionID = attrib_organization3_rl.AttributionID
    attrib_organization3_rl.FieldID = organization3.RecordID (primary key)
    organization3.RecordID = organization3_division3.Table1ID.
    I skipped organization3 to shorten the INNER JOIN query (FieldID = RecordID = Table1ID).
    organization3_division3.Table2ID = division3.RecordID.

    answers.AttributionID = attrib_forklift3_rl.AttributionID
    attrib_forklift3_rl.FieldID = forklift3.RecordID
    forklift3.lnk_fuel_type = fuel_type3.RecordID.

    The query is used repeatedly with both WHERE cases changing.
    tblRec1.RecordID = 2, tblRec1.RecordID = 3, tblRec1.RecordID = 4 ...
    tblRec2.RecordID = 1, tblRec2.RecordID = 2, tblRec2.RecordID = 3 ...
    So speed is critical.

    It turns out that this much:
    
    
    WHERE surveys.RecordID IN
        (SELECT SurveyID FROM answers WHERE AttributionID IN
    
    Code (markup):
    was slop. I changed it to this which made it much faster:
    
    
    SELECT COUNT(DISTINCT surveys.RecordID)
    FROM answers INNER JOIN surveys ON answers.SurveyID = surveys.RecordID
    WHERE answers.AttributionID IN
        (SELECT AttributionID FROM attrib_organization3_rl WHERE FieldID IN
            (SELECT Table1ID FROM organization3_division3 WHERE Table2ID = 2))
    AND answers.AttributionID IN
        (SELECT AttributionID FROM attrib_forklift3_rl WHERE FieldID IN
            (SELECT RecordID FROM forklift3 WHERE lnk_fuel_type = 1))
    
    Code (markup):
    Any help in learning why the 2 queries don't return the same result would be greatly appreciated.
     
    SoftLink, Jun 20, 2025 at 3:42 AM IP
  5. SoftLink

    SoftLink Active Member

    Messages:
    138
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    60
    #5
    quikad: I believe you and thanks for the response.
    You may have nailed it for me.
    
    
    surveys.RecordID IN (
        SELECT DISTINCT SurveyID ...
    
    Code (markup):
    That makes perfect sense and I'll test it.
    If that's it then for me - WHEW! - it was just a small bug that I couldn't see.
    My understanding of INNER JOIN may after all be intact.

    That's a better response than chatgpt gave me.
    Maybe I'll have to reconsider that icon matching game they insist on (why I can't figure).
    Thanks much.
     
    SoftLink, Jun 20, 2025 at 3:49 AM IP
  6. sarahk

    sarahk iTamer Staff

    Messages:
    28,796
    Likes Received:
    4,531
    Best Answers:
    123
    Trophy Points:
    665
    #6
    I haven't gone through it but joins will always be more efficient - but sometimes outer joins are going to give you better results. All those "in" statements really add to the workload.

    I have had significant gains in speed and readability by using views to create important subsets of data and then my queries join with the view.

    Back in the old days we'd write a query like this

    select * 
    from tableA, tableB
    where tableA.id = tableB.foreign_id
    Code (markup):
    and now we'd write it like this
    select * 
    from tableA
    outer join tableB on tableA.id = tableB.foreign_id
    Code (markup):
    The beauty of the outer join is that it excludes records where there isn't a match in both tables. It may be the solution to all those "in" queries.
     
    sarahk, Jun 20, 2025 at 10:09 AM IP