1. Advertising
    y u no do it?

    Advertising (learn more)

    Advertise virtually anything here, with CPM banner ads, CPM email ads and CPC contextual links. You can target relevant areas of the site and show ads based on geographical location of the user if you wish.

    Starts at just $1 per CPM or $0.10 per CPC.

MySQL MyISAM sorting query makes it slower

Discussion in 'MySQL' started by bobbyfischer153, Apr 13, 2014.

  1. #1
    Hi,

    I am using MySQL 5.1 on a Windows Server 2008 (with 4GB RAM) and have the following configuration:

    I have 2 MyISAM tables. One is in 1 database (DB1) and has 14 columns, which are mostly varchar. This table has about 5,000,000 rows and is the DB1.games table below.

    It has a primary key on GameNumber (int(10)).

    The other table is the DB2.gameposition and consists of the columns GameNumber (links to DB1.games) and PositionCode (int(10)). This table has about 400,000,000 rows and there is an index IX_PositionCode on PositionCode.

    These 2 databases are on the same server.

    I want to run a query on DB2.gameposition to find a particular PositionCode, and have the results sorted by the linking DB1.games.Yr field (smallint(6) - this represents a Year). This sorting of results I only introduced recently. There is an index on this Yr field in DB1.games.

    In my stored procedure, I perform the following:

    CREATE TEMPORARY TABLE tblGameNumbers(GameNumber INT UNSIGNED NOT NULL PRIMARY KEY);

    INSERT INTO tblGameNumbers(GameNumber)
    SELECT DISTINCT gp.GameNumber
    FROM DB2.gameposition gp
    WHERE PositionCode = var_PositionCode LIMIT 1000;

    I just get 1000 to make it quicker...

    And then join it to the DB1.games table.

    In order to generate an EXPLAIN from that, I took out the temporary table (I use in the stored procedure) and refactored it as seen in the inner subquery below:

    EXPLAIN
    SELECT *
    FROM DB1.games g
    INNER JOIN (SELECT DISTINCT gp.GameNumber
    FROM DB2.gameposition gp
    WHERE PositionCode = 669312116 LIMIT 1000
    ) B ON g.GameNumber = B.GameNumber
    ORDER BY g.Yr DESC
    LIMIT 0,28

    Running the EXPLAIN above, I see the following:

    1, 'PRIMARY', '', 'ALL', '', '', '', '', 1000, 'Using temporary; Using filesort'
    1, 'PRIMARY', 'g', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'B.GameNumber', 1, ''
    2, 'DERIVED', 'gp', 'ref', 'IX_PositionCode', 'IX_PositionCode', '4', '', 1889846, 'Using temporary'

    The query used to be almost instant before I introduced the ORDER BY clause. Now, sometimes it is quick (depending on different PositionCode), but other times it can take up to 10 seconds to return the rows. Before I introduced the sorting, it was always virtually instantaneous. Unfortunately, I am not too proficient in interpreting the EXPLAIN output. Or how to make the query faster. (Although I think using filesort and using temporary are not the best execution methods)

    I am thinking of increasing the server memory to 8GB and tune some variables (amongst them the key_buffer_size variable which is currently 512M)...would this be of any help?

    Any help would be greatly appreciated!

    Thanks in advance, Tim
     
    bobbyfischer153, Apr 13, 2014 IP
  2. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #2
    Have you considered switching to Innodb? MyISAM is almost untunable, not that it is specifically the issue here, but Innodb can be customized much easier. You're dealing with a substantial amount of rows, even if they don't contain a huge number of columns, there's going to be significant overhead when querying the whole table.

    As far as function, what's the performance benefit of creating a 1000 row temp table vs. a direct cross-database join?

    Your bottleneck at least in the example is the nested join returning 1000 rows. It has to perform the entire query in the join. Once it gets that result set, it has to perform an entirely new query on the result, and then sort it. If the table you're joining to was static it would be faster, but by creating the temp table each time your run the query, you would likely end up with similar speeds. It's also possibly you're maxing out your ram and hitting swap which is going to destroy the execution time.

    Will something like this not work without using a temp table?

    SELECT *
    FROM DB1.games g
    INNER JOIN DB2.gameposition gp
    ON (g.GameNumber = gp.GameNumber AND gp.PositionCode = 669312116)
    ORDER BY g.Yr DESC
    LIMIT 0,28
     
    jestep, Apr 16, 2014 IP
  3. pmf123

    pmf123 Notable Member

    Messages:
    1,447
    Likes Received:
    75
    Best Answers:
    0
    Trophy Points:
    215
    #3
    yes anytime you use ORDER BY and a LIMIT, it will run the entire query without the LIMIT and then ORDER them and then send back the number from the LIMIT...
     
    pmf123, Aug 31, 2014 IP
  4. Krellen

    Krellen Greenhorn

    Messages:
    38
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    18
    #4
    It doesn't always work that way. In your case it can't limit until after the results/sorting.

    You want to avoid using DISTINCT and subqueries if possible. See if jestep's suggested query works for you, which should be much more efficient than what you are doing now.
     
    Krellen, Sep 19, 2014 IP
  5. Krellen

    Krellen Greenhorn

    Messages:
    38
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    18
    #5
    Maybe slightly, but probably not much. This is most likely a query optimization issue.
     
    Krellen, Sep 19, 2014 IP
  6. Krellen

    Krellen Greenhorn

    Messages:
    38
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    18
    #6
    You can also try optimizing your indexes. You might benefit from a covering index that includes g.Yr.
     
    Krellen, Sep 19, 2014 IP