Optimize SQL query

Discussion in 'MySQL' started by squishi, Feb 11, 2008.

  1. #1
    Please help me optimize the query below. It is executed in a php loop of up to three hundred repetitions on a database with over 600,000 records.

    Basically it queries all the entries that have certain keywords in their description, keyword field or url that are of status 'approved' and that don't belong to certain categories.

    The query works great, but I believe it is very heavy on the server.
    I am looking for ways to increase the performance of this query, because I want to remove the LIMIT and get all the results instead.

    If there's anything you see that could help me optimize this query (like rewriting part of it or creating a special index key in the database), I would be very grateful.
     
    squishi, Feb 11, 2008 IP
  2. AstarothSolutions

    AstarothSolutions Peon

    Messages:
    2,680
    Likes Received:
    77
    Best Answers:
    0
    Trophy Points:
    0
    #2
    The Like elements will be very server heavy and so would focus on this aspect.....

    Without knowing the site, db structure or what the query is really doing it is difficult to give anything meaningful.

    As a punt in the dark the namewithunderscore, namewithspace, namewithoutspace etc seems it could be a quick win by cleaning up the 2 fields so that they are all "withoutspace" or such thus reducing your 8 Like statements to three.... altneratively have a search field with clean data from description, url and keywords and then have a single Like statement.
     
    AstarothSolutions, Feb 11, 2008 IP
  3. squishi

    squishi Peon

    Messages:
    58
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Thank you for the answer.

    I already tried to explain what the query does.
    With the namewithoutunderscore etc. I try to catch all the variations that could be in the database.
    I have both words like "firstname_latename", "firstnamelastname" and "firstname lastname" in the database, so I think I need to query them in the way I do above.

    It is indeed quite heavy on the server. A usual keyword field would consist of more than just the name, so I see no way to avoid the "LIKE" statement.
     
    squishi, Feb 11, 2008 IP
  4. AstarothSolutions

    AstarothSolutions Peon

    Messages:
    2,680
    Likes Received:
    77
    Best Answers:
    0
    Trophy Points:
    0
    #4
    But how are you capturing, for example the firstname and lastname? this would be the point to clean the data going forward so you have a standard way of holding them and thus a standard way of querying them so that you dont have the 3 variants. Potentially the easiest is without a space as you just use the replace function to remove _ - " " etc. Likewise, automatically add this cleaned name into the keywords and then there isnt the issue of cleaning them too (or have the search field with the clean name)
     
    AstarothSolutions, Feb 11, 2008 IP
  5. squishi

    squishi Peon

    Messages:
    58
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Doing a search and replace in the database so that all underscores ("_") get replaced with a blank space is one possibility, yes.
    I don't know if there is other information that would be affected by such a search and replace, though. I will have to check that.
    Since I auomatically add data from different sources to the database, I would probably have to run this search&replace on a regular basis.
     
    squishi, Feb 11, 2008 IP
  6. AstarothSolutions

    AstarothSolutions Peon

    Messages:
    2,680
    Likes Received:
    77
    Best Answers:
    0
    Trophy Points:
    0
    #6
    You are much better off changing the process for populating the db so that it only has clean data going into it and then run a one off cleanup exercise on the existing data than to periodically do a clean up
     
    AstarothSolutions, Feb 11, 2008 IP
  7. squishi

    squishi Peon

    Messages:
    58
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Okay. I will do that.
    Any other suggestions?
     
    squishi, Feb 11, 2008 IP
  8. squishi

    squishi Peon

    Messages:
    58
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #8
    I found out that the underscore character matches any character in a LIKE query.

    So I only have to search for "firtnamelastname" and "firstname_lastname" and not for "firstname lastname".
    That should already help a bit.
     
    squishi, Feb 11, 2008 IP
  9. gwkg

    gwkg Peon

    Messages:
    143
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Your categories should be in a separate table and the table you are searching only point to their category id.

    table categories

    cat_id | category
    1 | cat1
    2 | cat2
    3 | cat3


    Then you can use 'AND cat_id > 14' instead of all those 'AND category !=13'


    This is called database normalization if you want to learn more about it

    http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
     
    gwkg, Feb 11, 2008 IP
  10. squishi

    squishi Peon

    Messages:
    58
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Unfortunately, making changes to the database structure is not possible.
    But thank you for the suggestion.
     
    squishi, Feb 12, 2008 IP
  11. pondlife

    pondlife Peon

    Messages:
    898
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    0
    #11
    I think you'll gain some ground by putting all those category checks into one like this:

    where category NOT in (category1,category2,category3);

    etc.. you can add as many as you like but it could turn out to be more efficient to say 'where category in (cat1,cat2)'
     
    pondlife, Feb 12, 2008 IP
  12. squishi

    squishi Peon

    Messages:
    58
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Thank you, pondlife!
    I have implemented a small timer into my script that shows me the time the query takes. So far, they take between 15 and 40 seconds each.
    I will implement your suggestion and will see if it is of help.
     
    squishi, Feb 13, 2008 IP