Need to check titles of 100K+ Pages

Discussion in 'PHP' started by Stu--, Jun 6, 2009.

  1. #1
    I need to add a table to one of my pages that Shows "Articles related to *keyword*"

    It will take one keyword and then check my list of articles for articles that have the keyword in the title.

    One problem, there are over 100,000 article pages. The check needs to be pretty much instant to stop the page slowing down.

    I previously used filexists() but that only checks for articles that match the keyword perfectly. I need to check for articles that contain the keyword.

    Most functions will check through the entire list of 100,000 articles and be too slow.

    Any suggestions?
     
    Stu--, Jun 6, 2009 IP
  2. hip_hop_x

    hip_hop_x Active Member

    Messages:
    522
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    90
    #2
    do you have a mysql database for the articles? And the titles are stored into the database?
     
    hip_hop_x, Jun 6, 2009 IP
  3. Stu--

    Stu-- Active Member

    Messages:
    355
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    53
    #3
    Yes, sorry I forgot to mention that.
     
    Stu--, Jun 6, 2009 IP
  4. hip_hop_x

    hip_hop_x Active Member

    Messages:
    522
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    90
    #4
    Then it's simple
    
    $keyword=mysql_real_escape_string("the keyword"); //for security i added mysql_real_escape_string
    
    $query=mysql_query("SELECT article_link, article_title FROM table WHERE article_title LIKE '%$keyword%'");
    echo "<table>";
    while($row=mysql_fetch_array($query)){
    
    echo "<tr><td><a href='".$row['article_link'].">".$row['article_title']."</a></td><td>etc.. other data</td></tr>";
    
    }
    echo "</table>";
    
    Code (markup):
     
    hip_hop_x, Jun 6, 2009 IP
  5. Stu--

    Stu-- Active Member

    Messages:
    355
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    53
    #5
    Thanks!

    But will this not be really slow? The site is extremely popular (5000 in alexa) and accessing the database on every page load is really not desirable.

    I was thinking of some kind of cron job, but I'm not really sure how this would work as I would still have the problem of having a really large file list that I would need to access.
     
    Stu--, Jun 6, 2009 IP
  6. Social.Network

    Social.Network Member

    Messages:
    517
    Likes Received:
    15
    Best Answers:
    0
    Trophy Points:
    35
    #6
    Using the LIKE condition is slow; combined with a roundtrip on every page, it may not meet your non-functional performance requirements.

    I have used a couple of approaches; one is keeping the "keyword" and association to related documents in the database. This eliminates the need to use the LIKE condition. Also, if you choose to use the LIKE condition and check on each page load then I would suggest you implement a caching strategy.

    For example, hit the database once and then cache the results using the "keyword" as the key to results. You can use various methods to manage the cache such as time-based expiration, database or file dependency, etc.

    Lastly, you can continue using the file-based approach, but use the cron job to create the related article files. I/O operations are faster than database operations.
     
    Social.Network, Jun 6, 2009 IP