Cache creator & Crawler to detect dead links on website.

Discussion in 'Databases' started by Ins, Jun 7, 2007.

  1. #1
    Hi,

    I have about 25,000 external links in the resource database of my community website. How easy is it to create a crawler that automatically detects the dead links and reports them to the admin for manual removal?

    Also, how can I create and display a cached copy of each external link in case the external website is down, just like Google does?

    If you can refer me to some free/cheap software or literature available, it would be great.

    Thanks in advance!

    Ins
     
    Ins, Jun 7, 2007 IP
  2. ketan9

    #2
    Crawling the links is not rocket science; all you have to do is try to open each URL with fopen.
    If fopen fails, it usually means the site is down; if you can open it, the site is live. Log the result to your DB and report the site to the admin for removal if it is not active. Simple, isn't it?
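
    The open-and-log idea above can be sketched roughly as follows. This is a Python illustration, not the PHP fopen code the post refers to; the names is_alive, log_dead_links and the dead_links table are hypothetical, and SQLite stands in for whatever database the site actually uses.

```python
import sqlite3
import urllib.request


def is_alive(url, opener=urllib.request.urlopen):
    """Return True if the URL can be opened, False otherwise."""
    try:
        with opener(url):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def log_dead_links(urls, db, opener=urllib.request.urlopen):
    """Store every URL that fails to open so an admin can review it."""
    db.execute("CREATE TABLE IF NOT EXISTS dead_links (url TEXT PRIMARY KEY)")
    dead = [u for u in urls if not is_alive(u, opener)]
    db.executemany(
        "INSERT OR IGNORE INTO dead_links (url) VALUES (?)",
        [(u,) for u in dead],
    )
    db.commit()
    return dead
```

    Something like log_dead_links(all_urls, sqlite3.connect("links.db")) could then be run from a scheduled job, with the admin page simply listing the rows in dead_links. A single failed open may only be a temporary outage, so it is probably worth rechecking a link before removing it.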

    For the cache, you will have to grab the contents and store them on your local server. Concept-wise things are easy, but problems crop up when you start to go large-scale!
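
    A minimal Python sketch of the grab-and-store step, assuming one file per link named after a hash of its URL. The names cache_path, refresh_cache and read_cached are hypothetical, and only the raw HTML is saved, so images and stylesheets on the cached page would still point at the original site.

```python
import hashlib
import urllib.request
from pathlib import Path


def cache_path(url, cache_dir):
    """Map a URL to a stable file name inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / (digest + ".html")


def refresh_cache(url, cache_dir, opener=urllib.request.urlopen, timeout=10):
    """Fetch the page and overwrite its cached copy; return the file path."""
    with opener(url, timeout=timeout) as response:
        body = response.read()
    path = cache_path(url, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path


def read_cached(url, cache_dir):
    """Return the cached bytes for a URL, or None if it was never cached."""
    path = cache_path(url, cache_dir)
    return path.read_bytes() if path.exists() else None
```

    The site could call refresh_cache whenever the crawler finds a link alive, and serve read_cached when the live check fails. Hashing the URL avoids illegal characters in file names; at 25,000 links a flat directory should still be manageable, though splitting into subdirectories may help at larger scale.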
     
    ketan9, Jun 8, 2007 IP
  3. damonp

    #3
    damonp, Jun 8, 2007 IP
  4. Ins

    #4
    Thanks, damonP.

    Thanks, Ketan9!
     
    Ins, Jun 8, 2007 IP
  5. rthurul

    #5
    You cannot really do this well in PHP. What you need is a Perl script that goes through the records and uses wget or curl to try to download each file with a timeout of a few seconds.

    If it takes more than 10 seconds to get the file, or the request returns an error, then the link is not valid. PHP does not seem like the right choice for this, in my opinion.
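
    For comparison, the timeout rule can be sketched in Python rather than Perl with wget or curl. The names check and check_all and the worker count are assumptions for illustration. Note that the timeout passed to urlopen applies to each blocking socket operation, not to the total download time, so this only approximates the "more than 10 seconds" rule.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def check(url, timeout=10, opener=urllib.request.urlopen):
    """Return (url, ok); ok is False on any error or socket timeout."""
    try:
        with opener(url, timeout=timeout):
            return url, True
    except (OSError, ValueError):
        # Timeouts, connection failures and HTTP errors are all OSError
        # subclasses; ValueError covers malformed URLs.
        return url, False


def check_all(urls, workers=20, timeout=10, opener=urllib.request.urlopen):
    """Check many URLs in parallel and return the ones that failed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: check(u, timeout, opener), urls)
        return [url for url, ok in results if not ok]
```

    Running the checks in a thread pool matters at this scale: 25,000 links checked one at a time with a 10-second timeout could take hours if many of them hang.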
     
    rthurul, Jun 16, 2007 IP