Comments/Improvement on my Broken Link Checker

Discussion in 'PHP' started by ads2help, Oct 2, 2008.

  1. #1
    I wrote this yesterday, and I need comments on this application. All suggestions, improvements, or criticism are welcome.

    The Link

    Basically, it checks the URL you enter for dead links.
    Its crawling speed depends on your internet speed plus the size and number of links on the website.
    Currently it is coded to display broken links only, so the page isn't cluttered by showing every link inside the iframe.
    Also, JavaScript, aim: and mailto: links are automatically skipped.

    It works with or without the trailing slash and the http:// prefix.
    It can check a whole website (e.g. one that ends with .com) or a single page (e.g. index.php?t=90).
    It is definitely NOT PERFECT... yet.
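
For readers who want to see what the input handling and link skipping described above might look like, here is a minimal sketch. It is not the tool's actual code: the function names `normalizeUrl` and `shouldSkipLink` are hypothetical, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helpers for illustration only; not taken from the original tool.

/** Add the http:// prefix if it is missing and drop any trailing slash. */
function normalizeUrl(string $url): string
{
    $url = trim($url);
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    return rtrim($url, '/');
}

/** True for links the checker should not request (javascript:, aim:, mailto:). */
function shouldSkipLink(string $href): bool
{
    return (bool) preg_match('#^(javascript|aim|mailto):#i', trim($href));
}

echo normalizeUrl('example.com/'), "\n";
var_dump(shouldSkipLink('mailto:someone@example.com'));
```

Normalizing first means `example.com/` and `http://example.com` are treated as the same starting point, which also helps avoid checking the same page twice.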

    One more thing I need your advice on:
    Do we really need the Deeper Check? What I mean is: checking the links on the pages that the entered URL links to. I don't think it is necessary, right? Because some of them may be linking to another site, not yours. Anyway, tell me what you think.

    Thank you.

    Below is the result of checking http://forums.digitalpoint.com/ with Advanced Link Check ON
     
    ads2help, Oct 2, 2008 IP
  2. dimitar christoff

    dimitar christoff Active Member

    Messages:
    882
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    90
    #2
    this will kill your cpu if it is being run on a bigger site =) just took something over a minute on my blog (which is less than a month old)

    also, you may want to ajax this call. otherwise, i guess it can be useful - personally i get my links checked by my sitemapping software A1 sitemap generator.
     
    dimitar christoff, Oct 2, 2008 IP
  3. ads2help

    ads2help Peon

    Messages:
    2,142
    Likes Received:
    67
    Best Answers:
    1
    Trophy Points:
    0
    #3
    Thanks. I just made some changes. Is it faster now?
     
    ads2help, Oct 2, 2008 IP
  4. SeanBlue

    SeanBlue Peon

    Messages:
    110
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Nice piece of software. I ran it on my site and it gave me a pretty garbled error report:

    The best recommendation I can make is that you display errors a bit better. If I have a broken link I want to see it displayed like this:

    We found a broken link, here are the details:
    Found on Page: xxx/archives/
    Links to: xxx/seo/the-best-ways-to-generate-incoming-links
    Link Anchor Text: Best ways to get links
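
A report in the layout above could be produced with something like the following sketch. The function name `formatBrokenLink` and the array keys `page`, `target`, and `anchor` are hypothetical, chosen only for this illustration, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical function and array keys, used only for illustration.

/** Build a plain-text report for one broken link. */
function formatBrokenLink(array $link): string
{
    return "We found a broken link, here are the details:\n"
        . "Found on Page: {$link['page']}\n"
        . "Links to: {$link['target']}\n"
        . "Link Anchor Text: {$link['anchor']}\n";
}

echo formatBrokenLink([
    'page'   => 'xxx/archives/',
    'target' => 'xxx/seo/the-best-ways-to-generate-incoming-links',
    'anchor' => 'Best ways to get links',
]);
```

If the report is printed inside an HTML page, each value should be passed through `htmlspecialchars()` first so that anchor text containing markup cannot garble the output.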


    P.S. I don't have active links available yet so I've replaced my domain with xxx
     
    SeanBlue, Oct 2, 2008 IP
  5. ads2help

    #5
    Thanks for the suggestion.

    I was updating the file just now and forgot to add an @ in front of the function.
    You can try it one more time now.
     
    ads2help, Oct 2, 2008 IP
  6. mehdi

    mehdi Peon

    Messages:
    258
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    0
    #6
    I think you could use cURL instead of file_get_contents; it will help you avoid the HTTP REQUEST FAILED error.
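
As a rough illustration of the cURL approach, the sketch below asks only for the status code instead of downloading the page. The helper names `fetchStatusCode` and `isBrokenStatus` are hypothetical, the timeout values are arbitrary assumptions, and it assumes PHP 7 or later with the cURL extension loaded.

```php
<?php
// Hypothetical helper names; timeouts are arbitrary assumptions.

/** Return the HTTP status code for $url, or 0 if the request failed entirely. */
function fetchStatusCode(string $url): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD request: headers only, no body download
        CURLOPT_RETURNTRANSFER => true, // return the response instead of echoing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final target
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds allowed to connect
        CURLOPT_TIMEOUT        => 10,   // seconds allowed for the whole request
    ]);
    curl_exec($ch);
    // CURLINFO_HTTP_CODE is 0 when no HTTP response was received at all.
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

/** Treat connection failures (0) and 4xx/5xx responses as broken. */
function isBrokenStatus(int $code): bool
{
    return $code === 0 || $code >= 400;
}
```

A call such as `isBrokenStatus(fetchStatusCode($url))` then decides whether a link goes into the report. Some servers reject HEAD requests, so a checker may want to retry a failed link with a normal GET before calling it broken.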
     
    mehdi, Oct 2, 2008 IP
  7. Sillysoft

    Sillysoft Active Member

    Messages:
    177
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    58
    #7
    You can also use an existing class called Snoopy that can grab all links from a webpage. As for detecting errors, I would read the returned header to get the status code, like 200 for success or 404 for not found. cURL would be good for that.
     
    Sillysoft, Oct 2, 2008 IP
  8. ads2help

    #8
    Cool. I know nothing about cURL. Think it's time to start learning that =D Thanks guys.
     
    ads2help, Oct 2, 2008 IP
  9. ads2help

    #9
    Wow, cURL is kinda hard.. =.= I can print the header out but I can't save it to a string.

    BTW, my link checker actually checks the HTTP response code too, but it takes time. Is it because I am not using cURL?
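
On saving the header to a string: by default `curl_exec()` echoes the response, and setting `CURLOPT_RETURNTRANSFER` makes it return the response as a string instead. A minimal sketch follows, with hypothetical function names `fetchHeaders` and `parseStatusCode`, an arbitrary timeout, and PHP 7.1 or later with the cURL extension assumed.

```php
<?php
// Hypothetical helper names; the timeout is an arbitrary assumption.

/** Fetch only the response headers of $url as a string, or null on failure. */
function fetchHeaders(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HEADER         => true, // include the headers in the output
        CURLOPT_NOBODY         => true, // skip the body, headers are all we need
        CURLOPT_RETURNTRANSFER => true, // return the output as a string, do not print it
        CURLOPT_TIMEOUT        => 10,
    ]);
    $headers = curl_exec($ch);
    return $headers === false ? null : $headers;
}

/** Pull the numeric status code out of a raw header string, or 0 if none is found. */
function parseStatusCode(string $headers): int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#m', $headers, $m)) {
        return (int) $m[1];
    }
    return 0;
}
```

With this, `$headers = fetchHeaders($url);` holds the raw header text, and `parseStatusCode($headers)` extracts the code from the first status line.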
     
    ads2help, Oct 2, 2008 IP
  10. dpsubi1

    dpsubi1 Notable Member

    Messages:
    9,318
    Likes Received:
    420
    Best Answers:
    0
    Trophy Points:
    280
    #10
    Maybe you can disable the button once it is clicked, and re-enable it once the system has checked the URLs.

    BTW, I clicked a minute ago, but I'm not sure what's happening. It keeps on working ... :D

    Maybe a better progress bar would help.

    thanks
     
    dpsubi1, Oct 2, 2008 IP
  11. ads2help

    #11
    With my current skills I can't create a progress bar XD
    Anyway, how long did the checking process take?
    Thank you.
     
    ads2help, Oct 2, 2008 IP