Should I be linking to a site that consistently returns a "403 Forbidden" result in my broken-link checker? Will it have any negative effect on my site?
The site is currently blocking links that come from your domain name. Some sites use this technique to block certain websites and save bandwidth. If they are blocking links from your site, your visitors will not be able to reach their site, so there is no point in linking to it. Logically, you should remove the link.
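A referrer-based block like the one described above would typically compare the host in the request's Referer header against a blocklist. The sketch below is a hypothetical illustration in Python, not any particular server's implementation; the function name `is_blocked_referer` and the example domains are assumptions.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def is_blocked_referer(referer: Optional[str], blocked_domains: Iterable[str]) -> bool:
    """Return True if the Referer's host is a blocked domain or one of its subdomains."""
    if not referer:
        # No Referer header at all (typed-in URL, bookmark, some tools): this rule can't apply.
        return False
    host = (urlsplit(referer).hostname or "").lower()
    for domain in blocked_domains:
        domain = domain.lower().strip(".")
        # Match the domain itself or any subdomain, but not lookalikes such as "notexample.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

For example, with `blocked_domains={"example.com"}`, a visitor arriving from `http://www.example.com/links.html` would be refused, while a visitor who types the address directly would not.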
I can actually navigate to that site from my directory; I was just a bit worried about the word "forbidden". I also get 404 error results, but when I click on those links the pages are there, so I am a little confused. I think I'll get rid of the 403 link: if they don't want people linking to their site, I am not going to waste my links.
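One way to see whether a 403 depends on who is asking, rather than on the page being gone, is to request the same URL with two different User-Agent strings and compare the status codes. Below is a minimal sketch using only Python's standard library; the helper names `status_for` and `compare_agents` and the user-agent strings are illustrative assumptions, and a real server may also key its block on other headers or on the client's IP address.

```python
import urllib.error
import urllib.request


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url` with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the status code is still available.
        return err.code


def compare_agents(url: str, agents: dict) -> dict:
    """Map each label in `agents` to the status code received when sending that User-Agent."""
    return {label: status_for(url, ua) for label, ua in agents.items()}


# Hypothetical usage (the checker string is illustrative, not a real tool's exact UA):
# compare_agents("http://www.example.com/", {
#     "browser": "Mozilla/5.0",
#     "checker": "Xenu's Link Sleuth",
# })
```

If the browser-style request gets 200 while the checker-style request gets 403, the page exists and the server is simply refusing that client.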
I keep seeing people who post robots.txt files blocking link-checkers like Xenu. In my opinion, that's just plain dumb. I've used Xenu for years. If it tells me it can't connect to a page, I delete the link to that page. When I see people recommending blocking link-checkers, I ask them if that's what they are hoping to have happen... I would recommend you delete the link and email the webmaster to tell him why - educate the poor sod.
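For reference, a robots.txt rule aimed at a link-checker only affects tools that choose to honour robots.txt. Python's standard `urllib.robotparser` module shows how a compliant client would read such a file; the robots.txt contents, the `Xenu` token, and the helper name `allowed` below are hypothetical examples, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out one link-checker but lets everyone else in.
ROBOTS_TXT = """\
User-agent: Xenu
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a client honouring robots.txt may fetch `url` as `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() picks the first group whose User-agent token appears in the
    # client's name (case-insensitively), falling back to the "*" group.
    return parser.can_fetch(user_agent, url)
```

A checker identifying itself with "Xenu" in its name would be told to stay out of the whole site, while an ordinary browser string falls through to the permissive `*` group.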
Xenu's Link Sleuth is just one of a number of robots, spiders, harvesters, etc. that are denied access by default in phpNuke's .htaccess file. If you can think of a good reason why it shouldn't be there, I'll listen.

    #The next lines check for Email Spammers Robots and redirect them to a fake page
    RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlowFish [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BotALot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BuiltBotTough [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bullseye [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BunnySlippers [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Cegbfeieh [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CheeseBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CopyRightCheck [OR]
    RewriteCond %{HTTP_USER_AGENT} ^cosmos [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EroCrawler [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Foobot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FrontPage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Harvest [OR]
    RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} ^httplib [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^humanlinks [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InfoNaviRobot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JennyBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Kenjin.Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Keyword.Density [OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^libWeb/clsHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LinkextractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LinkScan/8.1a.Unix [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mata.Hari [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister.PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} ^moget [OR]
    #RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2 [OR]
    #RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline.Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ProPowerBot/2.14 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^QueryN.Metasearch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^RepoMonkey [OR]
    RewriteCond %{HTTP_USER_AGENT} ^RMA [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SpankBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^spanner [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^suzuran [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Szukacz/1.4 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
    RewriteCond %{HTTP_USER_AGENT} ^The.Intraformant [OR]
    RewriteCond %{HTTP_USER_AGENT} ^TheNomad [OR]
    RewriteCond %{HTTP_USER_AGENT} ^TightTwatBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Titan [OR]
    RewriteCond %{HTTP_USER_AGENT} ^toCrawl/UrlDispatcher [OR]
    RewriteCond %{HTTP_USER_AGENT} ^toCrawl/UrlDispatcher [OR]
    RewriteCond %{HTTP_USER_AGENT} ^True_Robot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
    RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot/1.5 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^URLy.Warning [OR]
    RewriteCond %{HTTP_USER_AGENT} ^VCI [OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebEnhancer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web.Image.Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebmasterWorldForumBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website.Quester [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Webster.Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZip [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWW-Collector-E [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xenu's [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus
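To see what a list like this actually does: each RewriteCond tests the request's User-Agent against a regular expression anchored with `^` (case-sensitive unless the `[NC]` flag is given), `[OR]` chains the conditions together, and a match on any one of them triggers whatever RewriteRule follows the conditions (the quoted excerpt does not include it). The Python sketch below mimics that matching for a handful of the patterns; it is an approximation for illustration only, since Apache uses PCRE rather than Python's `re` module, and the names `BLOCKED_AGENTS` and `is_blocked` are hypothetical.

```python
import re

# A few (pattern, case_insensitive) pairs taken from the list above.
# The "\ " escapes from the .htaccess are written as plain spaces here;
# [NC] corresponds to case_insensitive=True.
BLOCKED_AGENTS = [
    (r"^Alexibot", False),
    (r"^Download Demon", False),
    (r"^FrontPage", True),
    (r"^Googlebot-Image", False),
    (r"^HTTrack", True),
    (r"^ia_archiver", False),
    (r"^Xenu's", False),
    (r"^Zeus", False),
]


def is_blocked(user_agent: str) -> bool:
    """Return True if any condition matches, i.e. the [OR]-chained RewriteConds succeed."""
    for pattern, nocase in BLOCKED_AGENTS:
        flags = re.IGNORECASE if nocase else 0
        if re.search(pattern, user_agent, flags):
            return True
    return False
```

Because every pattern is anchored and most are case-sensitive, `"Xenu's Link Sleuth"` is caught while `"xenu's link sleuth"` is not, and the regular Googlebot string passes while `Googlebot-Image` (Google's image crawler) is blocked. That is the kind of entry worth reviewing one by one before adopting a list like this.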
I thought I'd already done that. Like many other people, I use Xenu Link Checker to scan for dead links on my main site (100+ pages of categorized links). If I have a link to one of your pages and you block Xenu, I'll get a report from Xenu saying it couldn't access that page. Since I'm not about to individually check every page that yields an error, in most cases I'll simply delete that link from my site. That means if you've blocked Xenu you've just lost a backlink to one of your pages.
Not always, no. It depends on the site... and on how busy I am that day... and on whether I want to do more exploration. If it's a link to a site on a topic that is already well represented on that page, I'd probably just dump it. If it's more unique, I might go to the trouble of checking it again in a day or two... Xenu also tells you what the "error" was, i.e., page not found, access forbidden, request timed out, etc. So it would again depend on how busy I am that day, whether I felt like investigating further, and what error was returned for the link. However, on a busy day, for a site that isn't unique, it would frankly be a lot simpler for me to just delete the link. That's my point: if you block Xenu or similar link-checkers, you run the risk of losing backlinks. Why would you want to take that chance just to turn away a request from a benign probe? I don't recommend EVER using one of those one-size-fits-all .htaccess blockers. I'm amazed at some of the things I see in those files, such as search-engine spiders. At the very least, check each entry and decide for yourself whether it's something YOU want to block. I'm especially surprised that Xenu is blocked by default in phpNuke. I've never used phpNuke, but that seems to me to be a bad idea.
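The triage described above (look at which error came back, then decide whether to delete or recheck) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the outcome labels, the helper names `check_link` and `next_action`, and the rule "recheck forbidden or timed-out links only if the resource is unique" are hypothetical, modelled on the workflow in this thread rather than on Xenu's own logic.

```python
import socket
import urllib.error
import urllib.request


def check_link(url: str, user_agent: str = "ExampleChecker/0.1", timeout: float = 10.0) -> str:
    """Fetch `url` and return a short label: ok, not found, forbidden, timeout, etc."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as err:  # must come before URLError (it is a subclass)
        if err.code == 404:
            return "not found"
        if err.code == 403:
            return "forbidden"
        return "HTTP %d" % err.code
    except urllib.error.URLError as err:
        # Connection-stage timeouts arrive wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        return "connection failed"
    except socket.timeout:
        # A timeout while waiting for the response can surface unwrapped.
        return "timeout"


# Hypothetical policy: errors that might be temporary or client-specific get a
# second look, but only when the linked resource is hard to replace.
RECHECK_LATER = {"forbidden", "timeout"}


def next_action(outcome: str, unique_resource: bool) -> str:
    """Return "keep", "recheck later", or "delete" for a checked link."""
    if outcome == "ok":
        return "keep"
    if unique_resource and outcome in RECHECK_LATER:
        return "recheck later"
    return "delete"
```

Under this policy a 403 on a run-of-the-mill link is deleted outright, which is exactly the backlink loss the post warns site owners about.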