This looks like it might turn out to be very useful in the future: http://www.glrsales.com/quality-directory.html It basically spiders the first page of a site and flags it for text typical of parked domains, spam, obscenity, and so on. At the moment you can only check one URL at a time, but if it is developed to handle lots more, it could become a very handy tool for directory maintenance.
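If you wanted to roll a rough version of this yourself, the core idea is simple enough: fetch the homepage and scan the text for telltale phrases. A minimal Python sketch — the phrase list and the URL are just placeholders, not whatever the actual tool uses:

import re
import urllib.request

# Phrases typical of parked/junk pages -- a placeholder list, not the tool's real one
PARKED_PHRASES = ["this domain is for sale", "related searches", "sponsored listings"]

def flag_page(url):
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    return [phrase for phrase in PARKED_PHRASES if phrase in text]

print(flag_page("http://www.example.com"))  # lists whichever phrases matched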
I did a quick check for my directory http://www.vinabet.net. Here is the report:

Basic Site Information
Site: http://www.vinabet.net
Date: Mon Mar 26 21:52:09 2007
IP: 68.178.232.38 (1 sites on this ip)
CBlock: 68.178.232 (2 sites on this cblock)
On site links: 0 (0 are nofollow)
Off site links: 0 (0 are nofollow)
iframes: 0
scripts: 0
Google Ad Units: 0
Google Link Units: 0

I think the report is not correct, because there are five links with the nofollow tag (the latest links on the homepage are shown with nofollow), and one Google Ad Unit. Just wanted to let you know.
Alright. He needs to work on correcting these issues, otherwise the tool will be useless, since everything depends on its accuracy. In my case, it reported correctly.
Actually the nofollow wasn't the problem... for some reason it didn't parse the text of the page at all. This is what the report should have shown:

Site: http://www.vinabet.net
Date: Tue Mar 27 05:17:33 2007
IP: 68.178.232.38 (1 sites on this ip)
CBlock: 68.178.232 (2 sites on this cblock)
On site links: 67 (0 are nofollow)
Off site links: 6 (5 are nofollow)
iframes: 0
scripts: 2
Google Ad Units: 1
Google Link Units: 0

You may wish to manually review http://www.vinabet.net for the following reasons:
GAMBLING: plain text: "gambling"

Yes, the key is to identify pages within certain categories without too many false hits. The challenge with vinabet.net is to identify it as a directory, and not a gambling site or "search portal"/parked-page junk. It can never be 100%, but we have identified (and removed) about 200 parked pages from our directory. A couple of the sites identified were legit, so the tool is not designed to tell you which sites to remove from your directory, but rather to help you identify sites in your directory that should be REreviewed.

While the logic is being worked on we are accepting URLs one by one on the web, and select lists from people via email. When it is more refined we'll have a bulk submission form. The next step is to add a feedback form on the results page so we can see sites we missed and false hits. Anyway, thanks for the feedback.
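For what it's worth, the category flagging George describes — matching the page's plain text against per-category word lists and reporting which category tripped — could look roughly like this. A sketch only; the categories and trigger words here are my guesses, not the tool's actual lists:

# Per-category trigger words -- illustrative guesses, not the real lists
CATEGORIES = {
    "GAMBLING": ["gambling", "casino", "poker"],
    "PARKED": ["domain is for sale", "sponsored results"],
}

def review_reasons(page_text):
    text = page_text.lower()
    hits = []
    for category, words in CATEGORIES.items():
        for word in words:
            if word in text:
                hits.append((category, word))  # e.g. ("GAMBLING", "gambling")
    return hits

A legit betting directory will still trip GAMBLING, which is exactly why the output is a "re-review" list rather than a delete list.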
Well, I've got three I want to use it on. Also, dead links don't just appear on directories: this could also be useful for checking out the smaller links pages you find on a lot of sites. Directories may be the big market for this, but you might want to consider promoting it to people who build and maintain long links pages as well.
In our own use and testing of this, it has functioned most effectively as a parked-page detector. Those are easy to pick out.

Also, to be an effective tool, we believe it needs to keep a history of results so they aren't duplicated. I.e., you submit your URL list and receive back a list of 10 sites to check manually. One month later you submit your URL list again. The returned list should not include any of the 10 from the previous check. This is important because you manually check the 10 and delete 8 from your directory, but decide two were false hits. You don't want to keep seeing those false hits every time you check your URLs. Not particularly complicated.

Additionally, the interaction needs to be web based and not email based. Spam filters have interfered significantly with our email testing.
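The history part doesn't need anything fancy — persist every URL you've already reported (including the ones the owner decided were false hits) and filter them out of the next run. A sketch, assuming a plain text file as the store (file name is made up):

import os

SEEN_FILE = "already_reported.txt"  # hypothetical store of previously returned URLs

def load_seen():
    if not os.path.exists(SEEN_FILE):
        return set()
    with open(SEEN_FILE) as f:
        return set(line.strip() for line in f)

def report_new(flagged_urls):
    seen = load_seen()
    new = [u for u in flagged_urls if u not in seen]
    with open(SEEN_FILE, "a") as f:
        for u in new:
            f.write(u + "\n")  # remember them so next month's run skips them
    return new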
And the improvements you're looking for are? The web page is just something to play with; the actual tool would work on a list of URLs and return a shorter list in a useful format.
Seems cool, good job. If I were building something that took care of this concept (checking content for spam-related words), couldn't you just build a script that searches the entire database rather than the entire site? Any other content not generated from the database or a PHP script is controlled by the webmaster anyway. That said, this script would be very useful for a recently hacked directory owner who wishes to see if any of his pages were edited. Let me know what you think.
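If the directory runs off a database, that version of the check is just a query over the stored listings — something like the sketch below, where the `sites` table, its `url` and `description` columns, and the word list are all hypothetical:

import sqlite3  # stand-in for whatever database the directory actually uses

SPAM_WORDS = ["casino", "viagra", "domain for sale"]  # placeholder list

def scan_listings(db_path):
    conn = sqlite3.connect(db_path)
    suspects = []
    for url, desc in conn.execute("SELECT url, description FROM sites"):
        text = (desc or "").lower()
        if any(word in text for word in SPAM_WORDS):
            suspects.append(url)
    conn.close()
    return suspects

Of course this only catches bad text that made it into the database; a parked domain usually has a perfectly clean listing, which is why the tool checks the live page instead.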
George,

Is this tool still being developed? I would love to use it on one of my older directories that probably has a ton of parked pages.

Morty
Hmmm, yes, that's definitely a nice tool... I've run into this one, which is still in beta but may turn out to be very useful: http://www.seodigger.com

Thanks, malcolm
Mort... PM me about emailing a URL list and getting the results back by email... the plan is to make it more automated than that, but for now doing it manually works.