Interesting URL checking tool in development

Discussion in 'Directories' started by Obelia, Mar 26, 2007.

  1. #1
    This looks like it might turn out to be very useful in future:

    http://www.glrsales.com/quality-directory.html

    It basically spiders the first page of a site, and flags it for text typical of parked domains, spam, obscenity, and so on.

    At the moment you can just check one url at a time. But if it is developed to handle lots more, it could become a very handy tool for directory maintenance.
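
    The thread does not show how the tool does its flagging, so the following is only a minimal sketch of the general idea: strip the markup from a page and look for phrases typical of each category. The phrase lists, the category names other than GAMBLING, and the function names flag_page and strip_tags are all assumptions made for illustration.

```python
import re

# Hypothetical category -> phrase lists; the real tool's lists are not given in this thread.
FLAG_TERMS = {
    "GAMBLING": ["gambling", "casino", "poker"],
    "PARKED": ["this domain is for sale", "domain parking", "related searches"],
}

def strip_tags(html):
    """Crude markup removal: drop script/style blocks, then any remaining tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    return re.sub(r"(?s)<[^>]+>", " ", html)

def flag_page(html):
    """Return (category, phrase) pairs found in the page's visible text."""
    text = re.sub(r"\s+", " ", strip_tags(html)).lower()
    hits = []
    for category, phrases in FLAG_TERMS.items():
        for phrase in phrases:
            if phrase in text:  # plain substring match, so false hits are expected
                hits.append((category, phrase))
    return hits
```

    A page that produces any hits would go onto a list for manual review rather than being removed automatically.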
     
    Obelia, Mar 26, 2007 IP
  2. GAdsense

    GAdsense Well-Known Member

    Messages:
    1,247
    Likes Received:
    60
    Best Answers:
    0
    Trophy Points:
    140
    #2
    Nice Buddy! :)
    Keep it up. We are with you.
     
    GAdsense, Mar 26, 2007 IP
  3. vnviews

    vnviews Peon

    Messages:
    746
    Likes Received:
    36
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I did a quick check for my directory http://www.vinabet.net . Here is the report:
    
    Basic Site Information
    
    Site:   http://www.vinabet.net
    Date:   Mon Mar 26 21:52:09 2007
    
    IP:     68.178.232.38 (1 sites on this ip)
    CBlock: 68.178.232 (2 sites on this cblock)
    
    On site links:  0 (0 are nofollow)
    Off site links: 0 (0 are nofollow)
    
    iframes: 0
    scripts: 0
    
    Google Ad Units:   0
    Google Link Units: 0
    
    I think the report is incorrect: there are 5 links with the nofollow attribute (the latest links on the homepage are shown with nofollow) and one Google Ad Unit.
    Just wanted to let you know.
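
    For anyone wondering how counts like the ones in this report might be gathered, here is a rough sketch using only the Python standard library. It is not the tool's actual code: the class name PageStats, the function page_stats, and the rule that a link is "on site" when its host matches the page's host are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class PageStats(HTMLParser):
    """Count links (on/off site, nofollow), iframes and script tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc.lower()
        self.counts = {"on_site": 0, "on_site_nofollow": 0,
                       "off_site": 0, "off_site_nofollow": 0,
                       "iframes": 0, "scripts": 0}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            self.counts["iframes"] += 1
        elif tag == "script":
            self.counts["scripts"] += 1
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL before comparing hosts.
            target = urlparse(urljoin(self.base_url, attrs["href"]))
            if target.scheme not in ("http", "https"):
                return  # skip mailto:, javascript: and similar
            kind = "on_site" if target.netloc.lower() == self.host else "off_site"
            self.counts[kind] += 1
            if "nofollow" in (attrs.get("rel") or "").lower().split():
                self.counts[kind + "_nofollow"] += 1

def page_stats(html, base_url):
    parser = PageStats(base_url)
    parser.feed(html)
    parser.close()
    return parser.counts
```

    If the page text never reaches the parser, every count stays at zero, which is one way a report could come back all zeros.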
     
    vnviews, Mar 26, 2007 IP
  4. GAdsense

    GAdsense Well-Known Member

    #4
    Alright. He needs to fix these issues, otherwise the tool will be useless, because everything depends on accuracy.

    In my case, it showed correctly.
     
    GAdsense, Mar 26, 2007 IP
  5. George55

    George55 Peon

    Messages:
    63
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Actually the nofollow wasn't the problem...for some reason it didn't parse the text of the page at all. This is what the report should have shown:

    
    Site:   http://www.vinabet.net
    Date:   Tue Mar 27 05:17:33 2007
    
    IP:     68.178.232.38 (1 sites on this ip)
    CBlock: 68.178.232 (2 sites on this cblock)
    
    On site links:  67 (0 are nofollow)
    Off site links: 6 (5 are nofollow)
    
    iframes: 0
    scripts: 2
    
    Google Ad Units:   1
    Google Link Units: 0
    You may wish to manually review http://www.vinabet.net for the following reasons:
    
    GAMBLING: plain text: "gambling"
    
    Yes, the key is to identify pages within certain categories and not have too many false hits. The challenge with vinabet.net is to identify it as a directory, and not as gambling or "search portal"/parked-page junk.

    It can never be 100% accurate, but we have identified (and removed) about 200 parked pages from our directory. Of the sites identified, a couple were legit sites, so the tool is not designed to tell you which sites to remove from your directory, but rather to help you identify sites in your directory that should be re-reviewed.

    While the logic is being worked on, we are accepting URLs one by one on the web and selected lists from people via email. When it is more refined we'll have a bulk submission form.

    The next step is to add a feedback form on the results page to help us see sites we missed and false hits.

    Anyway thanks for the feedback.
     
    George55, Mar 27, 2007 IP
  6. SearchBuster

    SearchBuster Peon

    Messages:
    467
    Likes Received:
    42
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Nice work, will be following this project closely.
     
    SearchBuster, Mar 27, 2007 IP
  7. MeetHere

    MeetHere Prominent Member

    Messages:
    15,399
    Likes Received:
    994
    Best Answers:
    0
    Trophy Points:
    330
    #7
    Tool looks good..

    I have a casino link and tool says to check it -- Great feature :)
     
    MeetHere, Mar 27, 2007 IP
  8. George55

    George55 Peon

    #8
    Who would actually be interested in using this tool to clean up their directories?
     
    George55, May 6, 2007 IP
  9. Obelia

    Obelia Notable Member

    Messages:
    2,083
    Likes Received:
    171
    Best Answers:
    0
    Trophy Points:
    210
    #9
    Well, I've got 3 I want to use it on.

    Also, dead links don't just appear on directories: this could also be useful for checking out smaller links pages which you find on a lot of sites. Directories may be the big market for this, but you might want to consider promoting this to people who build and maintain long links pages.
     
    Obelia, May 9, 2007 IP
  10. George55

    George55 Peon

    #10
    In our own use and testing of this, it has functioned most effectively as a parked page detector. Those are easy to pick out.

    Also, to be an effective tool, we believe it needs to keep a history of results so they aren't duplicated.

    I.e., you submit your URL list and receive back a list of 10 sites to check manually. One month later you submit your URL list again. The returned list should not include any of the 10 from the previous check.

    This is important because you manually check the 10 and delete 8 from your directory, but decide two were false hits. You don't want to keep seeing those false hits every time you check your URLs.

    Not particularly complicated.
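
    A minimal sketch of that history idea, assuming the flagged URLs for one directory are remembered in a small JSON file between runs. The function name filter_new_hits and the file layout are made up for illustration; a real version would more likely use a database keyed per user.

```python
import json
from pathlib import Path

def filter_new_hits(flagged_urls, history_path):
    """Return only URLs that were not reported in an earlier run, then record them."""
    path = Path(history_path)
    # Load the set of URLs already reported, if a history file exists.
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    new_hits = []
    for url in flagged_urls:
        if url not in seen:
            seen.add(url)
            new_hits.append(url)
    # Persist the updated history for the next run.
    path.write_text(json.dumps(sorted(seen)))
    return new_hits
```

    With this, a site you reviewed and decided to keep is reported once and then stays out of later lists, because it is already in the history.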

    Additionally, the interaction needs to be web based and not email based. Spam filters have interfered significantly with our email testing.
     
    George55, May 9, 2007 IP
  11. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #11
    It's interesting to have general information about a web site.
     
    trichnosis, May 9, 2007 IP
  12. shzor

    shzor Banned

    Messages:
    157
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Interesting; it would be very useful after a few improvements.
     
    shzor, May 9, 2007 IP
  13. George55

    George55 Peon

    #13
    And the improvements you're looking for are? The web page is just something to play with; the actual tool would work on a list of URLs and return a shorter list in a useful format.
     
    George55, May 9, 2007 IP
  14. LeopardAt1

    LeopardAt1 Well-Known Member

    Messages:
    880
    Likes Received:
    126
    Best Answers:
    0
    Trophy Points:
    135
    #14
    Seems cool. Good job.

    If I were building something for this (checking content for spam-related words), couldn't you just build a script that searches the entire database rather than the entire site? Any other content not generated from the database or PHP script is controlled by the webmaster.

    However, this script will be very useful for a recently hacked directory owner who wishes to see if any of his pages were edited. :)

    Let me know what you think.
     
    LeopardAt1, May 9, 2007 IP
  15. jmort732

    jmort732 Peon

    Messages:
    543
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    0
    #15
    George,
    Is this tool still being developed? I would love to use it on one of my older directories that probably has a ton of parked pages.

    Morty
     
    jmort732, Jun 1, 2007 IP
  16. malcolm1

    malcolm1 Prominent Member

    Messages:
    7,148
    Likes Received:
    758
    Best Answers:
    0
    Trophy Points:
    310
    #16
    Hmmm, yes, that's definitely a nice tool...

    I've run into this one, which is still in beta but may turn out to be very useful..

    http://www.seodigger.com

    thx
    malcolm
     
    malcolm1, Jun 1, 2007 IP
  17. George55

    George55 Peon

    #17
    Mort... PM me about emailing a URL list and getting the results back by email... the plan is to make it more automated than that... but for now doing it manually works.
     
    George55, Jun 1, 2007 IP
  18. Fastian

    Fastian Peon

    Messages:
    2,085
    Likes Received:
    235
    Best Answers:
    0
    Trophy Points:
    0
    #18
    Indeed, an interesting tool.

    When it says "scripts: 2", what is it referring to?
     
    Fastian, Jun 2, 2007 IP