directory submission algo

Discussion in 'Directories' started by jlawrence, Apr 21, 2005.

  1. #1
    I'm trying to come up with a feasible way of automating entries to a directory.
    Basically, I want the directory to become a useful resource (hopefully), and to do that it must have a lot of sites listed in it. I simply can't sit down and manually visit thousands of sites (hopefully one day I'll get that many submitted) to assess them.
    For obvious reasons, I don't want to list bad neighborhood sites, so I need a way of recognising them. I'm hoping that all sites labelled bad neighborhoods will have something unique (perhaps PR = n/a), but other than perhaps not having PR I can't think of anything.
    If I can't discern what a bad neighborhood is, then I'd need to either 1) manually check every site, or 2) accept every site. For a lot of webmasters, if the directory linked out to bad neighborhoods it would lose any of its appeal, so they wouldn't submit their sites.

    I'm open to suggestions here. Can anyone think of ways to tell if a site is bad?
     
    jlawrence, Apr 21, 2005 IP
  2. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #2
    A banned site returns no results in Google. To stay on the safe side you could turn that around and assume all sites without any results to be banned/bad.

    You could accept PR0 sites but flag them for a manual hazard review.

    The biggest problem is ill-formatted entries. You can use PHP to make everything lower-case and then title case sentences so everything is standardised. More advanced stuff can be done as well to ensure uniform formatting.
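
A rough sketch of the lower-case-then-recapitalise idea above. The post mentions PHP; this uses Python purely for illustration, and the function names `normalise_entry` and `normalise_title` are made up for the example. It assumes plain English text and does nothing clever about acronyms or brand names.

```python
import re


def normalise_entry(text: str) -> str:
    """Collapse whitespace, lower-case, then capitalise the start of each sentence."""
    collapsed = " ".join(text.split()).lower()
    # Upper-case the first letter of the text and any letter that follows
    # sentence-ending punctuation plus whitespace.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        collapsed,
    )


def normalise_title(text: str) -> str:
    """Collapse whitespace and capitalise every word (for site titles)."""
    return " ".join(word.capitalize() for word in text.split())
```

Something like `normalise_entry` would suit descriptions and `normalise_title` the site name, but which fields get which treatment is a policy decision for the directory.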
     
    T0PS3O, Apr 21, 2005 IP
  3. jlawrence

    jlawrence Peon

    Messages:
    1,368
    Likes Received:
    81
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Formatting is a minor problem really; that's just a case of defining what is to be allowed and then formatting the user's input to fit those rules/definitions.
    What I was hoping was:
    PR0 = new site, i.e. a site created since the last update
    PR>0 = more established site
    PR n/a = bad neighborhood.
    Searching Google is an option, using the API. Say, if the site doesn't appear in the first 500 results then it is marked for manual review.
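
The hoped-for PR mapping above could be sketched like this. It only encodes the three rules as written; whether PR n/a really identifies a bad neighborhood is the open question in this thread, so the "suspect" label is an assumption, not a verdict. The function name and the use of `None` for "PR n/a" are invented for the example, and fetching the PR value is left out entirely.

```python
from typing import Optional


def triage_by_pagerank(pagerank: Optional[int]) -> str:
    """Map a PageRank reading to a review bucket.

    None stands for "PR n/a" (an assumption of this sketch).
    """
    if pagerank is None:
        return "suspect"      # PR n/a: possible bad neighborhood, review by hand
    if not 0 <= pagerank <= 10:
        raise ValueError(f"PageRank out of range: {pagerank}")
    if pagerank == 0:
        return "new"          # PR0: possibly created since the last update
    return "established"      # PR>0
```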
     
    jlawrence, Apr 21, 2005 IP
  4. jlawrence
    #4
    Actually, thinking about it, it would be quicker to check for indexed pages from that site.
    Zero indexed pages = manual review as a suspect site.
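
As a sketch of the "zero indexed pages = manual review" rule: the indexed-page lookup is passed in as a function, because how that count is obtained (search API, scraping, something else) isn't settled here. `split_submissions` and `count_indexed_pages` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def split_submissions(
    domains: Iterable[str],
    count_indexed_pages: Callable[[str], int],
) -> Tuple[List[str], List[str]]:
    """Return (auto_accept, manual_review) lists of domains."""
    auto_accept: List[str] = []
    manual_review: List[str] = []
    for domain in domains:
        if count_indexed_pages(domain) > 0:
            auto_accept.append(domain)
        else:
            # Zero indexed pages: suspect site, queue it for a human.
            manual_review.append(domain)
    return auto_accept, manual_review
```

For a dry run, the lookup can be a plain dictionary, e.g. `split_submissions(["a.example", "b.example"], {"a.example": 12}.get)` would fail on the missing key returning `None`, so a stub like `lambda d: counts.get(d, 0)` is safer.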
     
    jlawrence, Apr 21, 2005 IP
  5. T0PS3O
    #5
    That's what I meant. Either an API site: search or a normal search for their own domain. If they don't come up for their own domain, it's bound to be dodgy.
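
Building the `site:` query from whatever URL the submitter typed might look roughly like this. It only constructs the query string; actually sending it to a search API is not shown. The function name `site_query` and the decision to strip a leading `www.` are assumptions of the sketch.

```python
from urllib.parse import urlparse


def site_query(submitted_url: str) -> str:
    """Turn a submitted URL into a 'site:' search query for its host."""
    url = submitted_url.strip()
    # urlparse only fills in the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    if not host:
        raise ValueError(f"no host found in {submitted_url!r}")
    if host.startswith("www."):
        host = host[4:]
    return f"site:{host}"
```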
     
    T0PS3O, Apr 21, 2005 IP
  6. jlawrence
    #6
    My next thought, one that just came to me over a cuppa: working out whether they've actually submitted to the correct category might be the most difficult thing. Initially, I'll probably have to just trust people (what a frightening thought :) ).
     
    jlawrence, Apr 21, 2005 IP