My best efforts to keep the spammers out

Discussion in 'Directories' started by miko67, Jul 4, 2007.

  1. #1
When running directories one quickly realises that there is an ongoing arms race between (some or most) directory owners and their "counterparts", the spammers - always angling for a quick buck and an easy cheat.

    I have been implementing new queries to my database of websites on a trial and error basis ever since I started in this business, and today I believe I have a very good way of getting rid of the unwelcome crowd.

This led me to my latest weapon: introducing IP bans in my .htaccess file for a test period (I know most scripts have this feature, I just don't think it's something one should use eagerly and without consideration). Balancing the wish not to scare off too many visitors against saving bandwidth for the serious audience is a major balancing act - especially when working on a small budget and trying to minimize costs.
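To give an idea, here's a minimal sketch of what such a test-period ban could look like in an Apache .htaccess file (the addresses below are documentation placeholders, not my real list):

```apacheconf
# Test-period IP bans - addresses are placeholders (RFC 5737 ranges)
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 198.51.100.0/24
```

The nice thing is you can drop or extend the list without touching the directory script at all.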

    What schemes did you try out in your time? Did you ever fail miserably with any?

In particular I'm curious as to how free, general, deep-link directories keep out the never-ending horde of pharma-related (and similar) sites. This can be quite a job, one that I've (mostly) managed by creating an updating query for the database.

Has anyone ever thought of publishing a "Directories bad-IP list"? Of course it would need some updating and automated checking... one could imagine a handful of "open" directories to keep fishing for the bad IPs.

    Just my two cents for now.

    /miko67
     
    miko67, Jul 4, 2007 IP
  2. The Pheonix

    #2
Nice post. The only problem with IP restriction is that so many people use IP proxy software that you'd end up with a list of banned IPs so long you'd need a couple of servers to hold it.

I know this is gonna sound like an advert for phpLynx, and in many ways it is, but it's relevant as they have a 'ban word' feature even in their free version which allows you to effectively filter out any word you like: pharm, viagra, porn, you name it, this can filter it.
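To illustrate the idea (this is just my own rough sketch of how such a filter works, not phpLynx's actual code; the word list and function name are made up):

```python
# Minimal sketch of a ban-word filter, the kind of thing a directory
# script might run against a submitted title, description or URL.
BANNED_WORDS = ["pharm", "viagra", "porn"]  # example list, extend as needed

def is_spammy(submission_text):
    """Return True if any banned word appears (case-insensitive)."""
    text = submission_text.lower()
    return any(word in text for word in BANNED_WORDS)
```

Run every field of a submission through it and reject (or queue for review) anything that matches.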

They also have a three-way ban feature where you can ban by IP, domain name, or email, and when you combine that with their state-of-the-art antibot text/captcha you can make it as difficult as hell for spammers.

In the unlikely event they get past all these features, you can then force them to pay a dollar or even a cent for submission; this sends them to the PayPal page, and if they don't pay they don't get listed, as there's an 'auto-delete' feature built in.

As much as the spammers try, we as programmers will keep trying to beat them; I guess the examples above are testimony to that. :)
     
    The Pheonix, Jul 4, 2007 IP
  3. CReed

    #3
    I don't think most directory owners fully realize the potential amount of spam they're likely to receive, automated or otherwise.

For example, in a 29-minute span last week there were 300 automated attempts at one of my directories. It's nice to know that the safeguards are working as they should, so I never see any of it, nor am I burdened with the task of deleting this crap from my pending queue.

    A few adjustments were made recently as we did block a few legit submissions, but otherwise it works exceptionally well.

I'm no longer banning domains or IPs after the fact, a tactic that doesn't work very well.
     
    CReed, Jul 4, 2007 IP
  4. msolution

    #4
IPs don't really work, as most people who use dial-up access have dynamic IP allocations! ... so you're banning one user today and blocking a legitimate customer tomorrow!

I think this will be my next mod for phpLD, hang on!

    M.
     
    msolution, Jul 4, 2007 IP
  5. miko67

    #5
    I have had a bad feeling about IP banning all along, so I tend to agree with all above.

However, I did some serious querying on my databases, which hold approximately 100-150 thousand websites between them. I didn't delete any for a long time (I just deactivated them); I wanted to get to know the bad guys even if I didn't want them in the directory.

What I came up with was a minimal list of 8-12 IPs from which no acceptable (by my standards) URLs were ever submitted, and somewhere between 40 and 300 URLs of an unacceptable nature were submitted.
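The sort of query I mean can be sketched like this; the table and column names are made up for illustration (my real schema differs), and the IPs are placeholders:

```python
import sqlite3

# Toy schema: one row per submission, spam flag 0 = acceptable, 1 = spam.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (ip TEXT, url TEXT, spam INTEGER)")
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [
        ("192.0.2.15", "http://pharma-spam.example", 1),
        ("192.0.2.15", "http://more-spam.example", 1),
        ("203.0.113.7", "http://good-site.example", 0),
        ("203.0.113.7", "http://pharma.example", 1),
    ],
)

# IPs that only ever submitted spam, with at least N bad submissions.
N = 2
rows = conn.execute(
    """
    SELECT ip, COUNT(*) AS bad
    FROM submissions
    GROUP BY ip
    HAVING SUM(spam) = COUNT(*) AND COUNT(*) >= ?
    """,
    (N,),
).fetchall()
print(rows)  # only 192.0.2.15 qualifies: every one of its submissions is spam
```

The HAVING clause is the whole trick: an IP only shows up when its spam count equals its total count, so a single good submission keeps it off the list.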

These IPs are on my bad-IP list for a year because they have shown a consistent pattern over a long period of time (months at least). In about a year I'm gonna loosen the grip a little and see what else needs banning.

Other than that, I really like the idea of saving all data and not deleting any. It gives you much more extensive ground for statistical data mining - albeit also much heavier on the script, I suppose.

This reminds me, does anybody know if the main current scripts (phpLD, SiteSift and others) can hold more than a million entries? Ten million?

Somebody must have studied database theory in school and have a lot to say about this... but are the scripts the bottleneck, and does anybody know the "stress limits" of their script?

    /miko67
     
    miko67, Jul 4, 2007 IP