HTTrack 3.0x webcopier, how to block it....

Discussion in 'Apache' started by Redleg, Nov 28, 2004.

  1. #1
    I've just checked my server logs for this month, and I got an (unpleasant) surprise when I got to the 23rd of November..

    40,000-50,000 extra pages and 1 GB more than usual, but no more visitors..

    So I opened my raw server logs and found this user agent (a lot) in there:
    "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"

    Looks like some b***ard has been using this http://www.httrack.com/?pat to download my entire site and forum.. :mad:

    How do I prevent this from happening again?? .htaccess ??

    And I've also noticed that my traffic has gone down 30%-40% the last few days.
    But I can't see any disturbing changes in the SERPs, at least not for the keywords I monitor..

    Is there any way I can find out if it is because of the copying (duplicate pages in Google??)
    Thanks.. :eek:
     
    Redleg, Nov 28, 2004 IP
  2. Smyrl

    Smyrl Tomato Republic Staff

    Messages:
    13,439
    Likes Received:
    1,499
    Best Answers:
    76
    Trophy Points:
    510
    #2
    Try Copyscape to see if someone has put your site online.

    Shannon
     
    Smyrl, Nov 28, 2004 IP
  3. Redleg

    Redleg Raider

    Messages:
    360
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I've tried copyscape, but it will be impossible for me to check all of my pages there, especially the forum..

    I found nothing on the index page, or some of my high ranked pages.

    But I can't understand the sudden drop in traffic at all; all of the keywords I try in Google rank like they have for a while (or better).. :(
     
    Redleg, Nov 28, 2004 IP
  4. Smyrl

    #4
    Could holiday weekend be source of your traffic drop?

    Both my sister and I have education-oriented sites, mine geared toward student usage and hers toward preschool teachers. Students for the most part are taking the holiday off, and female teachers are busy preparing meals and shopping. On the male end of the spectrum you are competing with football. I hope your traffic resumes after the holiday weekend passes.

    Shannon
     
    Smyrl, Nov 28, 2004 IP
    Redleg likes this.
  5. Redleg

    #5
    I hope it's that..

    Didn't really know that it was a holiday weekend, I live in Norway myself..
    Thanks for telling me.. :)

    How long did (do?) the holiday last??

    The number of visitors dropped from 11,000-13,000 per day on average last week (15-19 Nov.) to 6,500-9,000 this week (22-26 Nov.)



    Any suggestions on how I can block webcopiers like that in the future??
     
    Redleg, Nov 28, 2004 IP
  6. Smyrl

    #6
    Hi Redleg,

    Had I paid more attention to details I would have noticed you were from Norway.

    Here in the United States we are celebrating Thanksgiving. Thanksgiving is probably our second most important holiday. Families travel long distances if necessary to be together. I expect US traffic to pick back up Monday. Women over here will not have as much time to surf since we will be preparing for Christmas, but I expect "military quotes" attracts more males than females, so hopefully your American browsers will be back in force.

    I too would love to know about blocking web grabbers from our sites. Hopefully someone will offer a reasonable suggestion. I do not see blocking an IP as a reasonable way to stop the offense.

    Shannon
     
    Smyrl, Nov 28, 2004 IP
  7. flawebworks

    flawebworks Tech Services

    Messages:
    991
    Likes Received:
    36
    Best Answers:
    1
    Trophy Points:
    78
    #7
    If you want to block by IP, put this in your .htaccess file (back up your original .htaccess first as a precaution):

    deny from 211.157.8.44
    deny from 202.

    The first line blocks a single IP; the second blocks every address that begins with 202. (the whole 202.x.x.x range).
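
    A slightly fuller sketch of the same idea, in case blocking all of 202.x.x.x is too broad. The addresses are placeholders from the reserved documentation ranges, not ones seen in this thread, and the syntax is the Apache 2.2-style access control (on Apache 2.4 it needs mod_access_compat).

    ```apache
    # Evaluate Deny lines first; anything not denied is allowed.
    Order Deny,Allow

    # A single host (placeholder address).
    Deny from 192.0.2.44

    # A /24 network in CIDR notation: 198.51.100.0 through 198.51.100.255.
    Deny from 198.51.100.0/24

    # The same kind of range written as a network/netmask pair.
    Deny from 203.0.113.0/255.255.255.0
    ```

    A narrower range like this is less likely to lock out legitimate visitors than a whole first-octet block.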

    Block a referring site with a condition like this, placed in a RewriteCond list such as the one below:

    RewriteCond %{HTTP_REFERER} ^http://www\.domainname\.com/ [NC,OR]

    I have an example somewhere for blocking useragent as well; looking for it.

    Here we go; found it:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCapture.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Webdupe.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Pockey.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DiscoPump.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com.* [NC]
    RewriteRule .* - [F,L]

    If you use that list, be careful: you can lock yourself out of your site via HTTP. In particular, the last RewriteCond before the RewriteRule must not carry the OR flag, otherwise the rule can end up matching every request.

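    To block HTTrack, try adding one of these to the list above, ahead of the final condition:

    RewriteCond %{HTTP_USER_AGENT} "HTTrack 3\.0x" [NC,OR]

    or

    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]

    The pattern is left unanchored (no ^) because the string in Redleg's log starts with Mozilla/4.5, not HTTrack, and the first version is quoted because it contains a space. You may have to find a different variation.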
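
    A sketch of another way to do the same thing without mod_rewrite, assuming mod_setenvif is loaded and the host allows these directives in .htaccess (both are assumptions, not something stated in this thread). The variable name bad_bot is made up for illustration. It uses the same Apache 2.2-style deny syntax as the IP example above; on Apache 2.4 that syntax needs mod_access_compat.

    ```apache
    # Tag any request whose User-Agent header contains "HTTrack" anywhere,
    # ignoring case. "bad_bot" is an arbitrary, hypothetical variable name.
    SetEnvIfNoCase User-Agent "HTTrack" bad_bot

    # Let everyone in, then refuse requests carrying the tag (403 Forbidden).
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    ```

    Like the RewriteCond version, this only stops copiers that still send their default user agent.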
     
    flawebworks, Nov 28, 2004 IP
  8. Dji-man

    Dji-man Peon

    Messages:
    185
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #8
    You can change HTTrack's user-agent to whatever you want, so I think your only option is to block by IP.
     
    Dji-man, Nov 28, 2004 IP
  9. Redleg

    #9
    I don't think it's any good to block the IP either, since anyone can use a webcopier, and the guy who used HTTrack will probably not visit my site again (or in a while) anyway...

    Thanks for the info Shannon, I hope the traffic is back on track on Monday again.. :)
     
    Redleg, Nov 28, 2004 IP
  10. J.D.

    J.D. Peon

    Messages:
    1,198
    Likes Received:
    64
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Unless you see some malicious activity (e.g. a deliberate DoS attack), there's really not much you can do about this, except throttling bandwidth per connection (say @ 100-200 KBps). You can do a massive download like this using just about anything nowadays, even IE (using offline links crawler).
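
    One possible sketch of per-connection throttling, assuming Apache 2.4 or later with mod_ratelimit and mod_env loaded (that module postdates this thread and is not necessarily what J.D. had in mind). The 200 figure is taken from the 100-200 KBps range suggested above.

    ```apache
    <IfModule mod_ratelimit.c>
        # Pass every response through the rate-limiting output filter.
        SetOutputFilter RATE_LIMIT
        # Cap each response at roughly 200 KiB/s (the value is in KiB/s).
        SetEnv rate-limit 200
    </IfModule>
    ```

    The limit applies to each response separately, so a copier that opens several connections at once still gets several times that rate.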

    IP address blocking won't give you much unless you start blocking entire ISPs, which may affect your legit users.

    Blocking user agents is a bit better - at least average users won't be able to hit you with massive downloads like this. User agent strings may be easily changed, however, so this extra work is hardly worth the effort.

    J.D.
     
    J.D., Nov 28, 2004 IP
  11. Redleg

    #11
    I think I'll block some known useragents for now..

    Thanks for your help guys and gals.. :)
     
    Redleg, Nov 29, 2004 IP