How to block archive.org bot with both robots.txt AND .htaccess

Discussion in 'robots.txt' started by EducationLinks, Aug 9, 2007.

  1. #1
    I have come across so many conflicting methods and code snippets online that I have no idea which one actually works. Can someone please post here the EXACT code to use in both robots.txt AND .htaccess?

    NOTE #1: I do not want to block Alexa, only the Wayback Machine archiver.

    NOTE #2: I want to prevent the Wayback Machine archiver from accessing one particular page of my site. I do want it to archive the rest of my pages.

    Thanks for the help!
     
    EducationLinks, Aug 9, 2007
  2. evera

    #2
    AFAIK, Alexa and the IA Archiver are the same thing, so you can't block one without blocking the other.
    But if you want to block certain parts of your website, just write something like this:
    User-agent: ia_archiver
    Disallow: /Folder/
     
    evera, Aug 10, 2007
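
A quick way to sanity-check a rule like the one in post #2 is to feed it to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the example paths are hypothetical, and a real crawler may interpret the rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rule from post #2: keep ia_archiver out of one folder only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /Folder/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `user_agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    for path in ("/Folder/page.html", "/index.html"):
        print(path, is_allowed(ROBOTS_TXT, "ia_archiver", path))
```

Paths in robots.txt are case-sensitive, so `/Folder/` and `/folder/` are different rules. Bots that are not named in the file are unaffected by it.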
  3. EducationLinks

    #3
    From what I understand, Alexa and Archive.org use different bots:

    ia_archiver = Alexa
    ia_archiver-web.archive.org = Archive.org

    Both of these bots visit my site regularly.
     
    EducationLinks, Aug 10, 2007
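
If the two user-agent strings in post #3 are accurate, a single-page block aimed only at the archive.org bot might look like the sketch below. It is checked with Python's standard `urllib.robotparser`; `/private-page.html` is a hypothetical path, and whether the real crawlers honor this exact token is an assumption that should be confirmed against the bot's own documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumption: user-agent tokens exactly as listed in post #3.
ALEXA_BOT = "ia_archiver"
ARCHIVE_BOT = "ia_archiver-web.archive.org"

# Hypothetical robots.txt: hide one page from the archive.org bot only.
ROBOTS_TXT = f"""\
User-agent: {ARCHIVE_BOT}
Disallow: /private-page.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for bot in (ALEXA_BOT, ARCHIVE_BOT):
        for path in ("/private-page.html", "/index.html"):
            print(bot, path, parser.can_fetch(bot, path))
```

Note that this particular parser treats the token in robots.txt as a substring of the visiting bot's name, so a rule written for plain `ia_archiver` would also apply to `ia_archiver-web.archive.org` here. Using the longer token is what keeps the two bots apart in this sketch.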
  4. kop16

    #4
    User-agent: Alexa
    User-agent: Archive.org
    Disallow: /Folder/
     
    kop16, Aug 14, 2007
  5. trichnosis

    #5
    I have added

    User-agent: ia_archiver
    Disallow: /

    to my robots.txt file. The bot only eats up my traffic.
     
    trichnosis, Aug 21, 2007
  6. gasyoun

    #6
    I have the same problem. I do not want the Wayback Machine archiver to see a particular <img> that appears on every page of my site.
     
    gasyoun, Aug 30, 2007
  7. hans

    #7
    Having a good archive.org history of your site may one day add to its reputation and credibility.

    In addition, you may one day find that archive.org holds the only backup of a deleted page (this happened to me once, years ago).

    In today's broadband world, the little traffic that archive.org generates really should not matter anymore. And if you have a huge site, you also have an even larger server, paid for by even larger AdSense revenue.

    There are OTHER resource vampires, such as:
    - fake Chinese traffic, amounting to tens of thousands of page loads per month
    - pictures hotlinked from MySpace, hi5, space.msn, etc., which can cost gigabytes of traffic each month if the pictures are not hotlink-protected
     
    hans, Oct 21, 2007