What are the pros and cons of allowing archive.org to spider your site, and are you blocking it? I've been blocking it on some sites and allowing it on others, but I'm still not convinced either way as to whether it's a good or a bad thing.

Pros:
You can refer back to it if your site crashes and you lose data.
You can point to it in the case of a copyright dispute.

Cons:
It uses up bandwidth.
Other people can refer back to it to find things you would rather forget about, such as errors, bad design, and libellous comments posted on your forums.

I'd like to know how other people feel about this site.
The problem I had was that the archived pages were still referring back to the CSS and image files on my servers. I thought I was keeping good track of this for archive.org purposes, but when I checked after some time I found a complete mess. Since this can be very misleading to some people, I finally decided years ago that it was not worth my time and blocked it in robots.txt.
archive.org lets us see the history of a domain, so whenever we buy or take over a domain we can build on what was there before.
It gives a nice timeline of how a site looked over the years. I allow it to index my sites. Also, when I am purchasing domains I will usually check how the domain was used in the past to get an idea of what types of traffic it is receiving.
The benefit for domain buyers is something that hadn't occurred to me before. I suppose it could be a good selling point if you can demonstrate how the site has been used, and how active it has been over time.
I have used it more than once to restore part of a site or determine the history of a site. I was recently approached by beyond(dot)com about cobranding salarymap.com, and I knew the name from years ago. It looked different, so I went back and found that it used to be a big B2B and B2G (government) service but was now a job site. I found a page from a few years back where it just had a tombstone saying the site was dead and gone. So archive.org set me straight on this new site that claimed to have 11,000 partner sites. Remember, it's the Internet, so whatever is said or shown on the Internet stays (forever) on the Internet.
I let it index my sites... I like looking back at things I've done. I also have nothing to hide... If someone likes to point out that I made a spelling mistake 5 years ago, that's their problem... They don't visit my site often enough for bandwidth to be an issue...

I don't think they're indexing anymore; they're in a legal dispute right now over whether they can index sites (which doesn't make sense, since Google, Yahoo, and MSN index sites too... they're all reading robots.txt).
I like the archive. It's nice to look back on your site and others... and in a hundred years, it will be a nice place for the grandkids to laugh at me.
When buying expired domain names, it always helps to check the archive to see what was hosted there before. It's also a nice way to see snapshots of your website through its evolution.
Never actually thought about it. However, I agree with the remark about comments above. One easy solution could be to create a rule in the CMS so that comments are turned off whenever archive.org identifies itself. That way that problem, at least, is solved. On the other hand, I love the idea of archive.org. Think of governments and mainstream media who "update" and "forget" certain articles here and there. Thanks to archive.org we have been able to prove them wrong from time to time. So, if I would like to see the history of others, I must accept history on me. Go Archive!

Kind regards, Kim Steinhaug
Easy CMS - a user-friendly publishing system - Easy Webshop - an online store with power
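The CMS rule suggested above could be sketched roughly like this in Python. This is only an illustration: `comments_enabled` is a hypothetical helper, not part of any real CMS, and it assumes the archive.org crawler announces itself with `ia_archiver` in its User-Agent header (the same token used in the robots.txt rule quoted later in this thread).

```python
# Hypothetical helper: decide whether a CMS should render the comments
# section for a request, based only on the User-Agent header.

# Assumption: the archive.org crawler identifies itself with this token.
ARCHIVER_TOKENS = ("ia_archiver",)


def comments_enabled(user_agent):
    """Return False when the request looks like it comes from the archiver."""
    ua = (user_agent or "").lower()
    return not any(token in ua for token in ARCHIVER_TOKENS)


if __name__ == "__main__":
    # A normal browser still sees comments; the archiver does not.
    print(comments_enabled("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
    print(comments_enabled("ia_archiver"))
```

A template would call this before rendering the comments block. It only works as long as the crawler keeps identifying itself honestly, so it hides comments from the archive but is not a security measure.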
I block it for a few of my sites. I usually keep a backup of all my work on a removable hard drive. That helps me a lot!
If you want to avoid the Alexa toolbar... or are in Alexa, chances are you're archived. If not, eventually it gets to you...
It might be enough just to be in Google. I don't think the Alexa toolbar is widely used enough to account for all of the indexing it does. If you want to ban it, just use robots.txt:

User-agent: ia_archiver
Disallow: /
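If you want to sanity-check that rule before uploading it, here is a small sketch using Python's standard `urllib.robotparser`. The `can_fetch` wrapper and the example.com URLs are just placeholders for illustration. It shows how a robots.txt-compliant crawler would read the rule; it can't prove that any particular crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The archiver is blocked from everything...
    print(can_fetch("ia_archiver", "http://example.com/page.html"))  # False
    # ...while other crawlers are unaffected by this rule.
    print(can_fetch("Googlebot", "http://example.com/page.html"))    # True
```

Because the rule names only `ia_archiver`, search engines keep indexing the site as before.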