Hi, Yahoo recently released a new web page spider, Slurp 3.0. The new Yahoo! Slurp 3.0 honors the same user-agent and all robots.txt directives as 'Yahoo! Slurp', though it will identify itself as Slurp 3.0 in your web logs. Expect some changes because of the new spider, as described below by Sharad Verma from Yahoo Search.

"a) The crawlers will start crawling from a different and much smaller set of IP addresses, but it'll still be from the crawl.yahoo.net domain. Any reverse DNS checks to identify our crawler will continue to work. Please note that if you're using IP-based recognition of our crawlers, you might see a drop in crawl/coverage from Yahoo! We strongly recommend that you move to reverse DNS-based identification of Yahoo! Slurp if you're using any other method to avoid this problem. The current set of IPs will disappear from your web logs in the next several weeks.

b) The crawlers will also publish a new user-agent, 'Yahoo! Slurp/3.0.' Existing robots.txt directives for 'Slurp' or 'Yahoo! Slurp' will continue to work, but if you have directives specific to 'Slurp/2.0,' they won't be recognized by the new crawler (though usage of the 'Slurp/2.0' user-agent is very rare on the web, so you won't likely be affected). We recommend specifying the shorter version of: User-agent: Slurp. Check out "How do I prevent my site or certain subdirectories from being crawled?" on our Help page for more details.

These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted."

From the Yahoo Search blog
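For anyone still matching Slurp by IP address, here is a rough sketch of the reverse DNS-based identification the quote recommends. It is only an illustration, not Yahoo's own method: the function name is_yahoo_crawler and the injectable resolver parameters are made up for this example, and the forward-lookup confirmation step is a common precaution I am assuming, not something the blog post spells out.

```python
import socket


def is_yahoo_crawler(ip, resolve_ptr=socket.gethostbyaddr,
                     resolve_a=socket.gethostbyname_ex,
                     suffix=".crawl.yahoo.net"):
    """Return True if `ip` reverse-resolves to a host under `suffix`
    and that host resolves forward to the same IP.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    try:
        # Reverse (PTR) lookup: IP -> hostname
        host = resolve_ptr(ip)[0]
    except OSError:
        return False

    # The hostname must sit under the crawl.yahoo.net domain
    if not host.lower().rstrip(".").endswith(suffix):
        return False

    try:
        # Forward lookup: hostname -> list of IPs (assumed extra check,
        # so that a forged PTR record alone is not enough)
        addresses = resolve_a(host)[2]
    except OSError:
        return False

    return ip in addresses
```

You would call is_yahoo_crawler(remote_ip) with the client address taken from your web logs. The resolver arguments only exist so the check can be tried out without touching the network.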
With the popularization of Semantic Web technologies, more and more bots will be needed to make the extracted data more relevant. This again relates to the latent semantic optimization model, which helps you optimize your content for the latest standards of search engines. By latest standards I mean the semantic aspect of things: search engines that will understand a text within its context without the help of meta tags and the like. This will also help the ordinary user, because it will improve his search results and make them more relevant; bombing and manipulation techniques will be over.
Most of the mess on my site, and most of the server load, is due to the Yahoo crawler. I am not sure why the Google crawler has no effect on server load whereas the Yahoo spider does. Any advice?
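One way to put numbers on the question above is to count requests per crawler in the access log. This is a minimal sketch under assumptions: it supposes a combined-style log where the user-agent is the last quoted field on each line, and the function name count_crawler_hits and the substrings "Slurp" and "Googlebot" are illustrative choices, not anything taken from the posts above.

```python
import re
from collections import Counter

# Assumes the user-agent is the last double-quoted field on the line,
# as in the common "combined" access log layout.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(lines, crawlers=("Slurp", "Googlebot")):
    """Count log lines whose user-agent contains each crawler name.

    Hypothetical helper: matching is a plain case-insensitive substring
    test, so it trusts whatever the client claims to be.
    """
    hits = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if not match:
            continue  # line does not end with a quoted field
        user_agent = match.group(1).lower()
        for name in crawlers:
            if name.lower() in user_agent:
                hits[name] += 1
    return hits
```

Feeding it an open log file, for example count_crawler_hits(open(path)) with path pointing at your own log, gives a per-crawler request count that can be compared before deciding what to change.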