If you plan to promote your business website through search engines, you should have some idea of a few basics, namely crawling, indexing and cache, to get the most out of your SEO effort. We often get questions about these three basic search engine robot functions, and there is some confusion around them. Here are a few key notes on crawling, indexing and cache.

Search Engine Crawling:
Wikipedia defines a web crawler as follows: "A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or, especially in the FOAF community, Web scutters. This process is called Web crawling or spidering." So basically, crawling is the process by which a search engine robot fetches a website's pages and follows their links, so that the search engine can decide which data should be processed further for ranking and results. While crawling, the robot can fetch links, parse the HTML code, and pick up videos, images and scripts.

Search Engine Indexing:
Now we come to search engine indexing, which is a major factor in search engine results. According to Wikipedia, "Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics, and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is Web indexing." In other words, indexing is the process by which a search engine collects and stores the important data that comes out of crawling, and this stored data is what gets used to build the SERPs.

For Google, you can use the "site" operator to see which pages of a website have been indexed: type [site:http://www.example.com] into the search box and Google it.
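To make the crawling and indexing steps described above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the three-page site in PAGES is made up and held in memory, and the names PageParser and crawl_and_index are hypothetical helpers, not any search engine's real code. The crawl follows links page by page, and the "index" is a simple word-to-URL lookup table.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical three-page site, kept in memory so no network access is needed.
PAGES = {
    "http://www.example.com/": '<a href="/about.html">About</a> <a href="/blog.html">Blog</a> Welcome home',
    "http://www.example.com/about.html": '<a href="/">Home</a> About our SEO company',
    "http://www.example.com/blog.html": '<a href="/about.html">About</a> SEO tips blog',
}


class PageParser(HTMLParser):
    """Collects the links and the visible words of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl_and_index(start_url):
    """Breadth-first crawl from start_url; returns (crawl order, inverted index)."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    index = defaultdict(set)  # word -> set of URLs whose text contains it
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # unknown page; a real crawler would record the error
        order.append(url)
        parser = PageParser(url)
        parser.feed(html)
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order, index


order, index = crawl_and_index("http://www.example.com/")
print(order)                 # pages in the order they were crawled
print(sorted(index["seo"]))  # pages whose text contains the word "seo"
```

Looking up a word in the index is then roughly what happens, at a vastly larger scale, when a search engine answers a query from its stored data instead of re-reading every page.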
For Yahoo, you can use http://siteexplorer.search.yahoo.com/ to see the pages Yahoo has indexed. For Bing, you can use the same "site" operator, just like Google.

The good thing about Google indexing is that you can control it according to your needs. Simply put a NOINDEX meta tag on a particular page, and Google won't index that page or use it in its results. So in the end you can control both crawling and indexing. To control crawling, you can use a robots.txt file to block a URL from Google's crawling process. If you want to remove a crawled or uncrawled URL from the index, you can use the URL removal tool in Webmaster Tools, or take a look at Matt Cutts' advice on the issue: [video=youtube;KBdEwpRQRD0]http://www.youtube.com/watch?feature=player_embedded&v=KBdEwpRQRD0[/video]

Search Engine Cache:
Caching is another basic function performed by the search engine crawler. According to Wikipedia, "A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met." For a search engine, it is basically the method by which the spider takes a snapshot of a web page and stores it for a while, so you can think of it as temporary storage. For Google, you can use the "cache" operator to see the latest snapshot of a web page: type [cache:http://www.example.com] and Google it.

Crawling, indexing and cache are the basic methods of search engines, and they are related to each other.

Published by Anirban Das, Zebra Techies, India
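The two controls mentioned in the post, robots.txt for crawling and the NOINDEX tag for indexing, can be checked with a short Python script. This is only a sketch under stated assumptions: the robots.txt rules and the sample HTML are made up for illustration, and may_crawl, NoindexFinder and has_noindex are hypothetical helper names. The robots.txt parsing itself uses Python's standard urllib.robotparser module.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would be served from
# http://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())  # parse in-memory rules, no network needed


def may_crawl(url, user_agent="Googlebot"):
    """True if the robots.txt rules above allow user_agent to fetch url."""
    return robots.can_fetch(user_agent, url)


class NoindexFinder(HTMLParser):
    """Sets self.noindex when a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


def has_noindex(html):
    """True if the page asks search engines not to index it."""
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex


print(may_crawl("http://www.example.com/index.html"))
print(may_crawl("http://www.example.com/private/report.html"))
print(has_noindex('<html><head><meta name="robots" content="noindex"></head></html>'))
```

One design point worth keeping in mind: a crawler can only see a NOINDEX tag on a page it is allowed to fetch, so blocking a page in robots.txt and tagging it NOINDEX at the same time can work against each other.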
Hey man, this is an awesome post. It is really useful for me. I had a bit of confusion about this, but the post really helped clear up my doubts. Thanks for this post.
Hi nintynine, it's really a great, informative post. The tips and information are really helpful. Thanks, man.