Hi, can anyone help me out with this? I just want to know the exact difference between crawling and indexing. A short answer, please. Thanks, Sathish.
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is Web indexing. When you hear "index", chances are the person is talking about Google's index. For your page to be displayed on Google, Google has to store it in its index and serve it from there. A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, and Web robots. This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Crawlers can also be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam). A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
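To make the seeds and crawl frontier idea concrete, here is a minimal sketch in Python. It does not touch the network: FAKE_WEB is a made-up in-memory link graph, and the breadth-first order and the max_pages limit are assumed stand-ins for the "set of policies" a real crawler would apply.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

def crawl(seeds, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)   # URLs waiting to be visited (the crawl frontier)
    seen = set(seeds)         # URLs already queued, so nothing is queued twice
    visited = []              # order in which pages were actually visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # "Fetch" the page and add any new hyperlinks to the frontier.
        for link in FAKE_WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl(["http://example.com/"]))
```

Starting from the single seed, the sketch reaches all four pages exactly once, even though the pages link back to each other.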
Consider Google as a database where things are kept on a priority basis. Once you create a website and it is indexed, meaning it is kept in Google's database, that doesn't mean the website has been crawled. Once the site is indexed, it will be crawled by Google when its turn comes. That means if a URL is indexed, we can't be sure whether it has been crawled, but if the URL is crawled, we can be sure that it is indexed. It's a predefined process that Google follows.
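Crawling is the way a search engine reads your site. Indexing is when the information the bots gathered (the contents of your site) is stored by the search engine. Note that robots.txt controls crawling, not indexing, so excluding a page there stops compliant search engines from crawling it but does not by itself keep the URL out of the index.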
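For reference, this is roughly how a well-behaved crawler consults robots.txt before fetching a page. It is only a sketch using Python's standard urllib.robotparser module. The rules in ROBOTS_TXT, the "ExampleBot" user agent, and the may_crawl helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url, user_agent="ExampleBot"):
    """Return True if the robots.txt rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

print(may_crawl("http://example.com/index.html"))         # allowed by the rules
print(may_crawl("http://example.com/private/page.html"))  # disallowed by the rules
```

The check only answers "may I fetch this URL?". It says nothing about what the search engine later stores in its index.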
Thanks. Actually, I was confused by your reply. To my knowledge, crawling is the first process and indexing is the second one, which means that if a URL is crawled we can't be sure whether it is indexed, but if the URL is indexed we can be sure that it got crawled.
Hi, crawling is when the spiders (little software programs) visit your site and "crawl" through it to gather information. Indexing is the process of categorizing what the spiders find on your site so that the search engines can decide the most appropriate places to list your site. Regards, Dave
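As a rough illustration of the indexing step, here is a small Python sketch that builds an inverted index (word to pages) from text a crawler has already collected. CRAWLED_PAGES is invented sample data, and real search engines do far more than split text on whitespace.

```python
from collections import defaultdict

# Hypothetical output of a crawl: URL -> text found on that page.
CRAWLED_PAGES = {
    "http://example.com/": "welcome to our shoe shop",
    "http://example.com/a": "running shoe reviews",
    "http://example.com/b": "contact our shop",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, word):
    """Look up a single word in the index; no page is re-read at query time."""
    return sorted(index.get(word.lower(), set()))

index = build_index(CRAWLED_PAGES)
print(search(index, "shop"))
print(search(index, "shoe"))
```

In this sketch the index is built only from pages that were crawled first, and searches are answered from the index alone, which is why they are fast.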
Cache means the way the Web page looked when Google's spiders indexed it. Index means the pages added to Google's database. I replied the same in another thread here. Hope it's in as simple a form as you required, friend.