Hello guys! Can you share your expertise on how to stop search engines from crawling a website's pages? And if we don't want certain pages to get indexed, what are we supposed to do? (Please comment only if you have experience doing this.)
To stop search engines from crawling, you just have to create a plain text file, name it robots.txt, add User-agent and Disallow rules inside it for the paths you want blocked, and upload it to the root of your site's server through the control panel.
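As a minimal sketch of what such a file contains (the /private/ path here is just a placeholder, not anything from this thread):

```text
# Applies to all crawlers
User-agent: *
# Block everything under /private/
Disallow: /private/
```

The file must sit at the root of the domain (e.g. example.com/robots.txt), not in a subdirectory, or crawlers will not find it.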
Yeah, correctly explained by my friend caspian. Although creating a robots.txt does not fully ensure that your pages will not be crawled. There is no guarantee, you see!
Thanks for the comments. But could you please elaborate on the process of doing this? For instance, I use WordPress. How do I do it, and where do I upload the file?
If you are using WordPress you can simply go to your dashboard, then Settings, then Privacy, and select the "Ask search engines not to index this site." radio button. Whether you use this or robots.txt, any web page that is available to people browsing is likely to get crawled, whether you like it or not. Search engines don't always do as you ask them! The only secure way to prevent indexing is to add protection, such as a username and password, for pages you don't want crawlers to access.
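As one common way to add that protection, here is a sketch of HTTP Basic authentication on Apache; the file paths and realm name are placeholders, so adjust them to your own server layout:

```text
# .htaccess placed in the directory you want to protect (Apache with mod_auth)
AuthType Basic
AuthName "Private area"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

You would then create the password file with Apache's htpasswd tool, e.g. `htpasswd -c /home/user/.htpasswd alice`. Crawlers cannot supply credentials, so protected pages stay out of the index.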
Thanks for the reply! But I guess there is a fair chance that crawlers won't crawl if indexing is switched off. And could you further explain the process of adding protection? How do we do it?
You can create a robots.txt file for your site to tell search engines what not to crawl. You can follow these rules: the robots.txt file is a basic text file with one or more records. So let's go over the basics. You will need a line for every URL prefix you want to exclude. You cannot have blank lines within a record, since blank lines are used to separate multiple records.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~test/

In the example above we have told ALL robots (remember, the * means all) not to crawl three directories on the site (cgi-bin, tmp, ~test). You can exclude whatever directories you wish, depending on how your website is structured. If you do not specify files or folders to be excluded, it is understood that the bot has permission to crawl those items.

To exclude ALL bots from crawling the ENTIRE server:

User-agent: *
Disallow: /

To allow ALL bots to crawl the ENTIRE server:

User-agent: *
Disallow:

To exclude A SINGLE bot from crawling the ENTIRE server:

User-agent: BadBot
Disallow: /

To allow A SINGLE bot to crawl the ENTIRE server:

User-agent: Google
Disallow:

User-agent: *
Disallow: /

To exclude ALL bots from crawling the ENTIRE server except for one file: this can be tricky, since the original robots.txt standard has no 'Allow' directive (though some search engines, such as Google, do support one). What you have to do is place all the files you do not want crawled into one folder, and then leave the file to be crawled above it. So if we placed all the files we didn't want crawled in a folder called MISC, we'd write the robots.txt rule like this:

User-agent: *
Disallow: /MISC/

Or you can list each individual item like this:

User-agent: *
Disallow: /MISC/junk.html
Disallow: /MISC/family.html
Disallow: /MISC/home.html

To create a crawl delay for the ENTIRE server: an alternative to blocking a search engine is to request that its robots not crawl through your site as quickly as they normally would. This is known as a crawl delay.
It's not an official extension to the robots.txt standard, but one that most popular search engines support. This is an example of how to specify that robots crawling your site can only make one request every 12 seconds:

User-agent: *
Crawl-delay: 12
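If you want to sanity-check rules like the ones above before uploading them, Python's built-in urllib.robotparser module interprets robots.txt the same way a compliant crawler would; the rules below are just the examples from this thread:

```python
from urllib.robotparser import RobotFileParser

# Parse example rules directly, without fetching anything over the network.
rules = """\
User-agent: *
Crawl-delay: 12
Disallow: /cgi-bin/
Disallow: /tmp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "/cgi-bin/script.pl"))  # False: inside a blocked directory
print(rp.can_fetch("*", "/index.html"))         # True: not blocked
print(rp.crawl_delay("*"))                      # 12
```

This only tells you how a rule-following crawler will behave; as noted above, misbehaving bots can simply ignore the file.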
I don't know the exact method to stop crawling, but one thing is important: search engine crawlers find pages by following links. So remember not to create backlinks to pages you want kept out. I think that is one method.
Besides using the robots.txt file (which is easy for a crawler to work around), you can put a robots "noindex" meta tag in the pages you don't want indexed. This meta tag may be obeyed by more search engines than the robots.txt file.
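For reference, the standard form of that tag goes in the page's head section; "nofollow" is optional and only included here to show the combined form:

```html
<head>
  <!-- Ask compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Note the difference from robots.txt: the crawler must be allowed to fetch the page to see this tag, so don't also Disallow the page in robots.txt or the tag will never be read.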
1. Use a robots.txt robots exclusion file.
2. Use "noindex" page meta tags.
3. Password protect sensitive content.
4. Nofollow: tell search engines not to spider some or all links on a page.
5. Don't link to pages you want to keep out of search engines.
6. Use X-Robots-Tag in your HTTP headers.
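On point 6, the X-Robots-Tag header is useful for non-HTML files (PDFs, images) that cannot carry a meta tag. As a sketch, assuming Apache with mod_headers enabled:

```text
# Send a noindex header for every PDF served (Apache, mod_headers)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

As with the meta tag, the crawler has to be able to fetch the file to see the header, so don't block those URLs in robots.txt at the same time.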