Hello guys! Can you share your expertise on how to stop search engines from crawling a website's pages? And if we don't want certain pages to get indexed, what are we supposed to do? (Please comment only if you have experience doing this.)
To stop search engines from crawling, you just have to create a plain text file, name it robots.txt, add User-agent and Disallow rules inside it for the paths you want blocked, and upload it to the root of your site's server through the control panel.
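As a minimal sketch of what such a file contains (the /private/ path here is just a placeholder, not anything from this thread):

```text
# Applies to all crawlers
User-agent: *
# Block everything under /private/
Disallow: /private/
```

The file must sit at the root of the domain (e.g. example.com/robots.txt), not in a subdirectory, or crawlers will not find it.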
Yeah, correctly explained by my friend caspian. Although creating a robots.txt does not fully ensure that your pages will not be crawled. There is no guarantee, you see!
Thanks for the comments. But could you please elaborate on the process of doing this? For instance, I use WordPress. How do I do it, and where do I upload the file?
If you are using WordPress you can simply go to your dashboard, then Settings, then Privacy, and select the "Ask search engines not to index this site." radio button. Whether you use this or robots.txt, any web page that is available to people browsing is likely to get crawled, whether you like it or not. Search engines don't always do as you ask them! The only secure way to prevent indexing is to add protection, such as a username and password, for pages you don't want crawlers to access.
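As one common way to add that protection, here is a sketch of HTTP Basic authentication on Apache; the file paths and realm name are placeholders, so adjust them to your own server layout:

```text
# .htaccess placed in the directory you want to protect (Apache with mod_auth)
AuthType Basic
AuthName "Private area"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

You would then create the password file with Apache's htpasswd tool, e.g. `htpasswd -c /home/user/.htpasswd alice`. Crawlers cannot supply credentials, so protected pages stay out of the index.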
Thanks for the reply! But I guess there is a fair chance that crawlers won't crawl if indexing is switched off. And could you further explain the process of adding protection? How do we do it?
You can create a robots.txt file for your site to tell search engines what not to crawl. You can follow these rules: the robots.txt file is a basic text file with one or more records. So let's go over the basics. You will need a line for every URL prefix you want to exclude. You cannot have blank lines within a record, since blank lines are used to separate multiple records.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~test/

In the example above we have told ALL robots (remember, the * means all) not to crawl three directories on the site (cgi-bin, tmp, ~test). You can exclude whatever directories you wish, depending on how your website is structured. If you do not specify files or folders to be excluded, it is understood that the bot has permission to crawl those items.

To exclude ALL bots from crawling the ENTIRE server:

User-agent: *
Disallow: /

To allow ALL bots to crawl the ENTIRE server:

User-agent: *
Disallow:

To exclude A SINGLE bot from crawling the ENTIRE server:

User-agent: BadBot
Disallow: /

To allow A SINGLE bot to crawl the ENTIRE server:

User-agent: Google
Disallow:

User-agent: *
Disallow: /

To exclude ALL bots from crawling the ENTIRE server except for one file: this can be tricky, since the original robots.txt standard has no 'Allow' directive (though some search engines, such as Google, do support one). What you have to do is place all the files you do not want crawled into one folder, and then leave the file to be crawled above it. So if we placed all the files we didn't want crawled in a folder called MISC, we'd write the robots.txt rule like this:

User-agent: *
Disallow: /MISC/

Or you can list each individual item like this:

User-agent: *
Disallow: /MISC/junk.html
Disallow: /MISC/family.html
Disallow: /MISC/home.html

To create a crawl delay for the ENTIRE server: an alternative to blocking a search engine is to request that its robots not crawl through your site as quickly as they normally would. This is known as a crawl delay.
It's not an official extension to the robots.txt standard, but one that most popular search engines support. This is an example of how to specify that robots crawling your site can only make one request every 12 seconds:

User-agent: *
Crawl-delay: 12
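If you want to sanity-check rules like the ones above before uploading them, Python's built-in urllib.robotparser module interprets robots.txt the same way a compliant crawler would; the rules below are just the examples from this thread:

```python
from urllib.robotparser import RobotFileParser

# Parse example rules directly, without fetching anything over the network.
rules = """\
User-agent: *
Crawl-delay: 12
Disallow: /cgi-bin/
Disallow: /tmp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "/cgi-bin/script.pl"))  # False: inside a blocked directory
print(rp.can_fetch("*", "/index.html"))         # True: not blocked
print(rp.crawl_delay("*"))                      # 12
```

This only tells you how a rule-following crawler will behave; as noted above, misbehaving bots can simply ignore the file.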
I don't know the exact method to stop crawling, but one thing is important: search engine crawlers find pages by following links. So remember not to create backlinks to pages you want kept out. I think that is one method.
Besides using the robots.txt file (which is easy for a crawler to work around), you can put a robots "noindex" meta tag in the pages you don't want indexed. This meta tag may be obeyed by more search engines than the robots.txt file.
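For reference, the standard form of that tag goes in the page's head section; "nofollow" is optional and only included here to show the combined form:

```html
<head>
  <!-- Ask compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Note the difference from robots.txt: the crawler must be allowed to fetch the page to see this tag, so don't also Disallow the page in robots.txt or the tag will never be read.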
1. Use a robots.txt robots exclusion file.
2. Use "noindex" page meta tags.
3. Password protect sensitive content.
4. Nofollow: tell search engines not to spider some or all links on a page.
5. Don't link to pages you want to keep out of search engines.
6. Use X-Robots-Tag in your HTTP headers.
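On point 6, the X-Robots-Tag header is useful for non-HTML files (PDFs, images) that cannot carry a meta tag. As a sketch, assuming Apache with mod_headers enabled:

```text
# Send a noindex header for every PDF served (Apache, mod_headers)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

As with the meta tag, the crawler has to be able to fetch the file to see the header, so don't block those URLs in robots.txt at the same time.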