How to stop search engines from crawling?

Discussion in 'Search Engine Optimization' started by seotrafficsearch, May 14, 2012.

  1. #1
    Hello guys!

    Share your expertise on how we can stop search engines from crawling a website's pages. And if we do not want certain pages to get indexed, what are we supposed to do? (Please comment only if you have experience doing this.)
     
    seotrafficsearch, May 14, 2012 IP
  2. p.caspian

    p.caspian Peon

    Messages:
    964
    Likes Received:
    6
    Best Answers:
    1
    Trophy Points:
    0
    #2
    To stop search engine crawling you just have to make a plain text file named robots.txt (not robot.txt), add a User-agent line and Disallow lines for the paths you want blocked, and upload it to the root of your site's server through your control panel.
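
    A minimal sketch of such a file, assuming you want to block a hypothetical /private/ directory for all crawlers:

    ```
    # robots.txt — must live at the root of the site, e.g. example.com/robots.txt
    User-agent: *
    Disallow: /private/
    ```

    Note the Disallow value is a path, not a full URL.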
     
    p.caspian, May 14, 2012 IP
  3. seo-first

    seo-first Member

    Messages:
    309
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    33
    #3
    Yes, correctly explained by my friend caspian. Although creating a robots.txt file does not fully ensure that your pages will not be crawled!
    There is no guarantee, you see! :(
     
    seo-first, May 14, 2012 IP
  4. seotrafficsearch

    seotrafficsearch Peon

    Messages:
    194
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Thanks for the comments. But could you please elaborate on the process of doing this? For instance, I use WordPress — how do I do it, and where do I upload the file?
     
    seotrafficsearch, May 14, 2012 IP
  5. seafrontsteve

    seafrontsteve Peon

    Messages:
    451
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #5
    If you are using WordPress you can simply go to your dashboard, then Settings > Privacy, and select the "Ask search engines not to index this site." radio button option.

    Whether you use this or robots.txt, any web page that is available to people browsing is likely to get crawled - whether you like it or not.
    Search engines don't always do as you ask them!
    The only secure way to prevent indexing is to add protection, such as a username and password, for pages you don't want crawlers to access.
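
    For example, on an Apache server you can protect a directory with HTTP Basic Auth via an .htaccess file — a rough sketch, assuming Apache with mod_auth_basic and a hypothetical password-file path:

    ```
    # .htaccess in the directory you want to protect (assumes Apache)
    AuthType Basic
    AuthName "Restricted area"
    # Hypothetical path — put the .htpasswd file outside the web root
    AuthUserFile /home/youruser/.htpasswd
    Require valid-user
    ```

    You create the password file with the htpasswd command-line tool (e.g. `htpasswd -c /home/youruser/.htpasswd someuser`). Crawlers cannot supply credentials, so protected pages never get fetched.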
     
    seafrontsteve, May 14, 2012 IP
  6. seotrafficsearch

    seotrafficsearch Peon

    Messages:
    194
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Thanks for the reply! But I guess there is a fair chance that crawlers won't crawl if indexing is switched off. And could you further explain the process of adding protection? How do we do it?
     
    seotrafficsearch, May 22, 2012 IP
  7. AllenRobinson

    AllenRobinson Guest

    Messages:
    149
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #7
    You can create a robots.txt file for your site to tell search engines which parts you don't want them to crawl.

    You can follow these rules:

    The robots.txt file is a basic text file with one or more records. So let's go over the basics. You will need a line for every URL prefix you want to exclude. You cannot have blank lines in a record since the blank space is used to separate multiple records.

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /~test/


    In the example above we have told ALL robots (remember the * means all) to not crawl three directories on the site (cgi-bin, tmp, ~test). You can exclude whatever directories you wish, and it can depend on how your website is structured. If you do not specify files or folders to be excluded it is understood the bot then has permission to crawl those items.


    To exclude ALL bots from crawling the ENTIRE server:
    User-agent: *
    Disallow: /
    To allow ALL bots to crawl the ENTIRE server:
    User-agent: *
    Disallow:


    To exclude A SINGLE bot from crawling the ENTIRE server:
    User-agent: BadBot
    Disallow: /


    To allow A SINGLE bot to crawl the ENTIRE server:
    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /


    To exclude ALL bots from crawling the ENTIRE server except for one file:
    This can be tricky since the original robots.txt standard has no 'Allow' directive (some engines, including Google, support one, but it is not portable). What you have to do is place all the files you do not want crawled into one folder, and then leave the file to be crawled above it. So if we placed all the files we didn't want crawled in the folder called MISC we'd write the robots.txt rule like this:
    User-agent: *
    Disallow: /MISC
    Or you can do each individual item like this:
    User-agent: *
    Disallow: /MISC/junk.html
    Disallow: /MISC/family.html
    Disallow: /MISC/home.html


    To create a Crawl Delay for the ENTIRE server:
    An alternative to blocking a search engine is to request that its robots not crawl through your site as quickly as they normally would. This is known as a crawl delay. It's not an official extension to the robots.txt standard, but one that several popular search engines honor (Google is a notable exception — it ignores Crawl-delay and uses its Webmaster Tools crawl-rate setting instead). This is an example of how to specify that robots crawling your site should make at most one request every 12 seconds:
    User-agent: *
    Crawl-delay: 12
     
    AllenRobinson, May 22, 2012 IP
  8. ashleyjohn2347

    ashleyjohn2347 Peon

    Messages:
    269
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #8
    Does it guarantee that our pages will not be crawled any more?
     
    ashleyjohn2347, May 22, 2012 IP
  9. mangomedia

    mangomedia Peon

    Messages:
    167
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Just write a robots.txt file with a Disallow rule.
     
    mangomedia, May 22, 2012 IP
  10. josep88

    josep88 Greenhorn

    Messages:
    80
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    16
    #10
    I don't know every method to stop crawling, but one thing is important:
    search engine crawlers find pages by following links, so remember not to create
    backlinks to pages you want kept out. I think that is one method.
     
    josep88, May 22, 2012 IP
  11. gaurav.solanki

    gaurav.solanki Peon

    Messages:
    277
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #11
    Robots.txt is the standard method to stop crawlers from crawling your site. However, there is no guarantee.
     
    gaurav.solanki, May 22, 2012 IP
  12. webdev007

    webdev007 Active Member

    Messages:
    1,037
    Likes Received:
    13
    Best Answers:
    3
    Trophy Points:
    88
    #12
    Besides using the robots.txt file (although that is an easy workaround), you can put a robots meta tag with a "noindex" value in the pages you don't want to get indexed. This meta tag may be obeyed by more search engines than the robots.txt file.
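
    The tag goes in the head of each page you want kept out of the index, like this:

    ```html
    <!-- In the <head> of the page you don't want indexed -->
    <meta name="robots" content="noindex">
    <!-- Or, to also ask engines not to follow the page's links: -->
    <meta name="robots" content="noindex, nofollow">
    ```

    One caveat: the crawler has to be able to fetch the page to see the tag, so don't also block it in robots.txt.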
     
    webdev007, May 22, 2012 IP
  13. atomitservices

    atomitservices Peon

    Messages:
    145
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #13
    We can use the meta robots tag to ask search engines not to index pages, though even that is not a complete guarantee.
     
    atomitservices, May 22, 2012 IP
  14. riza201saly

    riza201saly Peon

    Messages:
    11
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #14
    1. Use a robots.txt robots exclusion file
    2. Use “noindex” page meta tags
    3. Password protect sensitive content
    4. Nofollow: tell search engines not to spider some or all links on a page
    5. Don’t link to pages you want to keep out of search engines
    6. Use X-Robots-Tag in your http headers
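
    For point 6, the X-Robots-Tag is an HTTP response header that works like the robots meta tag but also covers non-HTML files. A sketch for an Apache server, assuming mod_headers is enabled:

    ```
    # Apache config / .htaccess: mark all PDF files noindex via the HTTP header
    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
    ```

    This is useful for PDFs, images, and other files where you cannot embed a meta tag.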
     
    riza201saly, May 22, 2012 IP