
How to avoid web folders and files being crawled by Google?

Discussion in 'Search Engine Optimization' started by learning_seo, Jun 24, 2010.

  1. #1
    Hello,

    I do not want Google's spiders to crawl specific directories, subdirectories or files of my website. Can you please tell me how this can be done and where it should be done? Please explain in detail.

    Thanks in advance.

    Regards
     
    learning_seo, Jun 24, 2010 IP
  2. HansonBro

    HansonBro Peon

    Messages:
    59
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Create a robots.txt file and upload it to the root of your server; use Disallow rules to instruct the bots to stay away from your specified folders and files. There is no point re-inventing the wheel here when there is an excellent resource on this: go to http://www.robotstxt.org/robotstxt.html. Good luck!
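
    For example, a minimal sketch (the folder and file names here are just placeholders):

```
User-agent: *
Disallow: /private-folder/
Disallow: /private-folder/sub-folder/
Disallow: /private-file.html
```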
     
    HansonBro, Jun 24, 2010 IP
  3. Lemints

    Lemints Peon

    Messages:
    29
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Yes, the robots.txt file should be in your root folder.

     
    Lemints, Jun 24, 2010 IP
  4. shakingspear

    shakingspear Peon

    Messages:
    193
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Perfect! I always seem to forget about the robots.txt file when I create a website. Bookmarking now. Thanks!
     
    shakingspear, Jun 24, 2010 IP
  5. learning_seo

    learning_seo Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Hello,

    This URL http://www.robotstxt.org/robotstxt.html is not opening for me.
     
    learning_seo, Jun 29, 2010 IP
  6. alex06291

    alex06291 Peon

    Messages:
    229
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #6
    alex06291, Jun 29, 2010 IP
  7. ericgray83

    ericgray83 Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Just use robots.txt. Ask Google about it.
     
    ericgray83, Jun 29, 2010 IP
  8. social-media

    social-media Member

    Messages:
    311
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    35
    #8
    Robots.txt CAN be used to prevent certain directories, sub-directories and files from being crawled but it does NOT guarantee that Google will not show those pages in their SERPs. If those pages have inbound links to them from other sites, Google can STILL show them in the SERPs even without crawling them. They can infer from the link text of the inbound links whether that page might be relevant to a particular search query. Robots.txt also will NOT cause Google to remove those blocked/disallowed pages from their index if they are already indexed. You'll need to use the URL removal tool in Google's Webmaster Tools to remove them AFTER you have the robots.txt disallows in place.

    If you want to guarantee that the pages will never be shown in the SERPs then you should use a <meta name="robots" content="noindex"> element in the <head> of the pages you don't want to show up. This will not only keep them from showing the page in the SERPs, but if the pages are already in their index, it will cause them to remove them from their index.
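
    For example, a page you want kept out of the index might look like this (the title and body are just placeholders):

```
<html>
<head>
<title>Private page</title>
<meta name="robots" content="noindex">
</head>
<body>
...
</body>
</html>
```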

    Learn more about how to prevent Google indexing.
     
    social-media, Jun 29, 2010 IP
  9. liela

    liela Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Hi,

    Follow what HansonBro said; robots.txt is the main thing that can help you out here.
     
    liela, Jun 29, 2010 IP
  10. AirForce1

    AirForce1 Peon

    Messages:
    1,325
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    0
    #10
    1. Use Disallow: rules in your robots.txt and put it under your site's root directory.
    2. Set noindex, nofollow meta tags in your page files.
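
    For example, a sketch combining both methods (the paths are just placeholders):

```
# robots.txt in your site root
User-agent: *
Disallow: /private/
```

```
<!-- in the <head> of each page file -->
<meta name="robots" content="noindex, nofollow">
```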

    Have a nice day,
     
    AirForce1, Jun 29, 2010 IP
  11. openxcell.webdevelopement

    openxcell.webdevelopement Peon

    Messages:
    151
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #11
    I have one doubt; if you people can help me I would be very thankful. When I used site: in Google to check my links, I found some dynamic links which no longer exist on my site. I tried two ways: I included them in robots.txt and I requested removal in Webmaster Tools. But I got an error in Webmaster Tools saying the link removal was denied.
     
  12. Rituja

    Rituja Peon

    Messages:
    539
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #12
    In Webmaster Tools, go to Settings; there is an option there for what you want...
     
    Rituja, Jun 29, 2010 IP
  13. xprtwalk

    xprtwalk Peon

    Messages:
    663
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Dear member,

    It's easy to keep your folders, pages and files from being crawled by using a robots.txt file in your root. Just define, under a user agent, which pages you don't want the search engines to crawl: list those pages as Disallow rules and they will not be crawled.

    For example:

    User-Agent: *
    Disallow: /*_V
    Disallow: /*barpID
    Disallow: /resources2.do
    Disallow: /resources1.do
    Disallow: /*&pID
    Disallow: /*Cause
    Disallow: /*shop.do?cID=1962
    Disallow: /*shop.do?cID=1966

    Then those pages will avoid being crawled.
     
    xprtwalk, Jun 30, 2010 IP
  14. subburajacmic

    subburajacmic Peon

    Messages:
    162
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Better to use robots.txt to keep those pages from being crawled.
     
    subburajacmic, Jun 30, 2010 IP
  15. jacksonbleu

    jacksonbleu Guest

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #15
    I have a web site set up with a 'robots.txt' file in use. My only ERROR pages come from PHP files on my site. How do I set up the robots.txt file to 'exclude' all my PHP files without having to list EACH and EVERY page with a disallow rule?
     
    jacksonbleu, Jul 1, 2010 IP
  16. HansonBro

    HansonBro Peon

    Messages:
    59
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #16
    You might be able to do this with * wildcards in your disallow rules. Check this thread on WMW: http://www.webmasterworld.com/forum93/622.htm. It might point you in the right direction.
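
    For example, Googlebot supports the * and $ wildcards (not all crawlers do), so something like this should block all URLs ending in .php (a sketch, not tested):

```
User-agent: Googlebot
Disallow: /*.php$
```

    The $ anchors the pattern to the end of the URL, so only URLs ending in .php are matched.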
     
    HansonBro, Jul 2, 2010 IP
  17. xprtwalk

    xprtwalk Peon

    Messages:
    663
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #17
    Yes, this is the way to solve the problem you are facing: use the wildcard sign * in your Disallow rules, for particular pages or for the whole lot, as I described above.
     
    xprtwalk, Jul 2, 2010 IP
  18. joshvelco

    joshvelco Peon

    Messages:
    819
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #18
    Compile this into a robots.txt file placed at the root of your site, in this format:

    User-Agent: *
    Disallow: /the-folder-or-file-you-want-blocked
    Disallow: /the-2nd-folder-or-file-you-want-blocked
     
    joshvelco, Jul 2, 2010 IP