
How to avoid web folders and files being crawled by Google?

Discussion in 'Search Engine Optimization' started by learning_seo, Jun 24, 2010.

  1. #1
    Hello,

    I do not want Google's spiders to crawl specific directories, subdirectories or files of my website. Can you please tell me how this can be done and where it should be done? Please explain in detail.

    Thanks in advance.

    Regards
     
    learning_seo, Jun 24, 2010 IP
  2. HansonBro

    HansonBro Peon

    Messages:
    59
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Create a robots.txt file and upload it to the root of your server; use Disallow rules to instruct the bots to stay away from your specified folders and files. There is no point re-inventing the wheel here when there is an excellent resource on this: go to http://www.robotstxt.org/robotstxt.html. Good luck!
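
    For example, a minimal sketch (the folder and file names here are just placeholders):

```
User-agent: *
Disallow: /private-folder/
Disallow: /private-folder/sub-folder/
Disallow: /private-file.html
```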
     
    HansonBro, Jun 24, 2010 IP
  3. Lemints

    Lemints Peon

    Messages:
    29
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Yes, the robots.txt file should be in your root folder.

     
    Lemints, Jun 24, 2010 IP
  4. shakingspear

    shakingspear Peon

    Messages:
    193
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Perfect! I always seem to forget about the robots.txt file when I create a website. Bookmarking now. Thanks!
     
    shakingspear, Jun 24, 2010 IP
  5. learning_seo

    learning_seo Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Hello,

    This URL http://www.robotstxt.org/robotstxt.html is not opening for me.
     
    learning_seo, Jun 29, 2010 IP
  6. alex06291

    alex06291 Peon

    Messages:
    229
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #6
    alex06291, Jun 29, 2010 IP
  7. ericgray83

    ericgray83 Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Just use robots.txt. Ask Google about it.
     
    ericgray83, Jun 29, 2010 IP
  8. social-media

    social-media Member

    Messages:
    311
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    35
    #8
    Robots.txt CAN be used to prevent certain directories, sub-directories and files from being crawled but it does NOT guarantee that Google will not show those pages in their SERPs. If those pages have inbound links to them from other sites, Google can STILL show them in the SERPs even without crawling them. They can infer from the link text of the inbound links whether that page might be relevant to a particular search query. Robots.txt also will NOT cause Google to remove those blocked/disallowed pages from their index if they are already indexed. You'll need to use the URL removal tool in Google's Webmaster Tools to remove them AFTER you have the robots.txt disallows in place.

    If you want to guarantee that the pages will never be shown in the SERPs then you should use a <meta name="robots" content="noindex"> element in the <head> of the pages you don't want to show up. This will not only keep them from showing the page in the SERPs, but if the pages are already in their index, it will cause them to remove them from their index.
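
    For example, a page you want kept out of the index might look like this (the title and body are just placeholders):

```
<html>
<head>
<title>Private page</title>
<meta name="robots" content="noindex">
</head>
<body>
...
</body>
</html>
```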

    Learn more about how to prevent Google indexing.
     
    social-media, Jun 29, 2010 IP
  9. liela

    liela Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Hi,

    Follow what HansonBro said; robots.txt is the main thing that can help you out here.
     
    liela, Jun 29, 2010 IP
  10. AirForce1

    AirForce1 Peon

    Messages:
    1,325
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    0
    #10
    1. Use Disallow: rules in your robots.txt and put it under your site's root directory.
    2. Set noindex, nofollow meta tags in your page files.
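
    For example, a sketch combining both methods (the paths are just placeholders):

```
# robots.txt in your site root
User-agent: *
Disallow: /private/
```

```
<!-- in the <head> of each page file -->
<meta name="robots" content="noindex, nofollow">
```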

    Have a nice day,
     
    AirForce1, Jun 29, 2010 IP
  11. openxcell.webdevelopement

    openxcell.webdevelopement Peon

    Messages:
    151
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #11
    I have one doubt; if you people can help me I would be very thankful. When I used site: in Google to check my links, I found some dynamic links which no longer exist on my site. I tried two ways: I included them in robots.txt and I requested removal in Webmaster Tools. But I got an error in Webmaster Tools saying the link removal was denied.
     
  12. Rituja

    Rituja Peon

    Messages:
    539
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #12
    In Webmaster Tools, go to Settings; there is an option there for what you want...
     
    Rituja, Jun 29, 2010 IP
  13. xprtwalk

    xprtwalk Peon

    Messages:
    663
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Dear member,

    It's easy to keep your folders, pages and files from being crawled by using a robots.txt file in your root. Just define, under a user agent, which pages you don't want the search engines to crawl: list those pages as Disallow rules and they will not be crawled.

    For example:

    User-Agent: *
    Disallow: /*_V
    Disallow: /*barpID
    Disallow: /resources2.do
    Disallow: /resources1.do
    Disallow: /*&pID
    Disallow: /*Cause
    Disallow: /*shop.do?cID=1962
    Disallow: /*shop.do?cID=1966

    Then those pages will avoid being crawled.
     
    xprtwalk, Jun 30, 2010 IP
  14. subburajacmic

    subburajacmic Peon

    Messages:
    162
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Better to use robots.txt to keep those pages from being crawled.
     
    subburajacmic, Jun 30, 2010 IP
  15. jacksonbleu

    jacksonbleu Guest

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #15
    I have a web site set up with a 'robots.txt' file in use. My only ERROR pages come from PHP files on my site. How do I set up the robots.txt file to 'exclude' all my PHP files without having to list EACH and EVERY page with a disallow rule?
     
    jacksonbleu, Jul 1, 2010 IP
  16. HansonBro

    HansonBro Peon

    Messages:
    59
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #16
    You might be able to do this with * wildcards in your disallow rules. Check this thread on WMW: http://www.webmasterworld.com/forum93/622.htm. It might point you in the right direction.
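
    For example, Googlebot supports the * and $ wildcards (not all crawlers do), so something like this should block all URLs ending in .php (a sketch, not tested):

```
User-agent: Googlebot
Disallow: /*.php$
```

    The $ anchors the pattern to the end of the URL, so only URLs ending in .php are matched.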
     
    HansonBro, Jul 2, 2010 IP
  17. xprtwalk

    xprtwalk Peon

    Messages:
    663
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #17
    Yes, this is the way to solve the problem you are facing: use the wildcard sign * in your Disallow rules, for particular pages or for the whole lot, as I described above.
     
    xprtwalk, Jul 2, 2010 IP
  18. joshvelco

    joshvelco Peon

    Messages:
    819
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #18
    Compile this into a robots.txt file placed at the root of your site, in this format:

    User-Agent: *
    Disallow: /the-folder-or-file-you-want-blocked
    Disallow: /the-2nd-folder-or-file-you-want-blocked
     
    joshvelco, Jul 2, 2010 IP