Recently I finished work on XMLTraining.com, and it is being spidered well by Google. Googlebot has been obeying everything in the robots.txt file except a JavaScript source file. I have coded my own textual advertising system, which matches search queries to advertisements:

<script src="/thirdparty/?q=xml&x=87436349"></script>

Google, however, is indexing these JS files, even though the following is in my robots.txt file:

User-agent: *
Disallow: /thirdparty/

I have NO idea why. My best guess is that because the script tag uses SRC rather than HREF, SRC attributes are not checked against the robots.txt file?
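For reference, my understanding is that robots.txt Disallow rules match any URL that starts with the given value, so the rule should already cover the script URL, query string and all (rough illustration; the last URL is made up to show a non-match):

Disallow: /thirdparty/
  /thirdparty/                      blocked
  /thirdparty/?q=xml&x=87436349     blocked (starts with /thirdparty/)
  /thirdparty.html                  not blocked (does not start with /thirdparty/)

So it does not look like a pattern-matching mistake on my end.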
Google uses cached copies of robots.txt and refreshes them from time to time. It might still be using an old copy of yours. Check out their FAQ section.
Could it perhaps be that the src is handled server side, and is being resolved by your web server without Googlebot being aware of it? This is the behavior with the include directive: you can ban Googlebot from seeing your includes, but it does no good, because the files are included by the web server before Googlebot ever sees the page.
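To illustrate the include behavior I mean (a rough sketch; the file name is made up):

<!--#include virtual="/includes/ad.html" -->

The web server merges that file into the HTML before it is sent, so Googlebot never requests /includes/ad.html itself and a Disallow line for it is pointless. The question is whether your server is doing something similar with that /thirdparty/ URL before the page goes out.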
There seems to be a strong current of feeling that JavaScript links will eventually all be visible to Google. A simple alternative that ought to work forever is to use a PHP redirection script and forbid *that* to Google (or any robot), roughly as sketched below. Since the actual link points to the redirect script, the robot can be stopped (assuming it is one that honors robots.txt files). You can find a working example available for free download at http://seo-toys.com (the "Via" toy).
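The basic idea looks something like this (a bare-bones sketch, not the Via toy itself; the /go/ path, the 'to' parameter, and the target list are made up for illustration):

robots.txt:
User-agent: *
Disallow: /go/

/go/index.php:
<?php
// Minimal redirect script: look up the real destination from a short
// whitelist so the query string never becomes an open redirect.
$targets = array(
    'xml' => 'http://www.example.com/xml-course/',
);
$key = isset($_GET['to']) ? $_GET['to'] : '';
if (isset($targets[$key])) {
    header('Location: ' . $targets[$key]);
} else {
    header('HTTP/1.0 404 Not Found');
}
exit;
?>

Your pages then link to /go/?to=xml instead of the destination directly, and any robot that honors robots.txt never follows it.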
Some people have claimed that Google already crawls through JS links, but I don't think this is the case. We had some pages up for almost a year and a half that ONLY used JS links; the main/gate page was a PR4, and none of the pages it linked to were ever cached.
It's more likely that somebody just changed their user agent to go fishing for cloaked pages or something, especially since you said "except a JavaScript source file". And I still don't believe that Googlebot spiders JS links yet.
mxlabs, I don't get what you're saying. The JavaScript source is in the Google index, as in you can search for it. If it is blocked in robots.txt, as it is, it should not appear in the index.
Oh, so Google is actually indexing the JS itself? I didn't get that part. In that case I guess it might be because of the SRC instead of HREF, as you already mentioned. I'm quite astonished that Googlebot can read those JS parts.
This surprises me as well. I've had JS links on sites and have never seen them indexed by Google as yet. This is well worth noting...
Google is grabbing external JavaScript files with a user agent of "Googlebot/Test". There is more info on it over here.