Googlebot/Test Spider Getting External JavaScript Files


digitalpoint
Mar 18th 2004, 12:51 pm
In the last few days, I've noticed a Googlebot/Test spider spidering nothing but external JavaScript files. There have been rumors of Google trying to better understand JavaScript, and it looks like they may be in the testing phase of it:

I've had requests for .js files within multiple domains from two different IP addresses:

64.68.89.156
64.68.89.191

According to ARIN, the 64.68.89.* block (http://ws.arin.net/cgi-bin/whois.pl?queryinput=64.68.89) is not owned by Google, but considering Google owns the following class-Cs (which it uses for Googlebot):

64.68.80.*
64.68.81.*
64.68.82.*
64.68.83.*
64.68.84.*
64.68.85.*
64.68.86.*
64.68.87.*

...I think it's fairly safe to assume that it really *is* Google. Plus it's probably just a new IP block assignment that has not been updated in ARIN yet.

- Shawn
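A rough way to spot this kind of activity in your own logs is sketched below. It assumes an Apache "combined" log format and a user agent containing "Googlebot"; the sample log lines, paths, and the function name are made up for illustration, not taken from a real server.

```python
import re

# Assumed Apache "combined" log layout; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def js_requests_by_ip(lines, agent_substring="Googlebot"):
    """Count .js requests per client IP for user agents containing agent_substring."""
    counts = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        if path.endswith(".js") and agent_substring in m.group("agent"):
            ip = m.group("ip")
            counts[ip] = counts.get(ip, 0) + 1
    return counts

# Illustrative log lines only (hypothetical paths and timestamps).
sample = [
    '64.68.89.156 - - [18/Mar/2004:12:40:01 -0800] "GET /js/menu.js HTTP/1.0" 200 512 "-" "Googlebot/Test"',
    '64.68.89.191 - - [18/Mar/2004:12:41:10 -0800] "GET /js/ads.js HTTP/1.0" 200 204 "-" "Googlebot/Test"',
    '10.0.0.5 - - [18/Mar/2004:12:42:00 -0800] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0"',
]
print(js_requests_by_ip(sample))
```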

Mr T
Mar 18th 2004, 1:04 pm
Cool, thanks for that. Looks like I will finally have to implement that PHP redirect rather than JS links for affiliates :(

digitalpoint
Mar 18th 2004, 1:06 pm
The spider is requesting a robots.txt file, so you could always exclude your external JavaScript files that way. {shrug}

- Shawn
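As a minimal sketch of that idea, assuming the external scripts all live under a hypothetical /js/ directory, the rule can be checked with Python's standard robots.txt parser before relying on it. The domain and file names are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all well-behaved spiders out of /js/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="Googlebot"):
    """Return True if the parsed robots.txt lets this agent fetch the URL."""
    return parser.can_fetch(agent, url)

print(allowed("http://www.example.com/js/affiliate.js"))
print(allowed("http://www.example.com/index.html"))
```

This only tells you what a spider that honors robots.txt should do; it says nothing about what any particular spider actually does.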

Will.Spencer
Apr 18th 2004, 10:59 am
... unless you pull your .js into your .shtml files with a server-side include.

If you do that, the server processes the include before the page is sent, so the robots.txt exclusion never applies to the included content.

digitalpoint
Apr 18th 2004, 11:51 am
No, it would still be blocked... if you have an image directory you choose to block, Google will not spider it, even though the images are "included" within an HTML file that is spiderable.

- Shawn

Will.Spencer
Apr 18th 2004, 9:18 pm
Test it before you deploy it...

<img src> isn't the same as <!--#include virtual-->.

I tested non-JavaScript includes and found out that Google indeed did find them, because the include is done server-side.
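The difference can be illustrated with a toy model of what the server does. This is only a sketch of the idea, not how Apache implements SSI; the file path, link, and function name are hypothetical.

```python
import re

# Matches directives of the form <!--#include virtual="/path" -->
INCLUDE = re.compile(r'<!--#include virtual="([^"]+)"\s*-->')

def expand_includes(html, files):
    """Inline each included file, roughly as a server does before sending the page."""
    return INCLUDE.sub(lambda m: files.get(m.group(1), ""), html)

# Hypothetical include file sitting in a directory that robots.txt blocks.
files = {"/blocked/links.html": '<a href="http://merchant.example/?aff=123">Offer</a>'}
page = '<html><body><!--#include virtual="/blocked/links.html" --></body></html>'

served = expand_includes(page, files)
print(served)
```

The spider only ever requests the page itself, and the included markup is already part of what it receives. An <img src> or <script src>, by contrast, requires a second request, and that request is what robots.txt can block.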

digitalpoint
Apr 18th 2004, 10:30 pm
Oh, I thought you were talking about a JS include like so:

<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>

- Shawn

Will.Spencer
Apr 18th 2004, 10:49 pm
That might work like your IMG SRC example, or it might work like the <include> example... I am merely recommending testing. :-)

mobile phones uk
Apr 22nd 2004, 3:58 am
digitalpoint wrote:
> Oh, I thought you were talking about a JS include like so:
>
> <script type="text/javascript"
> src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
> </script>
>
> - Shawn

So would I be able to put my affiliate links into an external JS file and stop Google from spidering them with the robots.txt file?

Then call them up like in your quote?

Thanks

compar
Apr 22nd 2004, 5:11 am
Mr T wrote:
> Cool, thanks for that. Looks like I will finally have to implement that PHP redirect rather than JS links for affiliates :(

I'm missing something here. Why do you want to hide your affiliate links from Google's bots?

jarvi
Apr 23rd 2004, 11:32 pm
Not necessarily related to Googlebot, but one reason you may wish to use PHP or JS redirects is that some ad-blocking software identifies affiliate-type links and doesn't display them. I was just reading that Norton Internet Security appears to filter out links with "redir" or "redirect" in them and doesn't display anything. I haven't seen this myself, and am merely passing on some comments from another webmaster who purchased a new computer with the software preinstalled and was alarmed when the text links on his own site weren't appearing.

Compar, as I mentioned in a thread a while back, I'd rather not pass PR to the merchants when they blatantly compete with me in PPC and SEO, so why give them more of a head start?
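For anyone weighing the redirect approach, here is a minimal sketch of the logic. It is written in Python rather than the PHP mentioned earlier in the thread, and the /go/ prefix, link table, and function name are all assumptions made for illustration.

```python
# Hypothetical table mapping short IDs to affiliate URLs.
LINKS = {"widgets": "http://merchant.example/?aff=123"}

def outbound(path):
    """Return (status, headers) for a request path such as /go/widgets."""
    prefix = "/go/"
    if path.startswith(prefix):
        target = LINKS.get(path[len(prefix):])
        if target:
            # Temporary redirect to the merchant; the page itself links only to /go/...
            return "302 Found", [("Location", target)]
    return "404 Not Found", [("Content-Type", "text/plain")]

print(outbound("/go/widgets"))
print(outbound("/go/unknown"))
```

The /go/ directory would then be disallowed in robots.txt so that spiders do not follow the links. The prefix deliberately avoids the words "redir" and "redirect", given the filtering behavior described above.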

Catfish
Jan 25th 2005, 10:55 am
Why not just use the new rel="nofollow" attribute?

symetrix
Jan 26th 2005, 12:32 am
digitalpoint wrote:
> According to ARIN, the 64.68.89.* block (http://ws.arin.net/cgi-bin/whois.pl?queryinput=64.68.89) is not owned by Google, but considering Google owns the following class-Cs (which it uses for Googlebot):

If you ask rwhois.exodus.net, that class C is allocated to Google from the Savvis/Exodus/C&W US family.

The IP blocks 64.68.{80-87}.* you mentioned are anycast, which means your packets are routed to whichever datacenter is closest to you network-wise. However, 64.68.88.0/21 is routed only to their San Francisco office, which further supports your theory that this is an experiment.
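The block arithmetic is easy to check with Python's standard ipaddress module. The variable names are just labels for this sketch: 64.68.80.0/21 covers the eight class-Cs listed at the top of the thread, and 64.68.88.0/21 covers the next eight.

```python
import ipaddress

googlebot_block = ipaddress.ip_network("64.68.80.0/21")  # 64.68.80.* through 64.68.87.*
second_block = ipaddress.ip_network("64.68.88.0/21")     # 64.68.88.* through 64.68.95.*

# The two addresses reported in the first post.
for ip in ("64.68.89.156", "64.68.89.191"):
    addr = ipaddress.ip_address(ip)
    print(ip, addr in googlebot_block, addr in second_block)
```

Both addresses fall outside the known Googlebot range and inside the adjacent /21.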

tycoonjo
Jun 13th 2006, 7:29 am
Google doesn't love me

netprophet
Sep 19th 2006, 1:15 am
cool stuff ........:cool:

thanx

baybossplaya
Dec 21st 2007, 1:19 am
useful info