In the last few days, I've noticed a Googlebot/Test spider spidering nothing but external JavaScript files. There have been rumors of Google trying to better understand JavaScript, and it looks like they may be in the testing phase of it: I've had requests for .js files within multiple domains from two different IP addresses:
64.68.89.156
64.68.89.191
According to ARIN, the 64.68.89.* block is not owned by Google, but considering Google owns the following class-Cs (which it uses for Googlebot):
64.68.80.*
64.68.81.*
64.68.82.*
64.68.83.*
64.68.84.*
64.68.85.*
64.68.86.*
64.68.87.*
...I think it's fairly safe to assume that it really *is* Google. Plus it's probably just a new IP block assignment that has not been updated in ARIN yet. - Shawn
Cool, thanks for that. Looks like I will finally have to implement that PHP redirect rather than JS links for affiliates.
The spider is requesting a robots.txt file, so you could always exclude your external JavaScript files that way. {shrug} - Shawn
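For anyone who wants to try the robots.txt suggestion, here is a minimal sketch of how a compliant crawler would read such a rule, using Python's standard urllib.robotparser. The /js/ directory, the example.com URLs, and the is_allowed helper are assumptions made up for illustration; it shows only what the rule means on paper, not how Google's test spider actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps all crawlers out of an assumed /js/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The external script sits under /js/, so a compliant crawler should skip it.
    print(is_allowed("Googlebot/Test", "http://www.example.com/js/affiliate-links.js"))
    # The page itself is outside /js/ and stays crawlable.
    print(is_allowed("Googlebot/Test", "http://www.example.com/index.html"))
```

The rule only works if the scripts really live under the disallowed path, so it is worth checking the server logs afterwards to confirm the requests stop.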
... unless you <include> your .js from your .shtml files. If you do that, the <include> code executes before the robot exclusion code is checked.
No, it would still be blocked... if you have an image directory you choose to block, Google will not spider it, even though the images are "included" within a HTML file that is spiderable. - Shawn
Test it before you deploy it... <img src> isn't the same as <!--#include virtual-->. I tested non-JavaScript includes and found out that Google indeed did find them, because the include is done server-side.
Oh, I thought you were talking about a JS include like so: <script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> </script> - Shawn
That might work like your IMG SRC example, or it might work like the <include> example... I am merely recommending testing.
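The difference being argued here can be sketched in a few lines. This is a toy simulation, not real server code: expand_ssi, the FILES dictionary, and the paths are all hypothetical, and a real server's include handling is more involved. It only shows why a server-side include is invisible to robots.txt (the content is already pasted into the page the spider receives), while a script src is a separate URL the spider has to request on its own.

```python
import re

# Matches a server-side include directive such as
# <!--#include virtual="/includes/links.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_ssi(page, files):
    """Replace each include directive with the named file's content,
    roughly as a server would do before sending the response."""
    return INCLUDE_RE.sub(lambda m: files.get(m.group(1), ""), page)


# Hypothetical include file and page, for illustration only.
FILES = {"/includes/links.html": '<a href="/out/merchant">Merchant</a>'}

PAGE = (
    '<!--#include virtual="/includes/links.html" -->\n'
    '<script type="text/javascript" src="/js/links.js"></script>\n'
)

served = expand_ssi(PAGE, FILES)

if __name__ == "__main__":
    # The include directive is gone and the link is now part of the page itself;
    # the script tag is still just a reference to a separate URL.
    print(served)
```

In the served output the affiliate link is ordinary page HTML, so blocking /includes/ in robots.txt would not hide it, whereas /js/links.js is fetched (or not) as its own request.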
So would I be able to put my affiliate links into an external JS file and stop Google from spidering them with the robots.txt file? Then call them up like in your quote? Thanks
Not necessarily related to Googlebot, but one reason you may wish to use PHP or JS redirects is that some ad-blocking software identifies affiliate-type links and doesn't display them. I was just reading that Norton Internet Security appears to filter out links with "redir" or "redirect" in them and displays nothing in their place. I haven't seen this myself, and am merely passing on some comments from another webmaster who purchased a new computer with the software preinstalled and was alarmed when the text links on his own site weren't appearing. Compar, as I mentioned in a thread a while back, I'd rather not pass PR to the merchants when they blatantly compete with me in PPC and SEO, so why give them more of a head start?
If you ask rwhois.exodus.net, that class C is allocated to Google from the Savvis/Exodus/C&W US family. The IP blocks 64.68.{80-87}.* you mentioned are anycast, which means your packets are routed to whatever datacenter is closest to you (network-wise). However, 64.68.88.0/21 is being routed only to their San Francisco office, which further supports your theory that this is experimental.
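The block arithmetic in this thread is easy to double-check with Python's standard ipaddress module. This is just a sketch of the address math using the ranges and IPs quoted above; the variable names are mine, and it says nothing about who actually owns or routes the blocks.

```python
import ipaddress

# The eight class Cs listed earlier in the thread as known Googlebot ranges.
known_googlebot = [ipaddress.ip_network(f"64.68.{n}.0/24") for n in range(80, 88)]

# Adjacent, aligned /24s collapse into a single larger block.
collapsed = list(ipaddress.collapse_addresses(known_googlebot))

# The block described above as routed only to the San Francisco office.
office_block = ipaddress.ip_network("64.68.88.0/21")

# The two addresses that requested the .js files.
seen = [ipaddress.ip_address("64.68.89.156"), ipaddress.ip_address("64.68.89.191")]

if __name__ == "__main__":
    print(collapsed)  # [IPv4Network('64.68.80.0/21')]
    for ip in seen:
        in_known = any(ip in net for net in collapsed)
        print(ip, "in known Googlebot block:", in_known,
              "| in 64.68.88.0/21:", ip in office_block)
```

So the eight known class Cs are exactly 64.68.80.0/21, and both new addresses fall in the neighboring 64.68.88.0/21, which is consistent with the posts above.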