Googlebot 2.0 - Now It Looks at CSS and Javascript

chulium Well-Known Member

#1

Source: http://www.adsensebits.com/node/24

Up until now, the bot (or "crawler") Google used to index sites was based on the Lynx browser, which is to say it "saw" only text. Previously, the only way to tell Google that information was important was through various forms of markup and a specially labeled comment that Google would recognize as marking important information. To get a better idea of how Google used to see your site, load up Firefox, disable images, scripts and style sheets, and start browsing.
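For a rough, scriptable approximation of that text-only view, you can strip scripts, style sheets, comments and tags out of a page's HTML. This is only a sketch: `textOnlyView` is a hypothetical helper, the sample page is made up, and regex stripping is much cruder than what Lynx (or any real crawler) does.

```javascript
// Rough approximation of a text-only ("Lynx-style") view of a page.
// Hypothetical helper for illustration; regexes are not a real HTML parser.
function textOnlyView(html) {
  return html
    // Drop <script> and <style> blocks entirely, contents included.
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Drop every remaining tag (so images simply disappear).
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

// Made-up sample page.
const page =
  '<html><head><style>h1 { color: red; }</style></head>' +
  '<body><h1>Widgets</h1><img src="logo.png">' +
  '<script>document.write("hidden");</script><p>Cheap widgets here.</p></body></html>';

console.log(textOnlyView(page)); // prints Widgets Cheap widgets here.
```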

Enter: Googlebot 2.0
But that was so 2005. 2006 sees the introduction of a newer, smarter and pickier Googlebot, one that cares about CSS and JavaScript, position and layout. Showing up in logs as
Mozilla, Googlebot
Googlebot 2.0 is looking websites up and down in ways it couldn't possibly have done before, by requesting style sheets (.css) and JavaScript files (.js).

I suppose the most obvious explanation is either that Mozilla Googlebot is taking snapshots of our websites to add the ever-so-popular but sometimes irritating thumbnail feature so many smaller search engines have implemented, or that they're doing something much more Google: searching our websites like a human being.

It seems to me that in order to maintain the ability to produce relevant ads to display, Google needs to be able to tell what information is important by finding out where it is, what it is and how it's displayed. The only way to do that is to start looking at our websites and seeing them like we do.

Whatever the reason for this new crawler, it'll be interesting to see what they do with the new information they're collecting. As the web evolves, so must Google.

chulium, Feb 23, 2006

hulkster Peon

#2

I don't do that much CSS (should do more), but if anyone has log entries showing Googlebot spidering .css files (or .js ones) that are externally referenced from their HTML, that would be pretty darn solid proof they are doing this...

hulkster, Feb 23, 2006

BamaStangGuy Notable Member

#3

Very cool. I hope this is the case.

BamaStangGuy, Feb 23, 2006

hulo Peon

#4

Interesting; I will have to search around for the real SEO implications.

hulo, Feb 23, 2006

dsm56 Active Member

#5

Wasn't this already implemented somehow?
Everyone knows Google already detects hidden text... which means it must look at stylesheets too?

Anyway I think this is a good thing too...

dsm56, Feb 23, 2006

LaCabra Goats R Us

#6

sayles runs out and changes all his websites to CSS! Good info, CompuXP, thanks!

LaCabra, Feb 23, 2006

sketch Well-Known Member

#7

Is this for AdSense use or for Google use in general? If the latter, what's the point of this? If Google is going to start putting more emphasis on layout and back-end code, then a lot of poorly produced websites with great content will suffer because of it.

If it IS strictly for AdSense use, it'll be interesting to see how this works, especially since CSS positioning is sometimes relative and not absolute. What AdSense might think is the bottom of the page may really be the top, etc.

Personally I think Googlebot should just keep its nose out of anything back-end and stick to content. People search for content, not XHTML 1.0 compliant websites. If any back-end snooping should occur it should be trying to figure out how to pull text content from Flash files.

sketch, Feb 23, 2006

Olney Berserker

#8

It just makes sense that CSS is a factor in the algorithm. If we can see it visually, why shouldn't the bots have the technology to analyze it?

Olney, Feb 23, 2006

dvduval Notable Member

#9

I like the idea that Google might be indexing JavaScript links. I am still running some applications, such as Topics Anywhere on phpBB, whose links I always wished Google would follow.

dvduval, Feb 23, 2006

skattabrain Peon

#10

google looking at and understanding css & js with bots is an inevitable evolution of the search engine.

those of you who know css & js know how easy it is to hide text ... or to make what appears to be lowly "footer" text the biggest thing you see on a page through a browser.

i think it would be necessary for google to "look with eyes" at the code to see the difference between what's most important in the html and what's most important on the screen.

skattabrain, Feb 23, 2006

hulkster Peon

#11

I don't disagree that this is a natural evolution that will eventually come... but does anyone have any web log data showing Googlebot spidering external .css or .js files?

In order to see how the page renders, it HAS to suck those files in ...
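To illustrate the point: before a page can be rendered, every externally referenced style sheet and script has to be fetched, and each fetch is a separate request that would land in your access log. The sketch below lists those references with a hypothetical `externalAssets` helper; it assumes quoted `rel`/`href`/`src` attributes, uses a made-up sample page, and a real crawler would use an HTML parser and resolve relative URLs.

```javascript
// Hypothetical helper: list the external CSS and JS files a page references.
// Regex-based for brevity; only quoted attribute values are handled.
function externalAssets(html) {
  const css = [];
  const js = [];
  // <link ...> tags: keep only those whose rel includes "stylesheet".
  for (const m of html.matchAll(/<link\b[^>]*>/gi)) {
    const tag = m[0];
    const rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (rel && href && /\bstylesheet\b/i.test(rel[1])) css.push(href[1]);
  }
  // <script src="..."> tags; inline scripts have no src and are skipped.
  for (const m of html.matchAll(/<script\b[^>]*\bsrc\s*=\s*["']([^"']*)["'][^>]*>/gi)) {
    js.push(m[1]);
  }
  return { css, js };
}

// Made-up sample markup.
const html =
  '<link rel="stylesheet" href="/style.css">' +
  '<link rel="alternate" href="/feed.xml">' +
  '<script src="/menu.js"></script><script>var x = 1;</script>';

console.log(JSON.stringify(externalAssets(html)));
// prints {"css":["/style.css"],"js":["/menu.js"]}
```

Each entry in those two lists is a request that a rendering crawler would have to make, which is why its absence from the logs is meaningful evidence.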

hulkster, Feb 23, 2006

sketch Well-Known Member

#12

Of course CSS can be used to hide things, but I think my point about some CSS being relative still stands. Even if the bot somehow assembles the page in its entirety, with the HTML, CSS and JS files rendered, will it look RIGHT?

Anyone who deals with a lot of web design knows that Internet Explorer has horrible CSS compliance, so if we want maximum browser compatibility, we essentially have to make 2 CSS files and use JavaScript to spit out the right one. If for whatever reason we have to start making a third CSS file to account for Google, either a lot of websites will forget it or a lot of coders will be frustrated.
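A minimal sketch of the "two CSS files, let JavaScript pick one" approach. The file names are made up and `pickStylesheet` is a hypothetical helper; it keys on the `MSIE` token that Internet Explorer puts in its user-agent string.

```javascript
// Hypothetical helper: choose a style sheet based on the browser's user agent.
// "MSIE <digit>" appears in Internet Explorer user-agent strings.
function pickStylesheet(userAgent) {
  return /MSIE \d/.test(userAgent) ? "/css/ie.css" : "/css/standard.css";
}

// In a page this would typically run as an inline script, e.g.:
//   document.write('<link rel="stylesheet" href="' +
//                  pickStylesheet(navigator.userAgent) + '">');
// Here we just call it with sample user-agent strings.
const ie6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
const firefox =
  "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1";

console.log(pickStylesheet(ie6));     // prints /css/ie.css
console.log(pickStylesheet(firefox)); // prints /css/standard.css
```

One side effect worth noting: because the `<link>` only exists after the script runs, a crawler that fetches CSS but does not execute JavaScript would find neither style sheet.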

My concern is how exact, how elaborate Googlebot will be when handling CSS. I've seen some sites where the CSS classes/styles go 5, 6, 7 levels deep. Example: Googlebot may look at that 7th level's white text and the page's white background and accidentally think it's hidden text because it missed the 4th level's blue background.
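The seven-levels-deep example can be made concrete with a toy model. Each element below may set a text `color` and a `background`; the naive check compares the text color only with the page background, while the careful one walks up the ancestors to the nearest background actually set. The data structure and all the functions are hypothetical, and real CSS (inheritance, positioning, background images, transparency) is far more involved.

```javascript
// Toy model: each node may set `color` and/or `background`, and links to its parent.
// Hypothetical structure for illustration only.

// Effective background: the nearest background set on the node or an ancestor.
function effectiveBackground(node) {
  for (let n = node; n; n = n.parent) {
    if (n.background) return n.background;
  }
  return "white"; // assume a white default page background
}

// Naive check: compare the text color with the page background only.
function naiveHidden(node, pageBackground) {
  return node.color === pageBackground;
}

// Careful check: compare with the background the text is actually drawn on.
function carefulHidden(node) {
  return node.color === effectiveBackground(node);
}

// Build a parent chain from outermost to innermost; return the innermost node.
function chain(specs) {
  let parent = null;
  for (const spec of specs) parent = { ...spec, parent };
  return parent;
}

const innermost = chain([
  { background: "white" }, // level 1: page body
  {}, {},                  // levels 2-3: no styles
  { background: "blue" },  // level 4: blue box
  {}, {},                  // levels 5-6: no styles
  { color: "white" },      // level 7: white text
]);

console.log(naiveHidden(innermost, "white")); // prints true (a false positive)
console.log(carefulHidden(innermost));        // prints false (text sits on blue)
```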

sketch, Feb 23, 2006

mightyb Banned

#13

Next step for them is to start reading flash as well.

mightyb, Feb 23, 2006

skattabrain Peon

#14

sketch said:

Anyone who deals with a lot of web design knows that Internet Explorer has horrible CSS compliance, so if we want the most browser-compatibility, in essence we have to make 2 CSS files and use javascript to spit out the right one. If for whatever reason we have to start making a third CSS file to account for Google, either a lot of websites will forget it or a lot of coders will be frustrated.

i disagree that google needs to be "perfect" in its understanding of your css to get the big picture ... but with all this google browser talk, how hard would it be for them?

i'm simply making the point that if google hasn't already been doing this, it's going to, in my opinion.

assuming they want to see your css ... i can tell you they'll want to see it "as is" ... making a style sheet for an engine i think would be seen as a form of cloaking.

skattabrain, Feb 23, 2006

Netizen Peon

#15

I just filtered my logs to show only Googlebot/2.1 for activity since November 1, 2005. Activity shows nearly 5000 hits including over 200 to robots.txt, but zero hits to the css. I don't have any js. The css file is not excluded in robots.txt.

For my main site, Googlebot/2.1 does not read css.
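A minimal sketch of that kind of filter, assuming an Apache "combined" format access log where the user agent is the last quoted field. The function name and the sample log lines are made up (the IP addresses are documentation placeholders); only the Googlebot user-agent strings follow the real format.

```javascript
// Hypothetical log filter: count Googlebot requests in an Apache "combined" log,
// split into robots.txt, .css, .js and everything else.
// Assumed line shape:
//   IP - - [date] "GET /path HTTP/1.1" status bytes "referer" "user-agent"
const LINE = /^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"/;

function countGooglebotHits(logText) {
  const counts = { total: 0, robots: 0, css: 0, js: 0 };
  for (const line of logText.split("\n")) {
    const m = LINE.exec(line);
    if (!m) continue; // skip lines in some other format
    const [, , path, userAgent] = m;
    if (!userAgent.includes("Googlebot")) continue; // filters on user agent only
    const file = path.split("?")[0]; // ignore query strings
    counts.total++;
    if (file === "/robots.txt") counts.robots++;
    else if (/\.css$/i.test(file)) counts.css++;
    else if (/\.js$/i.test(file)) counts.js++;
  }
  return counts;
}

// Made-up sample lines.
const sampleLog = [
  '198.51.100.1 - - [23/Feb/2006:10:00:00 -0500] "GET /robots.txt HTTP/1.1" 200 24 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
  '198.51.100.1 - - [23/Feb/2006:10:00:05 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
  '192.0.2.7 - - [23/Feb/2006:10:01:00 -0500] "GET /style.css HTTP/1.1" 200 830 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
].join("\n");

console.log(JSON.stringify(countGooglebotHits(sampleLog)));
// prints {"total":2,"robots":1,"css":0,"js":0}
```

Because it matches on the user-agent string, this only counts requests that announce themselves as Googlebot.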

Netizen, Feb 23, 2006

Netizen Peon

#16

After running additional filters, I found that none of the major bots are reading the css. These bots showed up reading the css:

hl_ftien_spider
Archive.org
crawler.de
muncher
Thumbnail.CZ robot 1.1 (http://thumbnail.cz/why-no-robots-txt.html)

Netizen, Feb 23, 2006

kika Peon

#17

mightyb said:

Next step for them is to start reading flash as well.

google reads flash already!

kika, Feb 23, 2006

mad4 Peon

#18

Don't forget that it's very easy to change your user agent.

Google could be visiting all these css and js files using the user agent of MSIE6 or something and we would never know.
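One way around the user-agent problem is to check who owns the requesting IP address instead of trusting the user agent. Google has since documented a two-step check for this: a reverse DNS lookup on the IP should return a host name under googlebot.com or google.com, and a forward lookup of that host name should return the same IP. The sketch below applies that check with Node's `dns.promises` module; `isVerifiedGooglebot` and `isGoogleHost` are hypothetical names, only IPv4 forward lookups are handled, and the lookups can be passed in as parameters so the logic can be tried without a network.

```javascript
const dns = require("dns").promises;

// Domains Google documents for its crawler hosts (checked as suffixes).
const GOOGLE_DOMAINS = [".googlebot.com", ".google.com"];

function isGoogleHost(hostname) {
  const h = hostname.toLowerCase().replace(/\.$/, ""); // drop a trailing dot
  return GOOGLE_DOMAINS.some((d) => h.endsWith(d));
}

// Reverse lookup the IP, check the domain, then forward lookup to confirm.
// `resolver` defaults to Node's real DNS functions; tests can pass fakes.
async function isVerifiedGooglebot(
  ip,
  resolver = { reverse: (a) => dns.reverse(a), resolve4: (h) => dns.resolve4(h) }
) {
  let names;
  try {
    names = await resolver.reverse(ip);
  } catch {
    return false; // no PTR record, so it cannot be verified
  }
  for (const name of names) {
    if (!isGoogleHost(name)) continue;
    try {
      const addrs = await resolver.resolve4(name);
      if (addrs.includes(ip)) return true; // forward lookup points back to the IP
    } catch {
      // forward lookup failed; try the next name
    }
  }
  return false;
}

// Usage (needs network access), with an IP taken from your own log:
//   isVerifiedGooglebot("<IP from your log>").then(console.log);
```

Running every IP that requested a .css or .js file through a check like this would catch Google fetches made under a browser user agent, which a user-agent filter would miss.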

mad4, Feb 24, 2006

jimkarter Notable Member

#19

mad4 said:

Don't forget that it's very easy to change your user agent.

Google could be visiting all these css and js files using the user agent of MSIE6 or something and we would never know.

May be true, but it's not very likely.

I just checked my logs for the last 2 days and did not find Googlebot accessing any .css or .js files. Maybe it will be some more time before it's a go.

jimkarter, Feb 24, 2006

eKstreme Guest

#20

What evidence do you have that Google reads both CSS and JS files? I've been running tests on this since last October/November, and NOT ONCE has a Google IP address (let alone one identifying itself as Mozilla Googlebot) requested external CSS files nor JS files.

On top of that, I tested embedded JS that uses document.write to write links into the page, to see whether Google et al. picked them up. Google did once pick up a full link (http://example.com/secret-test), but not when the URL was built with something like
document.write('http://example.com/' + 'test');
I've read elsewhere that Googlebot seems to rip out anything that looks like a link from the whole page (whether in JS or not), and my findings are consistent with this.
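The "grab anything that looks like a link" behavior described here would produce exactly these results. The sketch below is a guess at such a heuristic, not Google's code: `scrapeUrls` is a hypothetical helper that scans the raw source, scripts included, for anything shaped like an absolute http(s) URL. On a made-up test page, the literal URL comes out whole, while the concatenated one yields only its first fragment.

```javascript
// Guess at a "grab anything that looks like a URL" heuristic; not Google's code.
// Scans the raw source (scripts included) for absolute http(s) URLs.
function scrapeUrls(source) {
  // A URL runs until whitespace, a quote or an angle bracket.
  return source.match(/https?:\/\/[^\s"'<>]+/g) || [];
}

// Made-up test page along the lines of the experiment described above.
const testPage = `
<script>
  document.write('<a href="http://example.com/secret-test">x</a>');
  document.write('http://example.com/' + 'test');
</script>`;

console.log(JSON.stringify(scrapeUrls(testPage)));
// prints ["http://example.com/secret-test","http://example.com/"]
```

The concatenated URL never exists as a single string in the source, so a scraper that does not execute JavaScript cannot see http://example.com/test.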

I would love to hear what tests you've done!

eKstreme, Feb 24, 2006
