Source: http://www.adsensebits.com/node/24

Up until now, the bot (or "crawler") that Google used to index sites was based on the Lynx browser. Which is to say, it "saw" text. Previously, the only way to tell Google that information was important was to say so ourselves, using various forms of markup and a specially labeled comment that Google would recognize as outlining important information. To get a better idea of how Google used to see your site, load up Firefox, disable images, scripts and style sheets, and start browsing.

Enter: Googlebot 2.0

But that was so 2005. 2006 sees the introduction of a newer, smarter and pickier Googlebot. One that cares about CSS and JavaScript, position and layout. Showing up in logs as "Mozilla Googlebot," Googlebot 2.0 is looking websites up and down in ways it couldn't possibly have done before, by requesting style sheets (.css) and JavaScript files (.js).

I suppose the most obvious reason for this is that Mozilla Googlebot is taking snapshots of our websites in order to integrate the ever-so-popular but sometimes irritating thumbnail feature so many smaller search engines have implemented. Or they're doing something much more Google: searching our websites like a human being would.

It seems to me that in order to keep producing relevant ads, Google needs to be able to tell what information is important by finding out where it is, what it is and how it's displayed. The only way to do that is to start looking at our websites and seeing them the way we do.

Whatever the reason for this new crawler, it'll be interesting to see what they do with the new information they're collecting. As the web evolves, so must Google.
I don't do that much CSS (should do more), but if anyone has log entries showing GoogleBot spidering .css files (or .js ones) that are externally referenced from your html, that would be pretty darn solid proof they are doing this ...
Wasn't this already implemented somehow? Everyone knows Google already detects hidden text... which means it must look at stylesheets too? Anyway, I think this is a good thing too...
Is this for AdSense use or for Google-use in general? If the latter, what's the point of this? If Google is going to start putting more emphasis on layouts and back-end code then a lot of the poorly-produced but great content websites will suffer because of this. If it IS strictly for AdSense use, it'll be interesting to see how this works, especially since sometimes CSS is relative and not absolute. What AS might think is the bottom of the page may really be the top, etc. Personally I think Googlebot should just keep its nose out of anything back-end and stick to content. People search for content, not XHTML 1.0 compliant websites. If any back-end snooping should occur it should be trying to figure out how to pull text content from Flash files.
It just makes sense that the CSS is a factor in the algorithm. If we can visually see it, why shouldn't the bots have the technology to analyze it...
I like the idea that Google might be indexing JavaScript links. I'm still running some applications, such as Topics Anywhere on phpBB, whose links I always wished Google would follow.
google looking at and understanding css & js with bots is an inevitable evolution of the search engine. those of you who know css & js know how easy it is to hide text ... make what appears to be lowly "footer" text the biggest thing you see on a page through a browser. i think it would be necessary for google to "look with eyes" on the code to see the difference between what's most important in the html and what's most important on the screen.
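The kind of stylesheet-based hiding described above is mechanically detectable. The sketch below is purely illustrative (the function, pattern list and sample CSS are my own invention, not anything Google has published about how its crawler works): it flags CSS rules using a few well-known hiding tricks.

```python
import re

# Hypothetical sketch: flag CSS declarations commonly used to hide text.
# The patterns and names here are illustrative assumptions, not Google's
# actual detection logic.
HIDING_PATTERNS = [
    r"display\s*:\s*none",
    r"visibility\s*:\s*hidden",
    r"text-indent\s*:\s*-\d{3,}px",   # text pushed far off-screen
]

def suspicious_rules(css):
    """Return the selectors whose declaration blocks match a hiding pattern."""
    flagged = []
    for match in re.finditer(r"([^{}]+)\{([^}]*)\}", css):
        selector, body = match.group(1).strip(), match.group(2)
        if any(re.search(p, body, re.IGNORECASE) for p in HIDING_PATTERNS):
            flagged.append(selector)
    return flagged

css = """
.footer { font-size: 9px; }
.keywords { display: none; }
.offscreen { text-indent: -9999px; }
"""
print(suspicious_rules(css))  # ['.keywords', '.offscreen']
```

Of course this only catches the crude cases; same-color text on a same-color background, as discussed later in the thread, needs the whole cascade to be resolved.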
I don't disagree that this is a natural evolution and will eventually come ... but does anyone have any web log data showing Googlebot spidering external .css or .js files? In order to see how the page renders, it HAS to suck those files in ...
Of course CSS can be used to hide things, but I think my point about some CSS being relative still stands. Even if the bot somehow assembles the page in its entirety, with the HTML, CSS and JS files rendered, will it look RIGHT? Anyone who deals with a lot of web design knows that Internet Explorer has horrible CSS compliance, so if we want the most browser compatibility, in essence we have to make two CSS files and use JavaScript to spit out the right one. If for whatever reason we have to start making a third CSS file to account for Google, either a lot of websites will forget it or a lot of coders will be frustrated. My concern is how exact, how elaborate Googlebot will be when handling CSS. I've seen sites where the CSS classes/styles go 5, 6, 7 levels deep. Example: Googlebot may look at that 7th level's white text and the page's white background and accidentally think it's hidden text, because it missed the 4th level's blue background.
i disagree that google needs to be "perfect" in its understanding of your css to get the big picture ... but with all this google browser talk, how hard would it be for them? i'm simply stating that if google hasn't already been doing this, it's going to, in my opinion. assuming they want to see your css ... i can tell you they'll want to see it "as is" ... making a style sheet just for an engine would, i think, be seen as a form of cloaking.
I just filtered my logs to show only Googlebot/2.1 for activity since November 1, 2005. Activity shows nearly 5000 hits including over 200 to robots.txt, but zero hits to the css. I don't have any js. The css file is not excluded in robots.txt. For my main site, Googlebot/2.1 does not read css.
After running additional filters, none of the major bots are reading the css. These bots did show up reading the css:

- hl_ftien_spider
- Archive.org
- crawler.de
- muncher
- Thumbnail.CZ robot 1.1 (http://thumbnail.cz/why-no-robots-txt.html)
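For anyone who wants to run the same filter on their own logs, here is a minimal sketch (the sample log lines and function name are invented for illustration; it assumes a common/combined-format access log): it counts requests whose user-agent string mentions Googlebot and lists any that fetched a .css or .js file.

```python
import re

# Minimal sketch of the log filtering described above. The sample lines
# below are invented; real logs may differ in format.
def googlebot_asset_hits(log_lines):
    css_js, total = [], 0
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        total += 1
        m = re.search(r'"(?:GET|HEAD) (\S+)', line)
        if m and re.search(r'\.(css|js)(\?|$)', m.group(1)):
            css_js.append(m.group(1))
    return total, css_js

sample = [
    '66.249.66.1 - - [01/Nov/2005:10:00:00] "GET /index.html HTTP/1.1" 200 1234 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Nov/2005:10:00:01] "GET /style.css HTTP/1.1" 200 567 "-" "Mozilla/5.0 Googlebot"',
    '10.0.0.5 - - [01/Nov/2005:10:00:02] "GET /style.css HTTP/1.1" 200 567 "-" "Mozilla/4.0"',
]
total, assets = googlebot_asset_hits(sample)
print(total, assets)  # 2 ['/style.css']
```

If that second list stays empty across thousands of Googlebot hits, as reported above, that is decent evidence it isn't fetching stylesheets.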
Don't forget that its very easy to change your user agent. Google could be visiting all these css and js files using the user agent of MSIE6 or something and we would never know.
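Spoofed user agents can be checked, though: one common defence is a reverse DNS lookup on the requesting IP, confirming the hostname belongs to Google's crawler domain, then a forward lookup to confirm it maps back to the same IP. A sketch, assuming that approach (the stubbed hostnames and IPs below are invented for illustration; resolvers are injectable so the example runs without network access):

```python
import socket

# Sketch of reverse-then-forward DNS verification of a crawler IP.
# Real crawler hostnames may differ; the stubs below are made up.
def is_real_googlebot(ip,
                      reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                      forward=lambda host: socket.gethostbyname(host)):
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False

# Stubbed resolvers standing in for live DNS:
fake_reverse = lambda ip: "crawl-66-249-66-1.googlebot.com"
fake_forward = lambda host: "66.249.66.1"
print(is_real_googlebot("66.249.66.1", fake_reverse, fake_forward))  # True
print(is_real_googlebot("10.0.0.5", fake_reverse, fake_forward))     # False
```

So a crawler pretending to be MSIE 6 from a Google IP would still be traceable; what can't be ruled out this way is Google fetching from IP ranges nobody associates with it.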
That may be true, but it seems unlikely. I checked my logs from the last two days just now and did not find Googlebot accessing any .css or .js files. It may be some more time before it's a go.
What evidence do you have that Google reads both CSS and JS files? I've been running tests on this since last October/November, and NOT ONCE has a Google IP address (let alone one identifying itself as Mozilla Googlebot) requested external CSS or JS files. On top of that, I tested embedded JS using document.write to write links into the page to see if Google et al. picked them up. Once, Google did pick up a full link (http://example.com/secret-test), but not when I used something like document.write('http://example.com/' + 'test'); I've read elsewhere that Googlebot seems to rip out anything that looks like a link from the whole page (whether in JS or not), and my findings are consistent with this. I would love to hear what tests you've done!
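The "rip out anything that looks like a link" behavior can be mimicked with a simple pattern match over the raw source, scripts included, without executing any JavaScript. This sketch (my own regex, not Google's) reproduces exactly the asymmetry reported above: the full literal URL is found, while the concatenated one never exists as a single string, so only the 'http://example.com/' prefix turns up.

```python
import re

# Sketch: scrape anything URL-shaped out of raw page source without
# running the JavaScript. The regex is an illustrative assumption.
URL_RE = re.compile(r"https?://[\w.-]+(?:/[\w./-]*)?")

def scrape_links(source):
    return URL_RE.findall(source)

page = """
<script>
document.write('http://example.com/secret-test');
document.write('http://example.com/' + 'test');
</script>
"""
print(scrape_links(page))
# ['http://example.com/secret-test', 'http://example.com/']
# The literal link survives; the concatenated target 'test' is never seen.
```

That a crawler doing only this would match the observed results is consistent with Google scanning, not executing, the JavaScript.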