Googlebot 2.0 - Now It Looks at CSS and Javascript

Discussion in 'Google' started by chulium, Feb 23, 2006.

  1. #1
    Source: http://www.adsensebits.com/node/24

    Up until now, the bot (or "crawler") that Google used to index sites was based on a Lynx browser, which is to say it "saw" only text. Previously, the only way for Google to tell whether information was important was for us to tell it, using various forms of markup and a specially labeled comment that Google would recognize as outlining important information. To get a better idea of how Google used to see your site, load up Firefox, disable images, scripts and style sheets, and start browsing.

    Enter: Googlebot 2.0
    But that was so 2005. 2006 sees the introduction of a newer, smarter and pickier Googlebot. One that cares about CSS and JavaScript, position and layout. Utilizing what shows up in logs as
    Mozilla, Googlebot
    Googlebot 2.0 is looking websites up and down in ways it couldn't possibly have done before, by requesting style sheets (.css) and JavaScript files (.js).

    I suppose the most obvious reason for this is that Mozilla Googlebot is taking snapshots of our websites in order to integrate the ever-so-popular but sometimes irritating thumbnail feature that so many smaller search engines have implemented. Or they're doing something much more Google: searching our websites like a human being.

    It seems to me that in order to maintain the ability to produce relevant ads to display, Google needs to be able to tell what information is important by finding out where it is, what it is and how it's displayed. The only way to do that is to start looking at our websites and seeing them like we do.

    Whatever the reason for this new crawler, it'll be interesting to see what they do with the new information they're collecting. As the web evolves, so must Google.
     
    chulium, Feb 23, 2006 IP
    nOR and BamaStangGuy like this.
  2. hulkster

    hulkster Peon

    #2
    I don't do that much CSS (should do more), but if anyone has log entries showing Googlebot spidering .css files (or .js ones) that are externally referenced from your HTML, that would be pretty darn solid proof they are doing this ...
     
    hulkster, Feb 23, 2006 IP
  3. BamaStangGuy

    BamaStangGuy Notable Member

    #3
    Very cool, I hope this is the case
     
    BamaStangGuy, Feb 23, 2006 IP
  4. hulo

    hulo Peon

    #4
    Interesting; I will have to search around for the real SEO implications.
     
    hulo, Feb 23, 2006 IP
  5. dsm56

    dsm56 Active Member

    #5
    Wasn't this already implemented somehow?
    Everyone knows Google already detects hidden text... which means it must look at stylesheets too?

    Anyway I think this is a good thing too...
     
    dsm56, Feb 23, 2006 IP
  6. LaCabra

    LaCabra Goats R Us

    #6
    sayles runs out and changes all his websites to CSS ! Good info CompuXP thanks!
     
    LaCabra, Feb 23, 2006 IP
  7. sketch

    sketch Well-Known Member

    #7
    Is this for AdSense use or for Google-use in general? If the latter, what's the point of this? If Google is going to start putting more emphasis on layouts and back-end code then a lot of the poorly-produced but great content websites will suffer because of this.

    If it IS strictly for AdSense use, it'll be interesting to see how this works, especially since sometimes CSS is relative and not absolute. What AS might think is the bottom of the page may really be the top, etc.

    Personally I think Googlebot should just keep its nose out of anything back-end and stick to content. People search for content, not XHTML 1.0 compliant websites. If any back-end snooping should occur it should be trying to figure out how to pull text content from Flash files.
     
    sketch, Feb 23, 2006 IP
  8. Olney

    Olney Berserker

    #8
    It just makes sense that the CSS is a factor in the algorithm. If we can visually see it why shouldn't the bots have the technology to analyze it...
     
    Olney, Feb 23, 2006 IP
  9. dvduval

    dvduval Notable Member

    #9
    I like the idea that Google might be indexing javascript links. I am still running some applications such as Topics Anywhere on phpBB that I always wished Google would follow
     
    dvduval, Feb 23, 2006 IP
  10. skattabrain

    skattabrain Peon

    #10
    google looking at and understanding css & js with bots is an inevitable evolution of the search engine.

    those of you who know css & js know how easy it is to hide text ... or to make what appears to be lowly "footer" text the biggest thing you see on a page through a browser.

    i think it would be necessary for google to "look with eyes" on the code to see the difference between what's most important in the html and what's most important on the screen.
     
    skattabrain, Feb 23, 2006 IP
  11. hulkster

    hulkster Peon

    #11
    I don't disagree that this is a natural evolution and will eventually come ... but does anyone have any web log data showing Googlebot spidering external .css or .js files?

    In order to see how the page renders, it HAS to suck those files in ...
     
    hulkster, Feb 23, 2006 IP
  12. sketch

    sketch Well-Known Member

    #12
    Of course CSS can be used to hide things, but I think my point about some CSS being relative still stands. Even if the bot somehow assembles the page in its entirety with HTML, CSS and JS files rendered, will it look RIGHT?

    Anyone who deals with a lot of web design knows that Internet Explorer has horrible CSS compliance, so if we want the most browser-compatibility, in essence we have to make 2 CSS files and use javascript to spit out the right one. If for whatever reason we have to start making a third CSS file to account for Google, either a lot of websites will forget it or a lot of coders will be frustrated.

    My concern is how exact, how elaborate Googlebot will be when handling CSS. I've seen some sites where the CSS classes/styles go 5, 6, 7 levels deep. Example: Googlebot may look at that 7th level's white text and the page's white background and accidentally think it's hidden text because it missed the 4th level's blue background.
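
The scenario above can be sketched as a toy model. This is only an illustration of why a checker has to walk every nesting level; it is not how Googlebot or any real browser works, and it ignores transparency, images and positioning. The function names and the `levels` list are made up for the example.

```python
def effective_background(chain):
    """chain lists the background set at each nesting level, from the page
    root down to the text's own element; None means nothing is set there.
    The nearest level that sets a background is the one behind the text."""
    for background in reversed(chain):
        if background is not None:
            return background
    return "white"  # assumed default when no level sets a background


def looks_hidden(text_color, chain):
    """Flag text whose color equals the background actually behind it."""
    return text_color == effective_background(chain)


# Seven levels: white page, blue box at level 4, white text at level 7.
levels = ["white", None, None, "blue", None, None, None]

# A checker that resolves all seven levels sees white text on blue.
print(looks_hidden("white", levels))

# A checker that only compares against the page background gets it wrong.
print(looks_hidden("white", levels[:1]))
```

The first call reports the text as visible and the second reports it as hidden, which is exactly the false positive described in the post.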
     
    sketch, Feb 23, 2006 IP
  13. mightyb

    mightyb Banned

    #13
    Next step for them is to start reading flash as well.
     
    mightyb, Feb 23, 2006 IP
  14. skattabrain

    skattabrain Peon

    #14
    i disagree that google needs to be "perfect" in its understanding of your css to get the big picture ... but with all this google browser talk, how hard would it be for them?

    i'm simply stating a point that if google hasn't already been doing this, it's going to in my opinion.

    assuming they want to see your css ... i can tell you they'll want to see it "as is" ... making a style sheet for an engine i think would be seen as a form of cloaking.
     
    skattabrain, Feb 23, 2006 IP
  15. Netizen

    Netizen Peon

    #15
    I just filtered my logs to show only Googlebot/2.1 for activity since November 1, 2005. Activity shows nearly 5000 hits including over 200 to robots.txt, but zero hits to the css. I don't have any js. The css file is not excluded in robots.txt.

    For my main site, Googlebot/2.1 does not read css.
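
A filter along these lines can be sketched in a few lines of Python. This is a rough sketch that assumes the Apache/nginx "combined" log format; the sample lines are made up for illustration and are not taken from anyone's logs, and `googlebot_asset_hits` is a hypothetical helper name.

```python
import re

# Assumes the "combined" access log format; adjust for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_asset_hits(lines):
    """Return (ip, path) pairs for requests whose user agent mentions
    Googlebot and whose path ends in .css or .js (query string ignored)."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if "googlebot" not in match["agent"].lower():
            continue
        path = match["path"].split("?", 1)[0].lower()
        if path.endswith((".css", ".js")):
            hits.append((match["ip"], match["path"]))
    return hits


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.1 - - [23/Feb/2006:10:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [23/Feb/2006:10:00:01 +0000] "GET /style.css HTTP/1.1" '
    '200 812 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [23/Feb/2006:10:00:02 +0000] "GET /style.css HTTP/1.1" '
    '200 812 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

print(googlebot_asset_hits(sample))
```

An empty result over a long period, as reported here, only shows that nothing identifying itself as Googlebot fetched those files.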
     
    Netizen, Feb 23, 2006 IP
  16. Netizen

    Netizen Peon

    #16
    After running additional filters, none of the major bots are reading the css. These bots showed up reading the css:

    hl_ftien_spider
    Archive.org
    crawler.de
    muncher
    Thumbnail.CZ robot 1.1 (http://thumbnail.cz/why-no-robots-txt.html)
     
    Netizen, Feb 23, 2006 IP
  17. kika

    kika Peon

    #17
    google reads flash already !
     
    kika, Feb 23, 2006 IP
  18. mad4

    mad4 Peon

    #18
    Don't forget that it's very easy to change your user agent.

    Google could be visiting all these css and js files using the user agent of MSIE6 or something and we would never know.
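
To show how little it takes, here is a small Python sketch that builds (but does not send) a request carrying an IE6-style user agent. The URL is a placeholder and the header value is just an example string; nothing here reflects what Google actually sends.

```python
from urllib.request import Request

# Build a request that claims to be IE6. It is never sent in this sketch.
req = Request(
    "http://example.com/style.css",
    headers={"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

From the server's side, a request like this would be logged as an ordinary browser hit, so the user agent string alone cannot settle the question.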
     
    mad4, Feb 24, 2006 IP
  19. jimkarter

    jimkarter Notable Member

    #19
    May be true, but there's very little possibility.

    I just checked my logs for the last 2 days and did not find Googlebot accessing any .css or .js files. Maybe it will be some more time before it's a go.
     
    jimkarter, Feb 24, 2006 IP
  20. eKstreme

    eKstreme Guest

    #20
    What evidence do you have that Google reads both CSS and JS files? I've been running tests on this since last October/November, and NOT ONCE has a Google IP address (let alone one identifying itself as Mozilla Googlebot) requested external CSS files or JS files.

    On top of that, I tested embedded JS using document.write to write links into the page to see if Google et al picked them up. Once, Google did pick up a full link (http://example.com/secret-test) but not when having something like

    
    document.write('http://example.com/' + 'test');
    
    I've read elsewhere that Googlebot seems to rip out anything that looks like a link from the whole page (whether in JS or not), and my findings are consistent with this.
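
That behavior can be imitated with a plain regular expression over the page source. The pattern below is an assumption made for illustration, not Google's actual rule, and the first script line assumes the full URL appeared literally in the JavaScript.

```python
import re

# Assumed pattern: http:// or https:// followed by anything up to
# whitespace, a quote, or an angle bracket. No JavaScript is executed.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>]+")


def rip_links(source):
    """Return every substring of source that looks like a URL."""
    return URL_PATTERN.findall(source)


full = "document.write('http://example.com/secret-test');"
split = "document.write('http://example.com/' + 'test');"

print(rip_links(full))   # ['http://example.com/secret-test']
print(rip_links(split))  # ['http://example.com/']
```

Because the script is never run, the concatenated version only ever yields the first half of the address, which would match the result described above.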

    I would love to hear what tests you've done!
     
    eKstreme, Feb 24, 2006 IP