Recently read the Google patent application. It mentions that Google rewards fresh content. How does Google determine fresh content? Is it from the file modification date, or by comparing the page with the previously crawled content? So if a page has SSI scripts that generate content within a .shtml page, and the .shtml file itself does not change but the scripts generate newer content, will it be treated as fresh content? In this case the file modification date won't change at all, but the content could be different at every load.
How on earth would Google know whether you've updated the file or not? Many, many sites use the same file names over and over again, for example blogs pulling info using just an index.php file. The file name never changes but the content does. All Googlebot sees is the HTML that's sent to it. If it just requests domainname.com/ then it never knows which file it gets. So yes, even if the file name never changes, Googlebot will see new content and will treat it as exactly that.
That's why a lot of folks use newsfeeds on their sites. They are constantly fresh. But don't use JavaScript newsfeeds. Tim
OK, Homer, I'm game. How do you find out the last modified date of a file when it is served by a webserver? I'd have thought that you'd need access to the file system for that. You can easily see if the content served by said webserver has changed, but I'm damned if I can see a way to find out if the file was edited or not.
From a technical standpoint I am not sure how they do it, but ALL search engines know when files are modified. If you recall, about a year ago Altavista used to show the user a last refreshed date in the SERPs... 'refreshed in the last 24 hrs'. On the SEO side it has been common knowledge to modify your index page every day to prompt more frequent visits from search bots. In fact many webmasters still put a lot of weight on this. I am not a coder so I can't tell you how they get this info, jlawrence. But they do.
I highly doubt that they judge fresh content by the last modified date. If this were the case, everyone would be resaving their files on a daily basis just for the sake of changing the date modified. I would think that the way they know whether a piece of content is fresh or not is by comparing it to what's in their index. If it's not found in the index, the content is new.
I agree with you jlawrence, how the heck can Google see the modified date on the webserver? Sure, you and I can see it, but not Google. Their spiders are just comparing the content of the page with the "picture" they took when they were first there, and if the content is different the next time the spiders come... the spiders see "new, fresh content".
The date modified is available as one of the HTTP headers (Last-Modified) in the response to an HTTP GET request, so Google can see it.
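For anyone who wants to check that header themselves, here is a minimal Python sketch using only the standard library. The function names (parse_last_modified, fetch_last_modified) are made up for this illustration, and nothing here is claimed to be how Googlebot actually does it; it only shows that a client with no file system access can read the date when the server chooses to send it.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or invalid."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError,
        # when the value is not a parseable HTTP date.
        return None


def fetch_last_modified(url):
    """Send a HEAD request to url and return the parsed Last-Modified date, if any."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return parse_last_modified(response.headers)
```

Calling fetch_last_modified on a static page will usually give a date. Many servers leave the header out for dynamically generated pages, in which case the function returns None.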
Devbistro: well I'll be damned, you're correct. Strange how I can never get anything from that header though. I'm always seeing last_modified as 0. I'll have to have a play with that some more - as stated in another post, I've had far too much to drink to play with code now (this took 2 edits to get it out).
Personally I think Google probably doesn't rely on that. They are more likely to run a hash function on the page and compare the result to the previous one (different hash result == fresh content). Or just compare the file size.
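The hash idea is easy to try out. Below is a rough Python sketch, assuming SHA-256 as the hash and using made-up function names (content_fingerprint, is_fresh); it is speculation about the general technique, not a description of what Google really runs.

```python
import hashlib


def content_fingerprint(html):
    """Return a SHA-256 hex digest of the page body."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def is_fresh(previous_fingerprint, html):
    """Return True when the page hashes differently from the stored fingerprint."""
    return content_fingerprint(html) != previous_fingerprint


# Two versions of a page with the same length but different content.
old_page = "<p>Posted on Monday</p>"
new_page = "<p>Posted on Sunday</p>"
stored = content_fingerprint(old_page)
```

With these two pages, is_fresh(stored, new_page) is True even though both strings have the same length, which is why a plain file size comparison would miss this kind of change. The flip side is that a hash flags any change at all, including a rotating ad or a timestamp, so a real crawler would presumably need something less sensitive.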
So you reckon forums are Google's pets? If G likes freshness nothing can offer more freshness than forums.
Forums and blogs are seen differently by Google. When I refer to refreshing your content daily I am basing it on an HTML/PHP site. Pages crafted by humans are still the most valuable to Google.