Googlebot downloaded all kinds of files from my site, including the installers for my apps (until I blocked them because they are over 10MB). They are in .msi (Windows Installer) format. This format contains various metadata such as a description, keywords, and links to the support and manufacturer sites. But Googlebot does not understand it. The file shows up in SERPs with a notice stating it is an unknown format. Now, would it be against the rules to send an HTML page to Googlebot with the meta information from the installer instead of the installer itself? This time in a recognizable (HTML) format, of course. I wish Googlebot understood .msi by itself... Regards, Vlasta
Why don't you just Disallow that folder from being spidered if you are concerned about the bandwidth it takes?
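For anyone wanting to check such a Disallow rule before relying on it, here is a minimal sketch using Python's standard urllib.robotparser. The /downloads/ folder, the example.com URLs and the is_allowed helper are assumptions made up for illustration, not details from the site discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /downloads/ folder name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Installer inside the disallowed folder versus an ordinary page.
    for url in (
        "https://example.com/downloads/setup.msi",
        "https://example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

This only tests how the rule is parsed locally; whether a crawler honours it is up to the crawler.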
It is very odd that Googlebot is downloading .msi files. Can you show an example? I am curious to see it. I would block Google from accessing them myself. It wouldn't hurt you in any way, as you just don't want Google to access the files.
I have already blocked them, but they were in the SERPs. My question is about letting Google use the meta information from them: description, keywords, links to manufacturer and support, etc.
His concern is not removing them from the database. As I understand it, he has already blocked them and they are no longer in the SERPs. You want to pass the position of the installer to an HTML page, right? Well, the meta information is far from enough to do this. Otherwise I don't see any cloaking: you remove the .msi from the database and basically build a new page with new content.
This is closer to my original question, but still... When Googlebot requests the URL of the installer, an HTML page with the meta information from the installer would be sent instead of the original installer. This way, the SERPs listing for that URL would look like "Installer for product XXX x.xxMB" followed by the description instead of "xxxx.msi <unrecognized format>". Would that be cloaking?
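To make the idea concrete, the HTML page described above could be built roughly like this. It is only a sketch: the build_meta_page function, the field names and the sample values are hypothetical, and reading the metadata out of the actual .msi file is not shown.

```python
from html import escape


def build_meta_page(meta: dict) -> str:
    """Render installer metadata as a small HTML page (illustrative only)."""
    title = f"Installer for {meta['product']} {meta['size_mb']:.2f}MB"
    keywords = ", ".join(meta["keywords"])
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f"<title>{escape(title)}</title>",
        f'<meta name="description" content="{escape(meta["description"])}">',
        f'<meta name="keywords" content="{escape(keywords)}">',
        "</head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
        f"<p>{escape(meta['description'])}</p>",
        f'<p><a href="{escape(meta["manufacturer_url"])}">Manufacturer</a> | '
        f'<a href="{escape(meta["support_url"])}">Support</a></p>',
        "</body>",
        "</html>",
    ]
    return "\n".join(lines)


# Placeholder values; a real page would take these from the installer.
SAMPLE = {
    "product": "Example App",
    "size_mb": 12.5,
    "description": "Edit icons & cursors",
    "keywords": ["icons", "cursors"],
    "manufacturer_url": "https://example.com/",
    "support_url": "https://example.com/support",
}

if __name__ == "__main__":
    print(build_meta_page(SAMPLE))
```

The sketch only covers generating the page. It does not decide who receives it, which is the part the cloaking question is about.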
If I follow the logic of your question, then in one word: YES! Cloaking is showing the search engines one page and a real visitor something else.