I just got an email stating that one of my sites was being dropped because no co-op ad could be found on the following page:

http://the-vegetable-site.com/vegetable-books/abe.php?an=joy larkcom&tn=creative vegetable gardening

But my robots.txt file contains the lines:

User-agent: *
Disallow: /abe.php

The cited URL is not a "page" of that site at all: it is just a forwarder script, linking to a page of the ABE site via Commission Junction. (And it is not in Google's archive.) What can I do to deal with this situation?
It's based on what Google knows about: http://www.google.com/search?hl=en&...le-books/abe.php+site:the-vegetable-site.com+
It's based on what Google knows about: http://www.google.com/search?hl=en&...table-site.com+

But, if I understand the purpose of robots.txt aright, how did Google come to know of those pages? (The file has been unchanged since that pass-through script went up.)

Or, more to the point, what should I do here? I am being penalized for not running ads on pages that are not pages of my site. Most of my sites have similar pass-through PHP scripts used on a fair percentage of pages: are all such sites impossible to keep in the co-op network?
Not sure exactly... but if Google doesn't adhere to your robots.txt file, you might want to shoot them an email.
Well, I did a lot of homework on robots.txt files, and discovered that there is contradictory advice out there; so it may be that my robots.txt file (there and on other sites) was defective. I have made what I hope are valid corrections, and asked G for a forced (immediate) robots.txt-based exclusion update.

I wonder if this deserves a thread elsewhere, or if I'm the only fool in the world. More than one apparently authoritative source states that to block a particular file, one uses the form:

Disallow: /filename.ext

But others say to just use:

Disallow: filename.ext

While yet others say to use the form:

Disallow: /pathlevel1/pathlevel2/filename.ext

I was using the first (slash-filename.ext), but have switched to the third (/fullpath/filename.ext) and will see what happens with that particular site over the next 24 hours.
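For what it's worth, the three forms behave differently because a Disallow rule is matched as a *path prefix* against the URL's path. A quick sketch with Python's standard urllib.robotparser (the example.com host and pathlevel/filename names here are placeholders, not the real site) shows which of the three forms actually blocks a file that lives in a subdirectory:

```python
# Sketch: compare the three Disallow forms discussed above using
# Python's stdlib robots.txt parser. Host and paths are made up.
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt, url):
    """Return True if a '*' crawler may fetch url under robots_txt."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("*", url)

url = "http://example.com/pathlevel1/pathlevel2/filename.ext"

# Form 1: slash + bare filename. Only a prefix of "/filename.ext",
# so it does NOT match a file sitting in a subdirectory.
print(can_fetch("User-agent: *\nDisallow: /filename.ext", url))   # True (still crawlable)

# Form 2: bare filename, no slash. The URL path never starts with
# "filename.ext", so this matches nothing useful either.
print(can_fetch("User-agent: *\nDisallow: filename.ext", url))    # True (still crawlable)

# Form 3: full path from the document root -- this is the prefix
# that actually matches, so the URL is blocked.
print(can_fetch(
    "User-agent: *\nDisallow: /pathlevel1/pathlevel2/filename.ext", url))  # False (blocked)
```

In other words, "Disallow: /filename.ext" only blocks a file that sits directly in the root; for a file below the root you need the full path from the root.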
I'm having the same problem. This is my robots.txt file:

User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /search.asp
Disallow: /search.php
Disallow: /jump.php
Disallow: /contact.php

jump.php is my forwarder script, but it is indexed by Google many times over and causes my co-op ads to be rejected once in a while. It's located in the root, so /jump.php would be the full path. Not sure what else to do for it.
Yup,

User-agent: *
Disallow: /abe.php

will disallow every URL that begins with http://the-vegetable-site.com/abe.php. You need:

User-agent: *
Disallow: /vegetable-books/abe.php

If in doubt, use the site that Google quotes (robotstxt.org) as your reference.
Yes, interesting. I took this issue to a new thread elsewhere on "robots.txt". It was hubris on my part, and possibly that of others, to have assumed that robots.txt has an "obvious" syntax; but, in mitigation of my folly, there is an *awful* lot of erroneous "information" out there on the web.