Is robots.txt observed?

Discussion in 'Co-op Advertising Network' started by Owlcroft, Dec 26, 2004.

Owlcroft Peon

Messages:

645

Likes Received:

34

Best Answers:

0

Trophy Points:

0

#1

I just got an email stating that one of my sites was being dropped because no co-op ad could be found on the following page:

http://the-vegetable-site.com/vegetable-books/abe.php?an=joy larkcom&tn=creative vegetable gardening

But my robots.txt file contains the lines:

User-agent: *
Disallow: /abe.php

The cited URL is not a "page" of that site at all: it is just a forwarder script, linking to a page of the ABE site via Commission Junction. (And it is not in Google's archive.)

What can I do to deal with this situation?

Owlcroft, Dec 26, 2004 IP
digitalpoint Overlord of no one Staff

Messages:

38,334

Likes Received:

2,613

Best Answers:

462

Trophy Points:

710

Digital Goods:

29

#2

It's based on what Google knows about:
http://www.google.com/search?hl=en&...le-books/abe.php+site:the-vegetable-site.com+

If you contact me privately for support, I'll direct you to the correct support forum. Save time and go there first.
Ingress Intel

digitalpoint, Dec 26, 2004 IP
Owlcroft Peon

#3

But, if I understand the purpose of robots.txt aright, how did Google come to know of those pages? (The file has been unchanged since that pass-through script went up.)

Or, more on point, what should I do here? I am being penalized for not running ads on pages that are not pages of my site. Most of my sites have similar pass-through php scripts used on a fair percentage of pages: are all such sites impossible to keep in the co-op network?

Owlcroft, Dec 26, 2004 IP
digitalpoint Overlord of no one Staff

#4

Not sure exactly... but if Google doesn't adhere to your robots.txt file, you might want to shoot them an email.


digitalpoint, Dec 26, 2004 IP
Owlcroft Peon

#5

Well, I did a lot of homework on robots.txt files and discovered that there is contradictory advice out there, so it may be that my robots.txt file (there and on other sites) was defective. I have made what I hope are valid corrections, and asked Google for a forced (immediate) robots.txt-based exclusion update.

I wonder if this deserves a thread elsewhere, or if I'm the only fool in the world. More than one apparently authoritative source states that to block a particular file, one uses the form:

Disallow: /filename.ext

But others say to just use:

Disallow: filename.ext

While yet others say to use the form:

Disallow: /pathlevel1/pathlevel2/filename.ext

I was using the first (slash-filename.ext), but have switched to the third (/fullpath/filename.ext) and will see what happens with that particular site over the next 24 hours.
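
For anyone comparing the three forms, here is a minimal sketch of the prefix rule as the original robots exclusion convention describes it: a Disallow value blocks any URL whose path starts with that value. The function name `is_disallowed` and the host `example.com` are made up for illustration, and real crawlers may apply extra rules (wildcards, Allow lines) that this sketch ignores.

```python
from urllib.parse import urlsplit


def is_disallowed(url: str, rule: str) -> bool:
    """Return True if the Disallow value `rule` blocks `url` by plain prefix match."""
    if not rule:
        # An empty Disallow value blocks nothing.
        return False
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The rule is compared against the path from the site root, so it only
    # matches when the path begins with exactly that string.
    return path.startswith(rule)


# Hypothetical URL using the placeholder path from the post above.
URL = "http://example.com/pathlevel1/pathlevel2/filename.ext"

for rule in ("/filename.ext", "filename.ext", "/pathlevel1/pathlevel2/filename.ext"):
    print(f"Disallow: {rule:<40} blocks it: {is_disallowed(URL, rule)}")
```

Under this reading, only the third form blocks a file that lives two directories down. The first form would block the file only if it sat in the site root, and the second can never match because every URL path starts with a slash.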

Owlcroft, Dec 26, 2004 IP
Cardplayer Peon

Messages:

53

Likes Received:

4

Best Answers:

0

Trophy Points:

0
#6
I'm having the same problem.

This is my robots.txt file

User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /search.asp
Disallow: /search.php
Disallow: /jump.php
Disallow: /contact.php

jump.php is my forwarder script, but it is indexed by Google many times over and causes my Co-op ads to be rejected once in a while. It's located in the root, so /jump.php would be the full path. Not sure what else to do for it.
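
One way to sanity-check a file like this is to feed it to Python's standard `urllib.robotparser` and ask whether a given agent may fetch a given URL. This is only a sketch: the file is shortened to two of the named agents, `example.com` and the `?url=abc` query are placeholders, and Python's parser is not guaranteed to behave exactly like Google's.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt above (two named agents plus the catch-all).
ROBOTS_TXT = """\
User-agent: Titan
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /search.asp
Disallow: /search.php
Disallow: /jump.php
Disallow: /contact.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com and the query string are placeholders, not the real site.
print(parser.can_fetch("Googlebot", "http://example.com/jump.php?url=abc"))
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))
print(parser.can_fetch("Titan", "http://example.com/index.html"))
```

If the first check reports that the fetch is not allowed, the rule itself is doing its job. Keep in mind that Disallow only governs crawling: a search engine can still list a URL it has learned about from links without ever fetching it.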
Cardplayer, Dec 27, 2004 IP
exam Peon

Messages:

2,434

Likes Received:

120

Best Answers:

0

Trophy Points:

0
#7
Yup,

User-agent: *
Disallow: /abe.php

will disallow every URL that begins with
http://the-vegetable-site.com/abe.php

Your script is in a subdirectory, so you need

User-agent: *
Disallow: /vegetable-books/abe.php

If in doubt, use the site that Google quotes (robotstxt.org) as your reference.
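
The difference between the two rules can be checked with Python's standard `urllib.robotparser`, as a rough stand-in for a real crawler. The helper name `allowed`, the `Googlebot` agent string, and the `?an=example` query are assumptions made for this sketch; the path is the one from the first post.

```python
from urllib.robotparser import RobotFileParser


def allowed(rule_path: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under a one-rule robots.txt."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return parser.can_fetch(agent, url)


# Forwarder URL from the first post, with a placeholder query string.
SCRIPT_URL = "http://the-vegetable-site.com/vegetable-books/abe.php?an=example"
# A hypothetical ordinary page in the same directory.
PAGE_URL = "http://the-vegetable-site.com/vegetable-books/index.html"

print("old rule lets the script through:", allowed("/abe.php", SCRIPT_URL))
print("new rule lets the script through:", allowed("/vegetable-books/abe.php", SCRIPT_URL))
print("new rule lets other pages through:", allowed("/vegetable-books/abe.php", PAGE_URL))
```

The old rule leaves the forwarder crawlable because the URL path does not start with /abe.php, while the full-path rule blocks the script and leaves the rest of the directory alone.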
exam, Dec 27, 2004 IP
Owlcroft Peon

#8

Yes, interesting. I took this issue to a new thread elsewhere on "robots.txt". It seems to be an excellent example of hubris on my part, and possibly that of others, to have assumed that robots.txt has an "obvious" syntax, but, in mitigation of my folly, there is an *awful* lot of erroneous "information" out there on the web.

Owlcroft, Dec 28, 2004 IP