Yesterday they awarded the Medal of Freedom http://www.whitehouse.gov/news/releases/2004/06/20040623-8.html to civilians. While looking at their press release, I decided to see how they construct their meta tags and robots.txt file: http://www.whitehouse.gov/robots.txt Boy, sometimes you have to wonder... Here is just a small sample:
What a waste of time... restricting a directory, then also disallowing every sub-directory within it. Heh. Although, maybe Bush is learning to be a webmaster.
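To illustrate the redundancy point, here is a rough Python sketch using the standard library's urllib.robotparser. The rules and the example.com paths are made up for illustration (they are not copied from the whitehouse.gov file). The idea is simply that a Disallow on a directory already covers everything beneath it, because rules are matched as path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a directory plus each of its sub-directories.
VERBOSE = """\
User-agent: *
Disallow: /private/
Disallow: /private/reports/
Disallow: /private/reports/2004/
"""

# The same intent expressed with the parent directory only.
COMPACT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical paths to check against both rule sets.
PATHS = [
    "/private/",
    "/private/reports/",
    "/private/reports/2004/index.html",
    "/public/index.html",
]


def verdicts(robots_txt):
    """Return, for each test path, whether a generic crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [parser.can_fetch("*", "http://example.com" + p) for p in PATHS]


# Both rule sets give the same answers with Python's parser.
print(verdicts(VERBOSE) == verdicts(COMPACT))
```

The extra sub-directory lines are harmless, just unnecessary, at least as far as this parser is concerned.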
LOL, you'll have to let us know when he starts using the keyword tracker, Shawn. Methinks he'll be watching how "miserable failure" does.
Wow, the sign-up for these forums didn't take long at all. Thank god, all the other ones take a whole day. Anyway, I wanted to reply to this thread, as I have been actively seeking information on robots.txt and am waiting for the syntax to support an "allow" feature as well, or something similar, so that I don't have to have a crawler page and a robots page.
While doing this, I read a really funny article, http://www.theinquirer.net/?article=19357 and I know, the source is the Inquirer, but it was so distorted in its thinking. Like where it says that the robots.txt files keep search engines from taking a snapshot of our history. Dumb statement there. Search engines are not designed to take snapshots...lol... they are designed to let you search for the information that you are looking for. Duh, hello.
The fact of the matter is that they are using some type of include files on their system, so they have a simple, mostly text form of what they type, and then they do another page that calls the include and the text field. It's a large site, so they only want to have to change a little bit of code at a time. This is not in defense of the White House...lol... lord knows I wouldn't go that far, but I doubt someone said "hey web programmer dude, let's be sneaky and not let search engines get certain information"... Look, if they were trying to be sneaky with their information, they simply wouldn't post it on the internet at all. It would be hidden somewhere for real.
And as for that article saying that this is not the normal amount of disallows: again, it's not a normal-sized site, so of course it has more directories. Try going to www.google.com/robots.txt and you'll see that one has a lot as well, but guess what? Its syntax is wrong. All the directories need to have a trailing "/".
OK, let me try this again, it wouldn't let me put in "live links" 'cause I am new... Hello? Administrator, I really am not trying to promote my own site, I promise...LOL...
Anyway, thanks for lettin' me rant Mike
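On the trailing "/" question: as far as I know, robots.txt rules are matched as plain path prefixes, so the slash changes what a rule covers rather than whether it is valid. A minimal sketch with Python's urllib.robotparser, assuming a made-up /search path on example.com (the helper name blocked is just for this example):

```python
from urllib.robotparser import RobotFileParser


def blocked(rule_path, url_path):
    """True if a single 'Disallow: rule_path' line blocks url_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule_path])
    return not parser.can_fetch("*", "http://example.com" + url_path)


# Without the trailing slash the rule is a bare prefix, so it also
# catches unrelated paths that merely start with the same characters.
print(blocked("/search", "/search/advanced"))   # True
print(blocked("/search", "/searching.html"))    # True

# With the trailing slash only the directory's contents are covered.
print(blocked("/search/", "/search/advanced"))  # True
print(blocked("/search/", "/searching.html"))   # False
```

So both forms parse fine here. Other crawlers may differ in details, but the slash mostly decides how wide the net is.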
If you are doing something to try to serve up different pages to spiders/bots than to human visitors, be advised that such practices may result in penalties by search engines. Anything on your site that you don't want a spider to "view" can be excluded with the Disallow: statement in robots.txt.
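As a rough sketch of the Disallow approach, here is one way to check what a given robots.txt would exclude, using Python's urllib.robotparser. The bot name ExampleBot, the directories, and example.com are all hypothetical; swap in your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one bot out entirely, and keep every
# other crawler out of two directories.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /drafts/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if the named crawler may fetch the path under these rules."""
    return parser.can_fetch(agent, "http://example.com" + path)


print(allowed("ExampleBot", "/index.html"))      # False
print(allowed("OtherBot", "/index.html"))        # True
print(allowed("OtherBot", "/drafts/post.html"))  # False
```

Every visitor, human or bot, still gets the same pages; the file only tells crawlers which paths to skip.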