Have robot.txt. but always 404


j3r0m3
Mar 23rd 2006, 7:17 am
Could you please suggest how I can get rid of the 404 error that appears whenever a spider bot comes to this site? As far as I know, I already have a robot.txt, but I cannot seem to make any headway. The robot.txt section in the forum says a lot about creating the file, but not much about whether I should change its permissions or anything else.

I constantly get errors similar to this:

Date: 03-21-2006[00:19:56]
Robot request for: http://www.linguagymnastics.com/robots.txt was not found!
IP address: 72.30.97.225
Browser: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Referred by: n/a

Error Code: 404

Thanks to all who read up to this point. I really appreciate your know-how and comments.

ryan_uk
Mar 23rd 2006, 7:22 am
You have an auto-redirect to http://www.linguagymnastics.com/ and then to http://www.linguagymnastics.com/blog/

I guess this is your problem. Check your .htaccess.

j3r0m3
Mar 23rd 2006, 7:30 am
so after clearing the .htaccess, my problem should be gone?

Sem-Advance
Mar 23rd 2006, 7:38 am
Hi

No the problem won't be gone.

I wrote about the need for websites to use robots.txt in order to hopefully rank well in the search engines.

http://www.seochat.com/c/a/Search-Engine-Optimization-Help/Write-a-Robotstxt-File/

The 404 is the robots hitting your site asking for your robots.txt file since they don't find one they probably end up leaving.

Install one and all pages are usually indexed quickly and fully.

Hope this helps

ryan_uk
Mar 23rd 2006, 7:44 am
He has one, but due to a redirect it's not being found. You might be able to place it under yourdomain.com/blog, or otherwise a rule can be written to redirect everything except requests for robots.txt. Post your .htaccess and maybe I or someone else can help.
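The "redirect everything except robots.txt" idea can be sketched in a few lines. The site in this thread uses Apache and .htaccess, so this Python function is only an illustration of the decision logic, not the actual rule; the function name `redirect_target` and the `/blog/` prefix constant are assumptions made for the example.

```python
from typing import Optional

# Assumed redirect destination, based on the /blog/ redirect described in the thread.
BLOG_PREFIX = "/blog/"


def redirect_target(path: str) -> Optional[str]:
    """Return the path to redirect to, or None to serve the request as-is."""
    # Crawlers request /robots.txt at the site root, so never redirect it.
    if path == "/robots.txt":
        return None
    # Requests already under the blog must not be redirected again (avoids a loop).
    if path.startswith(BLOG_PREFIX):
        return None
    # Everything else is sent to the blog.
    return BLOG_PREFIX
```

Note that a request for the misspelled /robot.txt would still be redirected, because only the exact name robots.txt is exempt.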

mcfox
Mar 23rd 2006, 7:46 am
It's easy to solve. Rename your file to robots.txt. Currently it is called robot.txt -- missing the letter 's' -- should be plural, not singular.
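A quick way to catch this kind of naming mistake is to look at the document root directly. The following is a minimal sketch, assuming you can run Python on the server; the helper name `inspect_docroot` and the list of look-alike names are made up for illustration and are not exhaustive.

```python
from pathlib import Path
from typing import List, Tuple

EXPECTED = "robots.txt"
# Look-alike names worth flagging (an illustrative guess). "robots.txt" is
# included so that case variants such as "Robots.txt" are flagged as well.
LOOKALIKES = {"robot.txt", "robots.text", "robots.txt.txt", "robots.txt"}


def inspect_docroot(docroot: str) -> Tuple[bool, List[str]]:
    """Return (correct file present?, sorted list of suspicious look-alike names)."""
    names = {p.name for p in Path(docroot).iterdir() if p.is_file()}
    found = EXPECTED in names
    suspects = sorted(
        n for n in names if n != EXPECTED and n.lower() in LOOKALIKES
    )
    return found, suspects
```

If the first value is False and the list contains robot.txt, renaming that file is most likely all that is needed.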

j3r0m3
Mar 23rd 2006, 8:43 am
ryan_uk wrote: "He has one, but due to a redirect it's not being found. You might be able to place it under yourdomain.com/blog, or otherwise a rule can be written to redirect everything except requests for robots.txt. Post your .htaccess and maybe I or someone else can help."

.htaccess is blank now at root level.

.htaccess at /blog level only contains code for my Wordpress permalinks.

Sem-Advance
Mar 23rd 2006, 8:52 am
Ahhh McFox is right

http://www.linguagymnastics.com/robot.txt


Your file name is misspelled; add an s after robot.

j3r0m3
Mar 23rd 2006, 6:47 pm
thank you all

minstrel
Apr 1st 2006, 11:09 pm
Sem-Advance wrote:
"I wrote about the need for websites to use robots.txt in order to hopefully rank well in the search engines.

http://www.seochat.com/c/a/Search-En...obotstxt-File/

The 404 is the robots hitting your site asking for your robots.txt file since they don't find one they probably end up leaving."

No they won't. That's simply not correct.

(Didn't anyone mention that at seochat?)

Sem-Advance
Apr 2nd 2006, 6:58 am
Check your logs, Minstrel. I see robots requesting the file; if it's not there, it is a 404 error, and some bots will leave.

Also I think you missed a word in my quote.

The 404 is the robots hitting your site asking for your robots.txt file since they don't find one they probably end up leaving.

ryan_uk
Apr 2nd 2006, 7:23 am
minstrel is right...

robots.txt is not required and won't help ranking. It's just a set of guidelines telling crawlers what not to check (and what to check). Some respect it, others don't. They won't leave if one doesn't exist. They might leave depending on what's in robots.txt and/or robots meta tags.

Sem-Advance
Apr 2nd 2006, 8:26 am
Now, if only you were every robot, you would be a good source to dispute such... but as you're not, I believe your thinking is somewhat flawed.

Research the subject some.

You will be surprised what you learn when you look past what you think you know.

Also I would recommend research to be done on data mining scripts and not SEO related issues.

Thanks for the input....

Sem-Advance
Apr 2nd 2006, 8:32 am
Since I have a hard time getting my points across for whatever reason...

I direct you to the Google engineer whom it seems everyone will believe:

February 7, 2006 @ 11:09 pm · Filed under Google/SEO (http://www.mattcutts.com/blog/type/googleseo/)

The Sitemaps team just introduced a new robots.txt tool (http://sitemaps.blogspot.com/2006/02/more-stats-and-analysis-of-robotstxt.html) into Sitemaps. The robots.txt file (http://www.robotstxt.org/) is one of the easiest things for a webmaster to make a mistake on. Brett Tabke’s Search Engine World has a great robots.txt tutorial (http://www.searchengineworld.com/robots/robots_tutorial.htm) and even a robots.txt validator (http://www.searchengineworld.com/cgi-bin/robotcheck.cgi).

Despite good info on the web, even experts can have a hard time knowing with 100% confidence what a certain robots.txt will do. When Danny Sullivan recently asked a question about prefixing matching, I had to go ask the crawl team to be completely sure.

Part of the problem is that mucking around with robots.txt files is pretty rare; once you get it right once, you usually never have to think about the file again. Another issue is that if you get the file wrong, it can have a large impact on your site, so most people don’t mess with their robots.txt file very often.

Finally, each search engine has slightly different extra options that they support. For example, Google permits wildcards (*) and the “Allow:” directive.


You can read the rest at this link, as I don't need to repost the whole thing here:

http://www.mattcutts.com/blog/new-robotstxt-tool/

If the directives do not match what the spider was programmed for... then it will most certainly leave.

Spider bots are very intense and when they hammer on your server they can make it crash...do not underestimate their abilities.

minstrel
Apr 2nd 2006, 8:43 am
You have totally misunderstood what the comments you have cited are saying, Sem-Advance.

Nowhere in there does it say that a bot will leave, or even probably leave, if you don't have a robots.txt file. What they are saying is that if you mess up your robots.txt file, you may create a problem for spiders on your site.

In other words, a bad robots.txt file is a problem; NO robots.txt file is not - unless you have files or directories you do not want indexed.

Let me help by rewording your comment:

The 404 is the robots hitting your site asking for your robots.txt file. Since they don't find one, they will not end up leaving but rather will go on to spider your site without any restrictions as to what they should or should not crawl.
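The "no rules means no restrictions" behaviour can be checked with Python's standard-library robots.txt parser. This is a minimal sketch of how one well-behaved parser acts, not proof of what any particular search engine does; the function name and the example.com URL are placeholders. As far as I know, the same module's `read()` method also treats a 404 response for robots.txt as "allow everything".

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed_without_rules(user_agent: str, url: str) -> bool:
    """Ask the stdlib parser whether a URL may be fetched when no rules exist."""
    parser = RobotFileParser()
    # An empty list of lines stands in for an empty (rule-free) robots.txt.
    parser.parse([])
    return parser.can_fetch(user_agent, url)


# Placeholder user agent and URL, for illustration only.
print(crawl_allowed_without_rules("Slurp", "http://www.example.com/any/page.html"))
```

With no rules to apply, the parser reports the URL as fetchable, which matches the reworded comment above: the crawler carries on without restrictions.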

Sem-Advance
Apr 2nd 2006, 8:54 am
Dear Minstrel

Why would you reword my comment?? :mad:

I stick by what I post. I type perfectly fine, as you can see, since this post is following yours.

How would you like it if I reworded comments you make that you feel are correct and then posted them around the internet??

I doubt you would, so show me the same courtesy!

Next, I cited one source, not all that I have read. You have cited none.

Do me a favor and look in your log file. Tell me how many robots crawl your site. Any idea why more do not???

Now for those of you who have websites listed on only one or two of the three majors and do not have a robots.txt file...install one and your site will soon show on all three...(barring any spam or coding issues of your pages).

EGS
Apr 2nd 2006, 9:01 am
Your file is saved as robot.txt ... you need to rename to robots.txt
See if that is the problem! :D

minstrel
Apr 2nd 2006, 9:09 am
Sem-Advance, you are completely and utterly wrong about this issue. I suggest you give it up.

ryan_uk
Apr 2nd 2006, 9:14 am
Sem-Advance, I suggest you start checking some of the major sites indexed by google, msn, yahoo or any other SE and look for a robots.txt. Many don't have one. robots.txt is just a suggestion, not a standard. For example, www.cnn.com is in all those search engines and more ... but lo-and-behold no robots.txt. It's by no means essential. It can be helpful, especially to ensure that folders and pages you don't want indexed aren't. And if it's written incorrectly, it might stop robots indexing pages that you do want indexed. However, a lack of robots.txt doesn't matter whatsoever.

Sem-Advance
Apr 2nd 2006, 9:16 am
More study places

http://www.robotstxt.org/wc/faq.html

http://www.robotstxt.org/wc/eval.html

http://www.robotstxt.org/wc/threat-or-treat.html

http://www.w3.org/robots.txt :D

Sem-Advance
Apr 2nd 2006, 9:17 am
Ok Ryan

Same homework:

Check your logs. How many robots spider your site??

Why don't more??

ryan_uk
Apr 2nd 2006, 9:18 am
Try studying them, then Sem-Advance.

A quote from robotstxt.org:

It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and there no guarantee that all current and future robots will use it. Consider it a common facility the majority of robot authors offer the WWW community to protect WWW server against unwanted accesses by their robots.

minstrel
Apr 2nd 2006, 9:19 am
Sem-Advance, what is the point of posting all these links to references on how to construct a good robots.txt file? None of them back up your claim that if you don't have one spiders will leave.

j3r0m3
Apr 2nd 2006, 8:58 pm
erm, gents
we seem to have 2 schools of thought over here.
why don't we leave it at that and not try to convince one another of the benefits, because it will continue till the cows come home.

minstrel
Apr 2nd 2006, 10:19 pm
Because this isn't a matter of opinion, j3r0m3. This is a matter of Sem-Advance just plain being wrong.

mcfox
Apr 2nd 2006, 11:09 pm
Have to agree. Having or not having a robots.txt file does not make a difference as to whether your site gets indexed or not and saying the robot will leave the site if you do not have one is 100% wrong.

If you stipulate parts of your site you do not want robots to index then that's about as good as you can hope for and only some of them will obey.

shauner
Apr 3rd 2006, 12:04 am
I get several hits to robots.txt daily, which doesn't exist on any of my sites. So it shows a 404 hit on my stats page, oh well.

But I still have Google, MSN and Yahoo crawl HUNDREDS of pages on each site daily. I don't see any point in spending the time creating a robots.txt file when I already get crawled thoroughly.

ryan_uk
Apr 3rd 2006, 1:17 am
shauner wrote: "I don't see any point in spending the time creating a robots.txt file when I already get crawled thoroughly."

The point is, you don't need one to get crawled. You only need one if you don't want some pages and/or directories indexed, either in general or by a particular SE.

For example, some people exclude Google's Image Bot as it's often unbeneficial (people look at the images and not the pages) and a waste of bandwidth.
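As a rough illustration of that kind of exclusion, the sketch below feeds a small robots.txt to Python's standard-library parser and asks what each crawler may fetch. The file contents and the example.com URL are assumptions written for this example; Googlebot-Image is the user-agent name Google documents for its image crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the image crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

IMAGE_URL = "http://www.example.com/images/logo.png"  # placeholder URL

# The image bot is excluded from the whole site...
image_bot_allowed = parser.can_fetch("Googlebot-Image", IMAGE_URL)
# ...while other crawlers fall through to the catch-all entry.
other_bot_allowed = parser.can_fetch("Slurp", IMAGE_URL)

print(image_bot_allowed, other_bot_allowed)
```

An empty `Disallow:` line means "nothing is disallowed", so the second entry leaves every other crawler unrestricted. Whether a given bot actually honours the file is up to the bot.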

On the other hand, sitemaps might help in getting indexed by google, assuming the sitemap is submitted to google sitemaps (https://www.google.com/webmasters/sitemaps/login?hl=en) and is compatible with google. Maybe this is what Sem-Advance is confused about. However, a sitemap is by no means essential.