Google Sitemap Changes
aeiouy
Aug 29th 2005, 8:01 pm
Google has changed their sitemap setup a bit. For one, there is now a verification option: you put a blank file in the same location as your sitemap, and then you can get statistics and feedback on any problems Google has with your sitemap.
It also gives you information about pages it found outside your sitemap that it had problems with.
toocoolforschool
Aug 29th 2005, 8:36 pm
Interesting, just noticed that after you mentioned it. What does that "Verify" link do, again?
Interlogic
Aug 30th 2005, 4:16 am
Once you place the right file on your server, it will let you see a list of any errors Google encountered while spidering your site (even for pages that aren't in your sitemap).
I have to say I think it's a great touch.
boohlick
Aug 30th 2005, 5:03 am
I think I'll give it a try :)
Johnburk
Aug 30th 2005, 12:54 pm
I noticed it too and gave it a try.
It reports an HTTP error for the following file:
/%5Cindex.html
But what does %5C mean?
Jan
Aug 30th 2005, 1:19 pm
%5C is a URL-encoded backslash (\): 5C is its ASCII code in hex, which is 92 in decimal.
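To see what a path like this stands for, PHP's rawurldecode() turns the reported path back into its raw characters. One likely source (an assumption; nobody in the thread confirms it) is a link written with a Windows-style backslash, such as href="\index.html", somewhere in the site's pages. The sketch below decodes the path and then scans a page's HTML for href/src values that contain a backslash. find_backslash_links() is a hypothetical helper, and its regex is a rough check rather than a full HTML parser.

```php
<?php
// Decode the path Google reported: %5C is the percent-encoded backslash.
$reported = '/%5Cindex.html';
echo rawurldecode($reported), "\n";            // prints /\index.html
echo dechex(ord('\\')), ' ', ord('\\'), "\n";  // prints 5c 92

/**
 * Hypothetical helper: return every href/src attribute value in $html
 * that contains a backslash. A rough regex check, not a full HTML parser.
 */
function find_backslash_links(string $html): array
{
    preg_match_all('/\b(?:href|src)\s*=\s*["\']([^"\']*\\\\[^"\']*)["\']/i', $html, $m);
    return $m[1];
}

// Only the first link contains a backslash, so only it is reported.
$page = '<a href="\\index.html">Home</a> <a href="/about.html">About</a>';
echo implode(', ', find_backslash_links($page)), "\n"; // prints \index.html
```

Running the helper over each page's source would point to where a stray backslash link lives. Replacing the backslash with a forward slash should then stop the /%5Cindex.html request.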
Johnburk
Aug 30th 2005, 2:34 pm
Thank you. How can I solve this? The path is now /\index.html, but I can't see where that is located anywhere.
webmistress
Aug 30th 2005, 3:22 pm
Hey, thanks for the tip. Nice touch :)
jazzylee77
Aug 30th 2005, 3:31 pm
I gave this a try and got the message:
We've detected that your 404 (file not found) error page returns a status of 200 (OK) in the header. This configuration presents a security risk for site verification and therefore, we can't verify your site. If your web server is configured to return a status of 200 in the header of 404 pages, and we enabled you to verify your site with this configuration, others would be able to take advantage of this and verify your site as well. This would allow others to see your site statistics. To ensure that no one can take advantage of this configuration to view statistics to sites they don't own, we only verify sites that return a status of 404 in the header of 404 pages.
I have custom 404 pages on most of my sites. An example is...
http://matchtales.com/html/welcome_to_match_tales.html
I checked with the host's chat help, but I don't necessarily trust their answer that I can change something in the HTML of the page to make it return a 404 status. Anyone care to enlighten me?
aeiouy
Aug 30th 2005, 3:41 pm
That would be good info to know, and I don't know. I don't see anything in the page source of legitimate 404 pages, so I'm not sure. I'll see if I can find out and drop a note, because I'm curious too.
Edit: I found this link, http://www.thesitewizard.com/archive/custom404.shtml, but it does not seem to mention anything specific that would make it a 404 rather than anything else.
Maybe it's the .htaccess setup that does it, but I'm really clueless.
jazzylee77
Aug 30th 2005, 3:55 pm
The more the merrier! I suspect those status codes are generated by the server, and my friendly chat help just brushed me off. I'll try the next rung up the host support ladder: a trouble ticket.
swd
Aug 30th 2005, 4:04 pm
I also got this error for one of my sites.
If someone knows how to fix it, that would be great.
All the best,
SWD
webmistress
Aug 30th 2005, 4:16 pm
Only your host can help you on this one :)
ikeys
Aug 30th 2005, 5:50 pm
When you run the check, Google sends two probes to your server. These are from my logs:
crawl-66-249-65-173.googlebot.com - - [30/Aug/2005:17:32:05 +0200] "HEAD /GOOGLE**********.html HTTP/1.1" 200 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
crawl-66-249-65-173.googlebot.com - - [30/Aug/2005:17:32:05 +0200] "HEAD /GOOGLE404probe*******.html HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
For the first, your server needs to return a 200 code, the normal HTTP "OK" code.
For the second, your server needs to return a 404 (Page Not Found) code.
The second request is for /GOOGLE404probe***randomstuff***.html
If you have custom 404 pages, the second probe can go wrong: you will get that error if those custom 404 pages don't return a 404 HTTP code.
These codes are in the HTTP headers, which are not visible in a browser.
There are tools to check them, like:
http://www.seoconsultants.com/tools/headers.asp
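Google's reasoning in the message quoted earlier follows from these two probes. If every missing page answers 200, then any file name someone claims as their verification file also answers 200, so the existence check proves nothing. Here is a simplified model of the two checks, as a sketch only. would_verify() and the status callbacks are hypothetical, /GOOGLEabc123.html stands in for the masked real file name, and Google's actual implementation is not shown in this thread.

```php
<?php
/**
 * Hypothetical model of the verification check.
 * $statusFor maps a request path to the HTTP status the server would return.
 * Passes only if the verification file gets 200 AND a random probe gets 404.
 */
function would_verify(callable $statusFor, string $verifyFile): bool
{
    // A random probe name, in the spirit of /GOOGLE404probe***randomstuff***.html
    $probe = '/GOOGLE404probe' . bin2hex(random_bytes(8)) . '.html';
    return $statusFor($verifyFile) === 200 && $statusFor($probe) === 404;
}

// A server whose custom 404 page answers "200 OK" for every path.
$soft404 = function (string $path): int {
    return 200;
};

// A server that answers 200 only for files that really exist.
$existing = ['/GOOGLEabc123.html', '/index.html'];
$strict = function (string $path) use ($existing): int {
    return in_array($path, $existing, true) ? 200 : 404;
};

var_dump(would_verify($soft404, '/GOOGLEabc123.html')); // bool(false)
var_dump(would_verify($strict, '/GOOGLEabc123.html'));  // bool(true)
```

The soft-404 server fails because the random probe also gets 200. That is exactly the configuration Google's message describes, and on such a server anyone's made-up verification file would "exist" as well.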
ikeys
Aug 30th 2005, 5:57 pm
Just did the check with your site:
SEO Consultants Directory Check Server Headers - Single URI Results
Current Date and Time: 2005-08-30T17:53:58-0800
#1 Server Response: http://www.matchtales.com/html/some_grabage_here_to_have_a_404
HTTP Status Code: HTTP/1.1 200 OK
Date: Wed, 31 Aug 2005 00:53:55 GMT
Server: Apache/1.3.31 (Unix) PHP/4.3.11 mod_ssl/2.8.18 OpenSSL/0.9.6b FrontPage/5.0.2.2635 mod_throttle/3.1.2
X-Powered-By: PHP/4.3.11
Connection: close
Content-Type: text/html
HTTP Status Code: HTTP/1.1 200 OK
It should have been:
HTTP Status Code: HTTP/1.1 404 Page not found
at least for all requests starting with /GOOGLE404probe.
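You can run the same check as the online tool from PHP with get_headers(), which returns the response headers with the status line as the first element. Below is a minimal sketch, assuming PHP 7.1+ (for the context argument) and outbound HTTP access. status_code() and check_url() are hypothetical helper names. get_headers() follows redirects by default, but element 0 is still the status line for the URL you asked for, and that is the line that matters here.

```php
<?php
/** Extract the numeric code from a status line such as "HTTP/1.1 404 Not Found". */
function status_code(string $statusLine): ?int
{
    return preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m) === 1
        ? (int) $m[1]
        : null;
}

/** Fetch only the headers of $url and return its status code (null on failure). */
function check_url(string $url): ?int
{
    // Use HEAD, like the Googlebot probes in the log lines above, instead of GET.
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = @get_headers($url, false, $context);
    return $headers === false ? null : status_code($headers[0]);
}
```

For example, check_url('http://www.example.com/GOOGLE404probe-test.html') (a made-up address) should return 404 on a correctly configured server. On a site set up like the one checked above, it would return 200.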
ikeys
Aug 30th 2005, 6:08 pm
I just checked it.
In PHP it is done like this;
just add this to your passthru.php:
// Must run before the script sends any output, or the 404 status cannot be set
if (stripos($_SERVER['REQUEST_URI'], 'GOOGLE404probe') !== false) {
    header('HTTP/1.1 404 File not found');
    exit;
}
markhutch
Aug 30th 2005, 10:23 pm
I noticed this new feature today, too. The error they gave me was "page not found" when trying to access my robots.txt file. I didn't have one up at the time, but maybe they want you to have one when using sitemaps.
battra
Aug 31st 2005, 1:57 am
I have the same problem with one of my sites. I think mine is because Mambo handles all page requests. Not sure how to fix it :(
jazzylee77
Aug 31st 2005, 12:39 pm
Thanks, I'll give that a shot tonight if the host hasn't acted yet.
jazzylee77
Aug 31st 2005, 5:50 pm
Well, I added the code to the existing code like this:
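<?php
// Fallback for very old PHP versions that lack file_get_contents()
if (!function_exists('file_get_contents')) {
    function file_get_contents($url) {
        $handle = fopen($url, 'r');
        $string = fread($handle, 4096000);
        fclose($handle);
        return $string;
    }
}
include ('ad_network_222.php');
// Inject the ad block just before </body> of the requested page
echo preg_replace ("/<\/body>/i", '<br><div class="main" style="padding-left:12px; padding-right:12px">'.
    $ad_network . '</body>', file_get_contents(str_replace ('../', '', $_REQUEST['file'])));
// Note: this check runs after the page has been echoed above; once output
// has been sent, header() can no longer change the status code.
if (stripos($_SERVER['REQUEST_URI'], 'GOOGLE404probe') !== false) {
    header('HTTP/1.1 404 File not found');
    exit;
}
?>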
I still get the error when verifying with Google. The host doesn't seem to understand.
This is the host's response:
A: Dear Valued customer.
Thank you for the words to support
That's because you are using a PHP application and it does not return a 404 error.
I tried typing:
http://www.matchtales.com/html/dating_sdfsdfsdfs_reviews.html
and got:
Warning: file_get_contents(html/dating_sdfsdfsdfs_reviews.html): failed to open stream: No such file or directory in /hsphere/local/home/revekozu/matchtales.com/passthru.php on line 13
This is not a 404 error.
With regards
...not really the issue I'm trying to address, but I've wondered before if there might be a fix for this one too.
_vlada_
Sep 1st 2005, 3:35 pm
If Google said "Let's have all of you sitemap owners go out, dance, and send a picture with your submitted sitemap," we would do that.
I think the people at Google enjoy implementing a bunch of stupid, useless things to keep us busy. And they sit watching their monitors, betting on how many of us will implement it in the next minute.
aeiouy
Sep 1st 2005, 3:39 pm
Wow... you really need a different host.
Technically they are right: it is not a 404 error. But it is not giving a 404 error because they have it screwed up.
jazzylee77
Sep 1st 2005, 7:13 pm
That might be another tick against this host. Their uptime has been less than stellar lately, too. Moving 10 sites with a mixture of scripts and databases won't be a simple thing for me.
ikeys
Sep 2nd 2005, 12:24 am
Move the Google probe code in front of the
if (!function_exists(...)) block.
It has an exit, which stops the script before anything else (including error messages) is displayed.
The other message is an error because the PHP code tries to open a file with
file_get_contents
without first making sure that the file exists.
There should be an extra
if(file_exists($url)){}
check, but you should ask them...
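Combining ikeys' two suggestions, a reworked passthru.php might look like the sketch below. It is a hedged illustration, not the original script. It assumes PHP 7.1+ (the host above runs PHP 4.3.11, so it would need adapting there), that pages live under the script's own directory, and that ad_network_222.php defines $ad_network as before. is_google_probe(), resolve_page() and send_not_found() are hypothetical names. The sketch also swaps the original str_replace('../', '', ...) filter, which a path like "....//" slips past, for a realpath() check that keeps requests inside the base directory. It closes the div that the original left open.

```php
<?php
// Hypothetical rework of passthru.php (assumes PHP 7.1+).
// Order matters: any 404 is decided and sent before a single byte of output.

/** True when the request URI looks like Google's verification probe. */
function is_google_probe(string $uri): bool
{
    return stripos($uri, 'GOOGLE404probe') !== false;
}

/**
 * Resolve $requested relative to $baseDir. Returns the real path, or null if
 * the file does not exist or would lie outside $baseDir (e.g. "../../x").
 */
function resolve_page(string $requested, string $baseDir): ?string
{
    if (strpos($requested, "\0") !== false) {
        return null; // null bytes are never valid in a page name
    }
    $base = realpath($baseDir);
    $path = realpath($baseDir . DIRECTORY_SEPARATOR . $requested);
    if ($base === false || $path === false || !is_file($path)) {
        return null; // missing file: the file_exists() check ikeys suggested
    }
    // realpath() has already collapsed any "../", so a prefix check suffices.
    if (strpos($path, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $path;
}

/** Send a real 404 status line, a short body, and stop the script. */
function send_not_found(): void
{
    header('HTTP/1.1 404 Not Found');
    echo 'Page not found';
    exit;
}

// Only handle web requests, so the helpers above can also be loaded from the CLI.
if (PHP_SAPI !== 'cli') {
    if (is_google_probe($_SERVER['REQUEST_URI'] ?? '')) {
        send_not_found();
    }
    $file = $_REQUEST['file'] ?? '';
    $page = is_string($file) ? resolve_page($file, __DIR__) : null;
    if ($page === null) {
        send_not_found();
    }
    include 'ad_network_222.php'; // assumed to define $ad_network, as before
    $adBlock = '<br><div class="main" style="padding-left:12px; padding-right:12px">'
        . $ad_network . '</div>';
    // A callback keeps "$1"-style sequences in the ad HTML from being
    // treated as regex backreferences, which plain preg_replace() would do.
    echo preg_replace_callback('/<\/body>/i', function () use ($adBlock) {
        return $adBlock . '</body>';
    }, file_get_contents($page));
}
```

Because both 404 paths run before any echo, the status line is still sendable when header() is called. A request for a missing page like the host's dating_sdfsdfsdfs_reviews.html example would get a real 404 instead of a PHP warning.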
jazzylee77
Sep 8th 2005, 11:48 am
I'm still having problems with the passthru error. I'll start another thread about it in the co-op ad network section, since it's off topic here.
MKInfo
Oct 29th 2005, 12:00 am
If you are using mod_rewrite, the problem is in the .htaccess file. In there is a rule that sends any request for a page that can't be found (error 404) to index.php, which then returns a 200 to Google as well.
The line will look something like this:
RewriteRule ^(.*) index.php
Remove that line and then verify with Google. Worked for me :)
*Don't forget to re-add it to the file afterwards*
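Instead of deleting the catch-all rule and re-adding it later, a narrower rule set can leave Google's probe (and any file that really is missing) to Apache's normal 404 handling. Below is a minimal sketch, assuming Apache mod_rewrite and a front controller named index.php as in the line above. A CMS such as Mambo/Joomla may ship its own recommended rules, which should take precedence.

```apache
RewriteEngine On
# Never hand Google's verification probe to index.php; Apache answers it with a real 404
RewriteCond %{REQUEST_URI} !GOOGLE404probe
# Only rewrite when the request does not match an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L]
```

A related pitfall for custom 404 pages: if an ErrorDocument 404 directive points to a full http:// URL, Apache sends a redirect instead of a 404 status. Pointing it at a local path (for example a hypothetical /404.html) keeps the 404 status while still showing the custom page.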
geomark
Oct 29th 2005, 3:41 am
When I try that verify link, I get the following error message: "NOT VERIFIED
Our system is currently busy. Please try again in a few minutes." It's been like that since yesterday, and I've tried on and off all day. Is anybody else getting this?
MKInfo
Oct 29th 2005, 8:04 am
I did mine just before my last post with no problems.
strogg
Dec 1st 2005, 7:03 pm
Just remove the rewrite rule lines from .htaccess and try again...
S.AKYUREK
http://www.allbestwebsites.com
ltx
Mar 26th 2006, 2:37 am
Do you need to put the line back when you're done?
Thanks very much, I'll see if I can get my site verified...
ltx@linkak.com
ltx
Mar 26th 2006, 3:20 am
Thanks for your post. Sites verified.
I found this message in the .htaccess:
NOTE!
## When using multiple Joomla sites or other web applications in sub-folders,
## you must explicitly turn the RewriteEngine off or use the settings
## recommended for the application
Brilliant, thank you.
ltx