The site search will still work as long as the pages are indexed; it doesn't matter if they are supplemental results. Google doesn't want your XML sitemap to appear in its index anyway. XML sitemaps aren't normally indexed.
OK, well, at least that's something. Now if I could figure out how to get the old pages removed from the index and replaced with the newer ones that used to be there... well, that'd be great. Does anyone have any idea how to make Google de-index pages that had already been de-indexed before? Or, even more importantly, how to put the new pages BACK in the index? Do I need to sacrifice a chicken or something?
You can file a delisting request: http://www.google.com/support/webmasters/bin/answer.py?answer=35301&topic=8459 Or you can simply take the incorrect pages and 301 them to the new pages.
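If you go the 301 route, a rough way to sanity-check the redirects before asking Google to recrawl is a quick script like the one below. This is only a sketch: it assumes Python with the third-party requests library, and the URLs in the mapping are placeholders for your own old and new pages.

[code]
# Sketch: verify that each old URL now returns a 301 pointing at the new page.
# Requires the third-party "requests" library; the URL mapping is a placeholder.
import requests

OLD_TO_NEW = {
    "http://www.example.com/old-page.html": "http://www.example.com/new-page.html",
}

for old_url, expected in OLD_TO_NEW.items():
    resp = requests.get(old_url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    ok = resp.status_code == 301 and location == expected
    print(f"{old_url} -> {resp.status_code} {location} {'OK' if ok else 'WRONG'}")
[/code]

Checking for a 301 specifically (rather than a 302) matters here, since a permanent redirect is what tells Google to drop the old URL in favour of the new one.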
Well, David, one thing I'd like to say is that you should remove the PHPSESSIDs. They prevent proper indexing, since session IDs are generated randomly each time you visit the site.
Where do you see session IDs? That problem was fixed some time ago, as far as I know. If you're seeing them in Google, those are old pages. If you're seeing them as a Guest, that's OK. If you use Firefox and spoof Googlebot, you should see them disappear.
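For anyone who doesn't want to install a user-agent switcher in Firefox, the same check can be scripted: fetch the page once as a normal guest and once with a spoofed Googlebot user agent, and see whether PHPSESSID shows up in the HTML. A rough sketch (placeholder URL, assumes Python with the requests library):

[code]
# Sketch: fetch a page as a guest and with a spoofed Googlebot user agent,
# then check whether PHPSESSID still appears in the HTML. URL is a placeholder.
import requests

URL = "http://www.example.com/forum/index.php"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

as_guest = requests.get(URL).text
as_bot = requests.get(URL, headers={"User-Agent": GOOGLEBOT_UA}).text

print("Guest sees PHPSESSID:    ", "PHPSESSID" in as_guest)
print("Googlebot sees PHPSESSID:", "PHPSESSID" in as_bot)
[/code]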
The strange thing is that I've only clicked on the link in your post. The first time I visit the page I get the session ID; then if I refresh the page once, I don't get it anymore when I visit the page again (with that browser, at least). I did this in Firefox and then realised I should have taken screenshots, so I did it again in IE: this one has the session IDs, this one doesn't. I'm not entirely sure what to make of it, but maybe this has something to do with your problems? E.g., Googlebot is being served an old cached version of the page each time it visits, and since it's not 'refreshing' the page, it's not being given the new one? :/ Sounds really weird/stupid, but hey, stranger things have happened. Hopefully my encounter with this was just because I was a Guest or something...
Oh, I see. That "Hot Topics" page is something I just created yesterday (my mini site map), so it may have the problem. However, spiders actually visiting the forum will have the session IDs disabled, and that should be true even if they follow the links from that page. In any case, that's not related to the Google indexing problem. Prior to Big Daddy, with the older version of SMF, Google indexed pages just fine WITH the session IDs. After Big Daddy, it's not indexing anything but the index page, plus, as of yesterday, that old orphaned "Hot Topics" page, which presumably came from following a link from other old nonexistent pages in the Google cache.
Oh, that's great. I hadn't even considered that. So not only is Google re-indexing old pages that don't exist, but of course it's also following the links on those old pages, which of course go to more non-existent pages and more broken links, etc. etc. Google's gonna end up thinking half of the links and pages on my site are busted. I'm sure that'll help me in the SERPs. Ouch.
Yep. Check your log files. I have page after page of 404 errors for pages which haven't existed since last year. Googlebot has Alzheimer's.
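If you'd rather tally those up than scroll through raw logs, something like this against an Apache combined-format access log will list the URLs Googlebot keeps getting 404s on. It's a sketch only; the log path and the assumption that your server writes combined-format logs are both placeholders for your own setup.

[code]
# Sketch: count the URLs Googlebot is hitting and receiving 404s for,
# assuming an Apache combined-format access log at the path below.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # placeholder path
LINE_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

missing = Counter()
with open(LOG_PATH) as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = LINE_RE.search(line)
        if m and m.group(2) == "404":
            missing[m.group(1)] += 1

for url, hits in missing.most_common(20):
    print(f"{hits:5d}  {url}")
[/code]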
It has gotten to the stage where Googlebot is now spidering and indexing the 404 error pages. I am stunned, as these pages are returning a 404 status header, yet Google is spidering and indexing the custom 404 page (and, of course, sending the page supplemental for duplication).
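One thing worth ruling out is a "soft 404", where the custom error page is actually served with a 200 status; that would explain Google treating it as a real, duplicated page. A quick check (a sketch; the URL is a placeholder and it assumes Python with the requests library):

[code]
# Sketch: confirm that a URL which should be gone really returns a 404 status,
# rather than a 200 carrying the custom error page's content (a "soft 404").
import requests

url = "http://www.example.com/some-page-that-no-longer-exists.html"
resp = requests.get(url, allow_redirects=False)

print("Status:", resp.status_code)
if resp.status_code == 200:
    print("Soft 404: the server is returning the error page with a 200 status.")
[/code]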
I would not so much say it is broken, but rather that it is full. How many people have suffered with a clogged-up hard drive? Everything G R I N D S to a S L O W pace; something has to give. Eric Schmidt has openly said Google is full and that they have a crisis. I think this is the result of some truncation within the algo somehow. I am not a coder, but I just have this gut feeling that we have problems. They said BD was meant to deal with canonicalisation issues, yet we see 404 pages being spidered, which, let's face it, is the sort of scripting a six-year-old handles. It is like going to the milk store and picking up the empty bottles from the place marked EMPTY! Apart from the fact that the tops are off, you can see through the glass, and they do not weigh anything, it is an easy mistake to make.
The bigger the machine gets, the harder it is to keep it a well-oiled machine. This goes from the smallest computer part to the thousands of lackeys who now probably work for Google in some capacity. Sure, they have brain power, but along with that comes politics, lazy people, etc. The more hands in the pot, the messier it gets.
The most annoying part to me at the moment is the set of public strategies Google is using to address the issue, which amount to:
1. we had a few minor problems caused by black-hat SEO sleazeballs, but it's fixed now
2. we had a few minor problems, like Sitemaps not being able to handle the trailing slash, but it's fixed now
3. *desert wind with tumbleweed sounds and crickets*
Meanwhile, Google is:
1. listing pages that haven't existed for a year
2. following links from those non-existent pages to other pages that haven't existed for a year
3. not listing pages that have existed since those non-existent pages ceased to exist
4. trying to spider URLs that never existed
5. ranking pages in the supplemental index (which also haven't existed for a year) ahead of the home page