I am lost here, as I do not remember ever seeing this occur. Has anyone else? I have a new client with 810 cached pages in Google. The problem is that they are all behind a secure URL. Every one of the URLs Google returns starts like so: https://www https:/.../index.php Any idea what's going on? Why would Google cache https?
http://www.google.com/search?num=10...:2005-25,GGLG:en&q=allinurl:https&btnG=Search Google indexes tons of https pages.
I see. Well, the problem is that these pages don't exist on the server anymore, and the client wants the http:// pages cached. Can I use the URL removal tool safely, do you think?
I had the same problem a while back. Someone had linked to an https page from another site, and that caused all the https indexing, so I blocked https pages from being indexed. I never used the URL removal tool, but after a few months the https pages dropped from the index.
From what I am learning now about the site's history, the original designer/dev person submitted the https pages, as that is how the entire site was set up. Did you block the https through robots.txt or .htaccess? Thanks for the help & info.
If I recall correctly (it's been a while), it was with robots.txt. I also did a 301 redirect, all https to http. That way, if anyone linked to the https again in the future, it wouldn't be an issue.
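As a rough sketch of the two steps described above (a blanket robots.txt block on the https side, plus a 301 from https to http), here is some Python that models the logic. This is not the poster's actual setup, which is not shown in the thread, and on a real server the rules would normally live in the web server config. The names `HTTPS_ROBOTS_TXT`, `redirect_for` and `is_crawlable`, and the `www.example.com` URLs, are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served only on the https side: block everything.
HTTPS_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def redirect_for(url):
    """Return (301, http_url) for an https URL, or None if no redirect applies.

    Assumes the host carries no explicit port such as :443.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None
    # Keep host, path, query and fragment; only the scheme changes.
    target = urlunsplit(
        ("http", parts.netloc, parts.path, parts.query, parts.fragment)
    )
    return 301, target


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Check whether the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/index.php?id=1"
    print(redirect_for(url))
    print(is_crawlable(HTTPS_ROBOTS_TXT, url))
```

One thing to keep in mind if you combine both: a crawler that obeys a blanket Disallow on the https side will not request those URLs, so it may never see the 301. It may be worth picking whichever of the two fits your situation rather than assuming they reinforce each other.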
Google's search engine is screwed up. They haven't fixed most of it since BigDaddy. Think yourself lucky it spidered your site. If you do a lot of Google searching like me, you'll find it littered with dead links (cached dead links as well), sites which had 100,000 pages indexed and now have 10, and the supplemental search of death, where you have 1 page listed and the other 99,999 are supplemental. Google needs to fix their SE before going on to other markets and products.
^ I thought it was linked (well, except the last line, which went off into a bit of a random comment/rant). The same problem that is causing the issues I mentioned may also be the reason Google is spidering pages you would not think it normally would.
Google will index any URL it finds unless it's specifically told not to. Take the proper steps on your site to control what's indexed and what's not, and you'll be set. I would see it as a big problem if there were not relatively easy steps that webmasters can take to control it.