While checking the indexed pages of my site, I found that lots of https:// urls have been indexed by Google. This is creating duplicate content: I found two cached versions of the same page, one with http:// and another with https://. I looked for the reason and discovered that some links on my site use https://. I can't stop such urls from being posted, as I don't have any control over my visitors. So my questions are: 1) How do I stop Google from crawling the https:// urls? 2) How do I remove the urls that are already indexed with https://? Please help me with your valuable suggestions asap. I really need them.
Go to http://www.google.com/webmasters/ where you can find all kinds of tools for this. Otherwise you can block these urls using robots.txt and .htaccess.
Are you Mitra from India on the V7N forum? The same question in many forums? Searching Google with the keywords "stop crawling search engine https" (http://www.google.com.bd/search?q=stop+crawling+search+engine+https) turns up your threads: http://www.v7n.com/forums/google-forum/76615-how-stop-crawling-https-urls-google.html http://www.webproworld.com/google-discussion-forum/65572-how-stop-crawling-https-urls-google.html and also some pages with good answers: http://www.seoworkers.com/seo-articles-tutorials/robots-and-https.html http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/c08092000ee3273b http://www.webproworld.com/google-discussion-forum/48472-new-https-cannonical-problem.html http://www.webproworld.com/internet-security-discussion-forum/49940-http-https-problem.html
moinuddin102, thanks for providing me with the helpful urls. Yes, I'm the same person. Will it be a problem? My intention is only to get the best answer from experienced webmasters like you. I do this to document my queries with all the answers in my notebook for future reference.
http://www.seoworkers.com/seo-articles-tutorials/robots-and-https.html This link was great for me. I implemented the redirection to robots_ssl.txt through .htaccess, and it worked fine.
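For anyone who wants to sanity-check this kind of setup, here is a minimal sketch in Python. It assumes robots_ssl.txt blocks every crawler from every path and that the normal robots.txt allows everything; both file bodies, the example.com urls, the helper name is_crawlable and the .htaccess rules shown in the comment are illustrative assumptions, not copied from the article or from my server.

```python
from urllib.robotparser import RobotFileParser

# Assumed .htaccess rules (mod_rewrite) that serve robots_ssl.txt
# whenever robots.txt is requested over HTTPS (port 443):
#   RewriteEngine On
#   RewriteCond %{SERVER_PORT} ^443$
#   RewriteRule ^robots\.txt$ robots_ssl.txt [L]

# Assumed body of robots_ssl.txt: block all crawlers from all paths.
ROBOTS_SSL_TXT = """\
User-agent: *
Disallow: /
"""

# Assumed body of the normal robots.txt: an empty Disallow allows everything.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_crawlable(robots_body: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt body lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The https:// copy should be blocked, the http:// copy should stay open.
    print(is_crawlable(ROBOTS_SSL_TXT, "https://www.example.com/page.html"))
    print(is_crawlable(ROBOTS_TXT, "http://www.example.com/page.html"))
```

This only checks the rules themselves. To confirm the redirection is live, the robots.txt fetched over https:// on the real site should have the same body as robots_ssl.txt.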
How do I remove the urls that are already indexed with https://? I heard that Google can only remove pages that return 404. I can't make those pages return 404, as they are served from the same section as the normal pages.
I'm using Google Webmaster Tools to remove the https:// urls. I've already added the site with http:// and now I'm going to add it with https://. After adding https://, if I remove the entire site with the removal tool, only the https:// urls should be deleted. It shouldn't affect the normal urls with http://. I'm a bit nervous.
I've submitted 63 https:// urls in the Google Webmaster Tools url removal tool. It has been 24 hours since I submitted them. How long will it take to remove those urls? The status is showing as pending. The "Analyze robots.txt" section shows that robots.txt was last downloaded 17 hours ago.