Does anyone know what effect https would have on PageRank? Most of my SEO tools don't display much, if anything, when a given page is served over https. Just curious.
Yeah, I was thinking the same thing, that spiders may have difficulty crawling it, although I can crawl it using various sitemap software. Hmmmm....
If a page is password protected then the spider will not index it. Usually you see https on e-commerce checkout pages and secure forms on banking websites.
What information are you talking about? The thread is about PageRank, which https URLs can get... mine certainly did. The page won't be cached, but it will show backlinks and PageRank.
Google does crawl SSL; at times it will index and display the SSL pages instead of the non-SSL pages on my e-commerce sites.
Yes, it will index https. If you have an https version of your site, you should always exclude Google from it, or you will end up with big duplicate-content problems.
Never had a duplicate-content problem, plus the last time I looked there was no real way of excluding just the SSL version. Is there such a method now? Either way, never had a problem...
Very good advice! I do have the .htaccess set to redirect to https so the spider and visitors will continue on the https path. I found something on Google about it too.
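For what it's worth, the redirect itself is only a few lines of mod_rewrite. A minimal sketch, assuming Apache with mod_rewrite enabled (on some hosts you may have to test %{SERVER_PORT} against 443 instead of the %{HTTPS} variable):

    RewriteEngine On
    # Send any plain-HTTP request to the same URL over HTTPS (301 = permanent)
    RewriteCond %{HTTPS} off
    RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]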
The above only works if you use different directories, or am I wrong? All of my SSL-enabled sites use symbolic links, with the same content in each folder, so I am unable to serve a different robots.txt.
A common SSL setup uses two different directories, so in a general setup it would be no problem to have two different robots.txt files. I, however, use symbolic links, so everything is served out of one directory, which is just much easier. I'm sure there is still a way, though, maybe an .htaccess rewrite rule that detects SSL and forwards to a different version... If I get time and am not overly lazy I might look into it.
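Actually, roughly what I have in mind (an untested sketch, assuming Apache with mod_rewrite; robots_ssl.txt is just a name I made up, call it whatever you like):

    RewriteEngine On
    # On HTTPS requests, silently serve robots_ssl.txt in place of robots.txt
    RewriteCond %{HTTPS} on
    RewriteRule ^robots\.txt$ robots_ssl.txt [L]

Then robots_ssl.txt just blocks everything:

    User-agent: *
    Disallow: /

That way the non-SSL robots.txt stays as it is, and spiders hitting the https side get told to stay out, even though both versions share one directory.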
In order to add a website URL to its index, a search engine must be able to access the site. Accessibility roadblocks are technologies or page elements past which a search engine spider cannot crawl. Robots exclusion and redirects are also important ways to manage how search engines access and index websites.

* Dynamic URLs and query strings: URLs containing query-string elements such as & or ? to dynamically retrieve data may not be accessible to search engine crawlers, so the content of pages with dynamically generated URLs may not be searchable.
* Secure Sockets Layer (SSL): Search engine crawlers are unable to access web pages encrypted using SSL protocols.
* JavaScript: Search engine crawlers do not follow links or page navigation written in JavaScript.
* Cookies and session IDs: Search engine crawlers do not accept cookies or work with session identifiers. Web pages requiring cookies or session IDs for access will not be searchable.
* Testing for roadblocks: View your pages in a text-only browser such as Lynx (or in Firefox with scripts and styles disabled) to see them roughly as a crawler does.
* Spider limits: Most search engine crawlers limit the page size or number of characters they will crawl. Decrease the size of large web pages by moving JavaScript and CSS to external files.
* Broken links: Search engine crawlers don't crawl past broken links.
* Sitemaps: A sitemap is a web page that lists and links to all of the pages of a site. Search engine crawlers can easily and effectively index a site using a sitemap. Sitemaps are especially useful for sites with content that is otherwise inaccessible due to dynamic URLs, SSL, or other roadblocks. Limit the number of links in a sitemap to fewer than 100, or build sitemaps around groups of pages.
* Non-HTML documents: Documents such as Word, Excel, PowerPoint, and Adobe PDF can be indexed by search engine crawlers. Assign a metadata title to Adobe and Microsoft documents using the File > Properties dialog.
* Canonical URLs: A search engine will consider http://utah.edu and http://www.utah.edu to be different websites. If both URL forms serve up the same pages, search engines will consider them duplicate content and dramatically reduce the relevance score of both. Use server (301) redirects to point the alternate URL forms to the canonical URL without a relevance penalty (see the sketch after this list).
* Redirects: Redirect instructions tell web browsers and crawlers to move on to a new or revised URL. Server 301 redirects are server-side permanent redirect instructions, which search engine spiders will follow. Server 302 redirects are server-side temporary redirect instructions, and most search engines will ignore them. Meta-refresh and JavaScript redirects are often used unethically to "cloak" content, and most search engine crawlers ignore them.
* Robots exclusion: The Robots Exclusion Protocol is a method that allows site administrators to indicate to visiting robots which parts of their site should not be visited. Robots can be specifically admitted or excluded on a site-wide, directory-by-directory, or page-by-page basis, using the robots.txt file or the robots meta tag.

I hope this solves your problem. Regards!
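On the canonical URL point, the 301 is just another couple of .htaccess lines. A minimal sketch, again assuming Apache with mod_rewrite (utah.edu is just the example domain from the list above):

    RewriteEngine On
    # Permanently redirect the bare domain to the www form
    # so the two don't compete as duplicate content
    RewriteCond %{HTTP_HOST} ^utah\.edu$ [NC]
    RewriteRule ^(.*)$ http://www.utah.edu/$1 [R=301,L]

The same pattern works in the other direction if you prefer the non-www form as your canonical URL.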