I am looking for a way to search for websites with "noindex" in their HTML, as those would be pages that Google doesn't crawl, right?
If they are not in Google's index, there is no way to search for them in Google. I'm not familiar with any tool that would crawl sites independently and then compare the results with Google's index.
That would defeat the idea of noindex. It's a directive to keep Google from indexing the page to begin with, and Google can't search what it doesn't index.
Yes, but Google still shows Facebook pages and Facebook is nofollow, though this may just be the outer pages. Google already sees this information: it runs into "noindex" and "nofollow" tags, so it seems safe to assume there would be a way to show only the pages that it hits but can't crawl. For instance, many forums are "noindex", and there are search engines that only crawl "nofollow" forums. I think Google crawls "noindex" websites but just doesn't show the pages in search results. Besides, not all bots and crawlers obey meta information, htaccess, and robots.txt. I have had 40+ Bingbots on a site before, because bots that spoof can be hard to track down.
To be clear, there is no meta tag that stops a crawler from crawling a website. Even if a page has meta nofollow and noindex tags, Google will still crawl it (it has to fetch the page to see the tags at all). When the crawler sees nofollow, it does not follow the links on that page, so pages that are linked exclusively from that page won't be crawled. If they are linked from another page, they will be crawled and indexed. The noindex directive tells the crawler not to index that particular page, so if you search for it in Google, you won't find it. As for Facebook, I think you are mistaken: Facebook profile pages are not set to noindex. They are indexed and can be found on Google.
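To illustrate the distinction, here is a minimal Python sketch of how a crawler might read these directives after fetching a page. It uses only the standard library; the names `RobotsMetaParser` and `robots_directives` are made up for this example, and this is not how Google's crawler is actually implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page: it must already be downloaded before the tag can be read.
page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    '<body><a href="/other">link</a></body></html>'
)
found = robots_directives(page)
print("skip indexing:", "noindex" in found)
print("skip following links:", "nofollow" in found)
```

Note that the parser only runs on HTML that has already been fetched, which is why these tags can control indexing and link following but cannot prevent the crawl itself.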
Yes, you can go to http://publicwww.com/ and search for the term "noindex". It comes up with hundreds of thousands of results.
Is it possible that Google is indexing something that is already nofollow? As far as I can tell, those pages were indexed only before the nofollow tags were added. Afterward, Google's crawler would have stopped crawling any further.