One question for my knowledgeable friends: say I have a site xyz.com and it links to two pages, xyz.com/1.html and xyz.com/2.html. Please note that 1.html and 2.html are linked only from xyz.com's main page and nowhere else. Google visits xyz.com and, following the links, indexes 1.html and 2.html. Now if I remove the links to 1.html and 2.html from xyz.com, will Googlebot still visit and index these pages? Basically, 1.html and 2.html are now isolated pages with no inbound links. Will they still be indexed/visited by Googlebot just because they are already in Google? Thanks for reading...
My guess is that if you remove the pages, they will eventually fall out of the index. But if you only remove the links after the pages are updated, I don't see why they would not remain indexed. Think of them as a separate site: let's say there was never a link to them to begin with and you just did a "submit site" in Google. Eventually they would be spidered and indexed.
The pages 1.html and 2.html were indexed once, so they are in the Google index, but now no one links to them. Will they get kicked out of Google? Note: pages 1.html and 2.html still exist.
They'll drop out of Google. All pages in Google's index database are connected (otherwise their PageRank algorithm would fail).
I agree! I don't think Googlebot visits each and every page in its database one after the other. I think it just follows links around to wherever they take it. So if there are no incoming links to a particular page, it won't be spidered anymore. BUT, if it's already in the index, it probably won't be dropped for a while but will disappear from the index at some stage. I am sure Google must do the odd "database cleanup" to get rid of orphaned pages, because if they didn't, you could essentially do the following: after each deep crawl, rename all your pages. Google comes along and caches the new pages as different from the old ones because of the new names. Then you rename all the pages again, and so on, so that a 50-page site (say) ends up with 5,000 pages in the index after 100 crawls. THIS DOESN'T HAPPEN. Moreover, if this were the case, the number of pages returned by the "site:" command would never go down. But it does go down, and it is never more than the actual number of pages on your site! So I believe the page would disappear from the index sooner or later. Having (hopefully) answered your question... why the hell do you want to do this????
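The arithmetic in the renaming thought experiment above can be checked with a small simulation. This is only a sketch of the argument, not a model of how Google really maintains its index: the `simulate_index` function, the `prune_orphans` flag, and the file-naming scheme are all made up for illustration.

```python
def simulate_index(pages_per_site, crawls, prune_orphans):
    """Return the index size after repeatedly renaming every page.

    Hypothetical model: before each crawl the site owner renames all
    pages, so none of the previously indexed URLs is linked anymore.
    """
    index = set()
    for crawl in range(crawls):
        # Every page gets a fresh name for this crawl.
        live_urls = {f"page{i}_v{crawl}.html" for i in range(pages_per_site)}
        if prune_orphans:
            # "Database cleanup": keep only URLs that are still reachable.
            index &= live_urls
        index |= live_urls
    return len(index)


without_cleanup = simulate_index(50, 100, prune_orphans=False)
with_cleanup = simulate_index(50, 100, prune_orphans=True)
print(without_cleanup, with_cleanup)
```

Without any cleanup the index grows by 50 URLs per crawl and reaches 5,000 after 100 crawls; with cleanup it stays at the 50 pages that actually exist, which matches what the "site:" command is said to report.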
They won't get dropped if they are old pages that Google has had for a few years. I have some orphans that are no longer linked to, and people still find them in Google. It's a pain sometimes when we don't have the item, but good when I find a replacement item with a similar description, because we can link to the page again, update it, and retain the positioning for that page.
Google comes back to some of my pages even though I no longer link to them. I'm not sure how many times it has come back to check them, but it is long after I removed the links. Some were spidered a few days ago, and I removed the links in August/September last year! We have since deleted the pages, so hopefully Google will start to get the hint.
I would have to agree with this. If the orphan pages are already indexed by Google, they might still show up in search results. E.g., if 'golf_tutorial_dvd.html' is already indexed by Google and someone searches for golf tutorial dvd, it might show up in the search results even though nothing links to that page anymore.
It will stay for a while but leave the index eventually. As said before, their whole search model depends on linkage. This is the #1 key factor. Without links, you won't get spidered because your URL doesn't get added to the to-crawl queue. Initially they might reindex the page because it was a useful resource, but they will get rid of it. Those who see it staying for longer than expected might want to take a second look: there might be some links showing in other search engines.
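To make the "to-crawl queue" idea concrete, here is a minimal sketch of a crawler that only discovers URLs by following links. How Googlebot really schedules its crawls is not public, so this is just the simple model described in this thread; the `crawl` function and the two link graphs are hypothetical, built from the xyz.com example in the first post.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Return every URL reachable from the seeds by following links."""
    queue = deque(seeds)   # the to-crawl queue
    seen = set(seeds)      # URLs that have already been queued
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Link graph before and after the links are removed from the main page.
before = {"xyz.com/": ["xyz.com/1.html", "xyz.com/2.html"]}
after = {"xyz.com/": []}

reached_before = crawl(before, ["xyz.com/"])
reached_after = crawl(after, ["xyz.com/"])
print(sorted(reached_before))
print(sorted(reached_after))
```

In this model the orphaned pages are never queued once the links are gone. A crawler that also rechecks URLs it already knows, as some posters here report seeing, would correspond to passing those URLs as extra seeds, e.g. `crawl(after, ["xyz.com/", "xyz.com/1.html"])`.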
While cleaning up my site over the past few months, there were a handful of pages that I intentionally orphaned. I have since taken those pages off the site completely. What I have seen is that all the pages that were orphaned became supplemental results. For example: www.carrollcommunications.com/support.htm - 29k - Supplemental Result - Cached - Similar pages of http://www.carrollcommunications.com/support.htm as retrieved on Dec 31, 1969 23:59:59 GMT