Several of my domains sat on default holding pages for quite some time (from several months to over a year). I've recently developed a couple of them and am having problems getting them indexed by the search engines. I've laid a couple of links in, but the spiders still seem to be holding off on indexing: they are only requesting a few system pages and the robots.txt file. Now I'll have to go the sitemap route, which I'd prefer not to do, as I think sites should be spidered naturally.

It makes some sense. We know that if a spider comes a bunch of times and nothing changes, it slows down its visits; likewise, if a spider comes and not only has nothing changed but the page is also a generic holding page, it will eventually stop trying to do much of anything. What doesn't make sense is why Google keeps coming back and trying to access password-protected areas (/personal/) while ignoring the links in to fresh content.

You can see Slurp here, not doing much despite the site now containing documents:

74.6.27.77 - - [03/Jul/2007:11:20:13 +0100] "GET /robots.txt HTTP/1.0" 404 645 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.26.160 - - [03/Jul/2007:11:20:17 +0100] "GET / HTTP/1.0" 200 5369 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.73.70 - - [04/Jul/2007:04:39:11 +0100] "GET /robots.txt HTTP/1.0" 404 645 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.69.110 - - [04/Jul/2007:04:39:20 +0100] "GET / HTTP/1.0" 200 6620 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

Similarly, Google has been popping by every few days but not indexing anything new:

66.249.65.161 - - [30/Jun/2007:00:16:16 +0100] "GET /robots.txt HTTP/1.1" 404 657 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [30/Jun/2007:00:16:16 +0100] "GET /help/ftp.html HTTP/1.1" 302 244 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [30/Jun/2007:17:52:53 +0100] "GET /robots.txt HTTP/1.1" 404 657 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [30/Jun/2007:17:52:53 +0100] "GET /admin/ HTTP/1.1" 302 265 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [03/Jul/2007:09:42:49 +0100] "GET /robots.txt HTTP/1.1" 404 657 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [03/Jul/2007:09:42:49 +0100] "GET /personal/ HTTP/1.1" 302 246 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [03/Jul/2007:17:18:01 +0100] "GET /robots.txt HTTP/1.1" 404 657 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The lesson I've learned from this is to add content to your sites, even if they are just waiting for that moment when you do some real work on them. I'm going to add RSS feeds to a couple of holding sites now and monitor bot activity prior to doing real development work on them.
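For anyone wanting to do the same kind of monitoring, tallying bot requests by hand gets tedious, so here is a minimal sketch of how it might be scripted. It assumes the access log is in the Apache "combined" format, as the excerpts above appear to be. The names `summarize_bot_hits`, `identify_bot` and `BOT_MARKERS` are my own, and the user-agent substrings are simply the ones visible in those excerpts; other spiders would need adding.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed from the excerpts in this post).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# User-agent substrings for the two spiders seen in the logs.
# This mapping is an assumption; extend it for other bots.
BOT_MARKERS = {"Googlebot": "Googlebot", "Slurp": "Yahoo! Slurp"}


def identify_bot(agent):
    """Return a short bot name for a user-agent string, or None."""
    for name, marker in BOT_MARKERS.items():
        if marker in agent:
            return name
    return None


def summarize_bot_hits(lines):
    """Count requests per (bot, path, status) across access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        bot = identify_bot(match.group("agent"))
        if bot:
            key = (bot, match.group("path"), int(match.group("status")))
            counts[key] += 1
    return counts


# A few lines copied from the excerpts above.
SAMPLE = [
    '74.6.27.77 - - [03/Jul/2007:11:20:13 +0100] "GET /robots.txt HTTP/1.0" 404 645 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '74.6.26.160 - - [03/Jul/2007:11:20:17 +0100] "GET / HTTP/1.0" 200 5369 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '66.249.65.161 - - [03/Jul/2007:09:42:49 +0100] "GET /robots.txt HTTP/1.1" 404 657 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.65.161 - - [03/Jul/2007:09:42:49 +0100] "GET /personal/ HTTP/1.1" 302 246 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.65.161 - - [03/Jul/2007:17:18:01 +0100] "GET /robots.txt HTTP/1.1" 404 657 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for (bot, path, status), hits in sorted(summarize_bot_hits(SAMPLE).items()):
        print(f"{bot:10} {status} {path:15} x{hits}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, since iterating a file yields its lines. A summary dominated by 404s on /robots.txt and 302s on protected paths, with few 200s on content pages, would match the pattern described above.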
The sitemap may well have worked, as the spiders are now visiting: one site has been fully indexed and the other is a third of the way there. I've added an RSS feed to a third site (using a couple of key terms from the niche) and will be monitoring it for bot activity. By the time I eventually have that site ready, it should already be getting spidered reasonably frequently, which should make the real launch more effective.
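For reference, a sitemap is just a small XML file listing the URLs you want spidered. The sketch below is a rough illustration, not the exact file I used. It follows the sitemaps.org 0.9 schema, the function name `build_sitemap` is my own, and the example.com URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build sitemap XML from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod uses W3C date format, e.g. YYYY-MM-DD
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs and dates, for illustration only.
    pages = [
        ("http://www.example.com/", "2007-07-05"),
        ("http://www.example.com/docs/", None),
    ]
    print(build_sitemap(pages))
```

The output would typically be saved as sitemap.xml in the site root and then submitted to the search engines.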