Hi, I started a directory at http://www.answers.co.in/ about a week ago. The directory uses data from ODP. I wasn't too hopeful that search engines would index many pages from the site because of all the duplicate content. However, I was surprised to find that Google has indexed 42,000+ pages (Yahoo and MSN haven't picked it up yet, though their crawlers are crawling the site). I didn't get many links pointing to the site...just bought a couple. Is it possible that Google indexed the site because of the country domain? (I checked some other ODP directories WITH PR, and they have only about 500-2,000 pages indexed.) Ajay. P.S.: I have been getting some traffic as well, from google.co.in (a pleasant surprise).
Congrats. I have heard that the country-specific versions of Google index sites from that country in more depth. However, I don't have any evidence of this myself or any first-hand knowledge. On a related note, is your site going to stick to being ODP-only, or once you get going nicely, are you also going to offer submissions, perhaps for a fee?
Yes, of course. I'll be offering submissions once I get decent traffic and some PR. But that is still some months away.
Oh, that's pretty easy... in fact, there is an entire category in ODP listing scripts that do this. Check out: http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Use_of_ODP_Data/Upload_Tools/
Update: Google has now indexed more than 60,000 pages. However, Yahoo and MSN haven't picked it up yet, although their crawlers did come to the site. Maybe they spotted the duplicate content? Or do they just take longer to update their indexes?
Gee, thanks a lot, honey... However, Yahoo hasn't picked it up yet. I guess it saw through the duplicate content thing :-( MSN has 180 pages indexed.
Maybe Google has a "quota" for the number of sites it wants to index per country (like sites per capita or something), so it could be more aggressive for .in, where there aren't as many sites, than for ".com". Just a thought. I'd be interested to hear if anyone else has observed this on .us or .uk domains. LC