Hi DPers, what are some effective methods to beat the Google duplicate content filter for a directory like this: http://www.google.com/search?hl=en&q=site:www.directoryone.info&btnG=Google+Search ? I believe that in a couple of months all the pages will be dropped and there will be nothing left in the index, so I'm requesting suggestions on beating the dup filter somehow. Thanks in advance.
You cannot beat the dup detection because it is query specific and operates on parts of the text. In the final ranking phase Google looks for the query keywords in the resulting pages, extracts snippets of text that contain the keywords, and matches them against the snippets from the other pages in the SERPs. When there is dup content (based only on the snippets, not whole documents), Google keeps only the one page it considers most authoritative (oldest, highest PR, something like that).

Most directory submissions are dup text. The way to beat that is: have different categories that distribute the listings onto different pages (that way keywords that match different listings won't get filtered out), OR edit the directory listing descriptions yourself (that's a lot of work).
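Roughly, the idea looks something like this (just my own sketch of the concept in Python, not Google's actual code; the snippet window and similarity threshold are made-up numbers):

# Rough sketch of query-specific snippet dedup (my guess at the idea,
# not Google's real algorithm).
import re
from difflib import SequenceMatcher

def snippet(text, keyword, window=80):
    """Grab the chunk of text surrounding the first keyword hit."""
    m = re.search(re.escape(keyword), text, re.IGNORECASE)
    if not m:
        return text[:window]
    start = max(0, m.start() - window // 2)
    return text[start:start + window]

def filter_dupes(results, keyword, threshold=0.9):
    """Keep only the first (most 'authoritative') page per near-identical snippet."""
    kept, seen = [], []
    for page in results:          # results assumed already ordered by authority
        snip = snippet(page["text"], keyword)
        if any(SequenceMatcher(None, snip, s).ratio() > threshold for s in seen):
            continue              # looks like dup content for this query
        seen.append(snip)
        kept.append(page)
    return kept

The point is that two directory pages carrying the same listing description produce the same snippet for that listing's keywords, so only one of them survives for that query.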
Thanks nohaber, but I'm looking for other methods like inserting random RSS feeds, inserting some unique text, etc. I'm just not sure how effective they can be. Regards, Tuning
Nope, this is not a DMOZ copy, but looking at the site: command results I'm not seeing any descriptions for the indexed pages.
The same happened with one site of mine as well. I don't know why Google can't pick up the descriptions while all the tags are in place and everything seems to be working fine.
Yes, that's OK, but without a keyword it gives these blank pages, which shows that the blank pages exist in Google as well, and that's a bad thing to see.
I have had great success getting all pages of a directory indexed by means of RSS. The directory itself has about 5,000 categories and only 144 links are placed, so most pages used to have no content at all and therefore were not cached. What I did was use random RSS feeds, most of them related to the subject: all categories under the continent Africa get random RSS feeds about Africa, categories under Asia get random RSS feeds about Asia, etc. This way the on-page text is somewhat relevant, and now more than 6,500 pages are really cached, checking with the API. Normal results show 35,000 pages, but not all of them are cached according to the API. In short, random RSS feeds did wonders for my site! That, and unique titles without any keyword/description tags. I noticed your keyword/description tags are the same for all pages; better to leave them out entirely if you cannot make them unique.
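To give an idea of what I mean, here is a rough sketch in Python using the feedparser library (the feed URLs and the topic mapping are placeholders, not my real setup):

# Sketch: drop a few random, topic-related RSS items onto an otherwise
# empty category page. Feed URLs per topic are placeholders.
import random
import feedparser  # pip install feedparser

FEEDS_BY_TOPIC = {
    "africa": ["http://example.com/news/africa.rss"],   # placeholder URLs
    "asia":   ["http://example.com/news/asia.rss"],
}

def rss_filler(topic, items=5):
    """Return a small block of HTML with random items from a topic-related feed."""
    feed_urls = FEEDS_BY_TOPIC.get(topic, FEEDS_BY_TOPIC["africa"])
    feed = feedparser.parse(random.choice(feed_urls))
    entries = random.sample(feed.entries, min(items, len(feed.entries)))
    html = []
    for e in entries:
        html.append('<p><a href="%s">%s</a><br>%s</p>'
                    % (e.link, e.title, e.get("summary", "")))
    return "\n".join(html)

Swap in your own feeds and category mapping; the point is just that the filler text differs per page and roughly matches the topic.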
When you use site:www.domain.com without keywords, you only specify where to search, but don't supply a query. Google handles it strangely and returns weird results.
You can certainly beat dup content filters, but it's a case of whether you want the copy to make sense or not.
I couldn't agree more, and in fact it is essential to use this technique in a new directory if you are using AdSense. This is because your empty categories are in violation of the TOS, but if you add content dynamically, you are fine.
Does that mean changing all the category names from singular to plural? Anyway, I will add a dynamic RSS feed and see how it works. Thanks.
First, what I would do is create a rewrite module for the directory. Then create a description for every category and include that description in the page body and in the meta tags. That way you always have a description on the pages. I did it with my directory http://www.dirspace.com/; it's a lot of work but it really pays off. About 3,800 pages are indexed. Also put your <head></head> at the top of the HTML page, and then insert the table with the picture and the AdSense code.
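As an illustration only, here is a minimal sketch in Python (I don't know what your directory script is written in; the category table and the template are assumptions, not dirspace.com's code):

# Sketch: give every category page its own title and meta description
# instead of one shared keyword/description tag. The dict stands in
# for whatever database the directory script actually uses.
CATEGORY_DESCRIPTIONS = {
    "web-hosting": "Hand-edited directory of web hosting companies and reviews.",
    "travel":      "Travel agencies, guides and booking sites, sorted by region.",
}

PAGE_TEMPLATE = """<html>
<head>
<title>{title} Directory</title>
<meta name="description" content="{description}">
</head>
<body>
<!-- table with picture and AdSense code goes here, after the head -->
{listings}
</body>
</html>"""

def render_category(slug, listings_html):
    """Build a category page with a unique title and meta description."""
    desc = CATEGORY_DESCRIPTIONS.get(slug, "")
    title = slug.replace("-", " ").title()
    return PAGE_TEMPLATE.format(title=title, description=desc, listings=listings_html)

However the pages are generated, the point is one unique title and description per category, with the head section first in the output.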
Hey, just a question (off the top of my head): what is the PageRank of http://www.yourdomain.com/ (it looks to me like it's 0), and why isn't the lucky owner of this site doing something about it? How come such a popular site (it has inbound links all over the place) isn't being taken advantage of?