I have messed around with what I called a website for years, never really taking the time to learn or do it properly. I finally sat down and started paying attention to the details behind the pages. I have learned a lot lately and want to learn more. I recently ran a couple of different crawlers on my site to see what kinds of results they would come up with, and I am getting some results that I am questioning. The main one that I am having trouble understanding is that several of the crawlers say most of my pages are duplicates. I have had some difficulty working out what the crawlers are looking at when they reach that conclusion. It is only my guess, but things like Googlebot might be coming to the same conclusion. Can anyone help me with what the bots are looking for so that I can prevent the "duplicate pages found" errors? My site is located at www.stupidav.com
I didn't have time to look very deep, but here's an example. On this page: http://www.stupidav.com/cool_stuff.shtml you have a collection of paragraphs that came from some other source. An exact phrase search in Google on one of them returns 231 other sites that contain that paragraph verbatim. I suspect that many of your pages are suffering from the same problem. Try writing unique text. If you are going to include boilerplate affiliate content, then you definitely need to rewrite it so that you don't look like all the other affiliates. Good luck! -jay
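For anyone following along, here is a minimal sketch of one common way to measure how much two blocks of text overlap: compare their sets of overlapping word sequences ("shingles"). This is only an illustration. It is not confirmed to be what any particular crawler or search engine does, and the function names and the 5-word shingle size are assumptions made for the example.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the shingle sets: 0.0 = no overlap, 1.0 = identical."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A paragraph copied verbatim scores 1.0 against its source, while text you wrote yourself should score close to 0.0 against anything else. Where a tool draws the line for "duplicate" is its own choice.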
Which crawlers did you use that identified the duplicate content? (I would like to do a similar check on my own site.)
For starters, the GSite Crawler, but I was under the impression that it only scans my site and doesn't look any further. Thanks to my friend for the writeup on the spyware stuff; I will rewrite that right away. Sorry to all for that one. Other than the News, I know all of it is unique, because I wrote the rest of it.
Stupidav (do you have a name? I hate calling you stupid), I looked at a few more of your pages and did searches in G/Y on randomly selected paragraphs and got no matches. Did the crawlers you used tell you which pages were dups and where the dups are located? BTW, you have no backlinks to the site. You should spend a little time in the directory forum here and find directories to submit to. -jay
Sorry I haven't gotten back sooner, I've been out of town. My name is Dave, although I answer to Stupid, Stupidav, or all of the above. Since I last posted I redid the site, putting SSI to work to save time on updating the links. I broke up the pages, separating the header and footer as well as the headlines and links into their own files, and I am using the include virtual directive to put everything together. This has eliminated the duplicate pages issue. It also leaves me with another question, which I might need to post separately. Now that the pages hold only the actual content, and the header, footer, headlines, and links are separate, when a crawler looks at a page does it see the version without the server side includes, or does it see the assembled version? Thank you for the tip, Jay. I have been trying to get the site linked to more and more as I have time, although I haven't focused on that 100% yet, hoping to get the site right first.
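To picture what the server does with those include virtual lines before anything leaves it, here is a rough sketch in Python. It is not how Apache implements SSI, just an imitation of the visible effect. The `fragments` dictionary is a hypothetical stand-in for the separate header and footer files, and the file names and markup are made up for the example.

```python
import re

# Matches directives of the form: <!--#include virtual="/path" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

def expand_includes(page, fragments, depth=0):
    """Replace each include directive with the text of the named fragment."""
    if depth > 10:
        # Guard against fragments that include each other forever.
        raise RecursionError("includes nested too deeply")

    def replace(match):
        # Fragments may contain include directives of their own.
        return expand_includes(fragments[match.group(1)], fragments, depth + 1)

    return INCLUDE_RE.sub(replace, page)

# Hypothetical files standing in for the real header and footer.
fragments = {
    "/header.html": "<h1>Site header</h1>",
    "/footer.html": "<p>Site footer</p>",
}
page = (
    '<!--#include virtual="/header.html" -->\n'
    "<p>Unique page content</p>\n"
    '<!--#include virtual="/footer.html" -->'
)
print(expand_includes(page, fragments))
```

The output contains no include directives at all, only the stitched-together HTML, which is the only thing a visitor's browser or a crawler ever receives.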
Dave, the crawler will see the fully assembled version, just like an end user's browser (minus any JavaScript or CSS formatting). -jay
Thanks Jay. I still kind of wonder why it was showing that before. Oh well, as long as it is not showing it now, I guess I might be OK. Dave
Dave, I left this out. Here are a few tools you can use to verify what the spiders see: http://www.webconfs.com/search-engine-spider-simulator.php http://www.delorie.com/web/ses.cgi
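If you want a rough idea of the same thing offline, here is a minimal sketch that reduces a page to its text, dropping the tags and skipping script and style blocks. It uses only Python's standard `html.parser` module. The names `VisibleText` and `spider_view` are made up for the example, and real spiders and the simulators above will certainly differ in the details.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text content, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def spider_view(html):
    """Return the text of an HTML document as one space-separated string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Feed it the HTML your server actually sends (for example, the source you get from your browser's "view source"), not the .shtml file on disk, since the includes are only filled in by the server.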