Google will only index main page.
cormac
Oct 13th 2005, 8:35 pm
Googlebot seems to be hitting my site like there is no tomorrow, but my main page is the only URL indexed.
Reading another post, someone suggested submitting each URL of the site to Google so that each page will be crawled. 1: Is this not seen as spamming Google? And 2: Isn't the Sitemap supposed to tell the bot what pages are linked together?
I have Google Sitemaps set up and the site has been verified, but the Stats don't show for the site - anyone know anything about this? :(
I appreciate your time!
mdvaldosta
Oct 13th 2005, 8:37 pm
It takes a while for G to update sometimes. What site are you referring to btw?
minstrel
Oct 13th 2005, 8:41 pm
How Do I Get My Site Listed on Google? (http://www.google.com/webmasters/1.html)
A. Submit your URL
1. The basics
Google is a fully automated search engine, which employs robots known as 'spiders' to crawl the web on a monthly basis and find sites for inclusion in the Google index. Since this process does not involve human editors, it is NOT necessary to submit your site to Google in order to be included in our index. In fact, the vast majority of sites listed are not manually submitted for inclusion.
Google does not accept payment for inclusion of sites in our index, nor for improving the rank of sites in our results. We do offer advertising adjacent to our results, which is always clearly labeled "Sponsored Links." The method by which we find pages and rank them as search results is determined by the PageRank technology developed by our founders, Larry Page and Sergey Brin.
2. Submitting your site
We add thousands of new sites to our index each time we crawl the Web, but if you like, you may submit your URL as well. Submission is not necessary and does not guarantee inclusion in our index. Given the large number of sites submitting URLs, it's likely your pages will be found in an automatic crawl before they make it into our index through the URL submission form. We DO NOT add all submitted URLs to our index, and cannot predict when or if they will appear.
Please visit our "Add URL" page to input your URLs. You can submit your site as often as you like, but multiple submissions will not improve the likelihood of your site being added or accelerate the process. We do not penalize sites for 'over-submitting'. If you choose to submit your site, only the top-level domain is necessary, as the spiders will follow your internal links to all the rest of the pages.
The best way to ensure Google finds your site is for your page to be linked from lots of pages on other sites. Google's robots jump from page to page on the Web via hyperlinks, so the more sites that link to you, the more likely it is that we'll find you quickly.
B. I've submitted my site to Google and it's still not listed. Why?
1. The 'Add URL' form didn't work.
Google finds sites through a process known as "crawling" the web. This involves robot software that follows hyperlinks from site to site. Google currently looks at more than 3 billion URLs during the crawl. The process may take several weeks to complete.
When a URL is submitted to Google, we look for it in our next crawl. If you've already submitted your URL, your site could easily appear in our new index, which will go up when the current crawl is completed. However, if no other site links to yours, it may be difficult for our crawler to find you. Conversely, if many sites link to your page, there is a good chance we will find you without your submitting your URL.
Occasionally, websites are not reachable when we try to crawl them because of network or hosting problems. When this happens, we retry multiple times, but if the site cannot be crawled, it will not be listed in our current index. If it was a transient problem, the site will likely show up in the next index, which will be completed in a few weeks.
If we have not picked up your site and it has been several months, then it is likely that our spiders are not able to find your site. If you increase the links pointing to the page, Google will likely find your site in the future.
2. What else can I do to get listed in Google?
If you are having difficulty getting listed in the Google index, you may want to consider submitting your site to Yahoo! or Netscape. You can submit to Yahoo! by visiting http://docs.yahoo.com/info/suggest/. You can submit your site to Netscape's Open Directory Project (DMOZ) by visiting www.dmoz.org. Once your site is included in either of these directories, Google will often index your site within six to eight weeks.
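The crawl the FAQ describes can be pictured as a breadth-first walk over hyperlinks. The sketch below is only an illustration of that idea, not how Googlebot is implemented: the `crawl` function and the page names in `site` are made up for the example. It shows why a page that nothing links to never gets discovered from the home page.

```python
from collections import deque

def crawl(links, start):
    """Return the pages reachable from start by following links, in discovery order."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

# Hypothetical site: "/orphan.html" exists, but no other page links to it.
site = {
    "/": ["/about.html", "/faq.html"],
    "/about.html": ["/"],
    "/faq.html": ["/contact.html"],
    "/contact.html": [],
    "/orphan.html": ["/"],
}

found = crawl(site, "/")
missed = sorted(set(site) - set(found))
print("found:", found)
print("missed:", missed)
```

The orphan page links out to the home page, but since nothing links in to it, a spider starting at "/" never sees it.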
cormac
Oct 13th 2005, 9:07 pm
What site are you referring to btw?
The site on my sig.
minstrel thank you for the information - going to boil the kettle now and have a wee read :D
cormac
Oct 13th 2005, 9:25 pm
Thanks for posting that - it has confirmed a few things for me.
I guess I'll have to wait and not waste time stewing over it and for now keep the site fresh and build on good backlinks.
Again thanks for the info, most helpful.
randymorin
Oct 14th 2005, 8:18 pm
Suggestions
In your HTML
<meta name="robots"content="index,follow" />
Put a space between the two attributes. The missing space can confuse agents.
Your XHTML is not valid (http://validator.w3.org/check?uri=http%3A%2F%2Fwww.cd-burner-help.com%2F) and not well-formed.
When did you activate your domain? Your domain can be sandboxed for half a year if it's new.
minstrel
Oct 14th 2005, 8:56 pm
No, the space isn't needed, but there are certainly things in your <head> section that you don't need or are structured incorrectly:
<meta name="refresh" content="60" />
Why is this even there? It doesn't seem to have a point on the page. And it should be <meta http-equiv="refresh" content="60"> if you truly need it. It MAY hinder spidering if the page actually changes with each refresh. And if the page doesn't change, it shouldn't even be there. Not only that, but you have two of them on the page...
<meta name="robots"content="index,follow" />
You also have two of these. Neither is necessary. All you are doing is saying, "Spiders! Pay attention, please! Do what you were already going to do naturally as a matter of course before I interrupted you! Thank you! That is all! As you were!"...
<meta name="copyright" content="Copyright 2003. Mucker Group. All Rights Reserved." />
<meta name="author" content="Mucker Group" />
Put this on the page, not in the headers. Waste of space.
<meta name="revisit-after" content="7" />
A frequently misunderstood meta tag that in most cases should never be there. If it is even paid attention to, this functions as a limiter. What it says is not "come back often" but "DON'T come back often" - you're saying, "No point in returning here for at least a week because nothing on this site changes anyway". Delete it.
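If you want to check a page for these head problems without reading the source by eye, something like the following would do it. This is a rough sketch using Python's standard `html.parser`; the `MetaNames` class, the `audit` function, and the warning wording are my own invention for the example, and the sample `head` string just mirrors the tags discussed above.

```python
from collections import Counter
from html.parser import HTMLParser

class MetaNames(HTMLParser):
    """Count how often each <meta name="..."> appears."""

    def __init__(self):
        super().__init__()
        self.names = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closed tags like <meta ... /> also arrive here by default.
        if tag == "meta":
            name = (dict(attrs).get("name") or "").lower()
            if name:
                self.names[name] += 1

def audit(html):
    """Return a list of warnings for the meta tag issues discussed above."""
    parser = MetaNames()
    parser.feed(html)
    parser.close()
    warnings = []
    for name, count in parser.names.items():
        if count > 1:
            warnings.append(f"{name}: declared {count} times")
        if name == "refresh":
            warnings.append('refresh: written with name=, a real refresh needs http-equiv="refresh"')
        if name == "revisit-after":
            warnings.append("revisit-after: consider deleting it")
    return warnings

head = """
<meta name="refresh" content="60" />
<meta name="refresh" content="60" />
<meta name="robots" content="index,follow" />
<meta name="robots" content="index,follow" />
<meta name="revisit-after" content="7" />
"""

warnings = audit(head)
for warning in warnings:
    print(warning)
```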
Your sitemap link points to an XML page. The on-page link shouldn't go to the Google Sitemaps file (that gets submitted to Google separately). It should point to an HTML page which will contain a list of the most important pages on your site. Make the links on sitemap.html plain text links so they are sure to be spidered.
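A plain HTML sitemap of that kind is easy to generate. Here is a minimal sketch: the `build_sitemap` function and the page list are hypothetical, so swap in your own URLs and titles. It produces nothing but an unordered list of ordinary text links.

```python
from html import escape

def build_sitemap(pages):
    """Build a bare-bones sitemap page from (url, title) pairs."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for url, title in pages
    )
    return (
        "<html>\n<head><title>Site Map</title></head>\n<body>\n"
        f"<ul>\n{items}\n</ul>\n</body>\n</html>\n"
    )

# Hypothetical page list, for illustration only.
pages = [
    ("/index.html", "Home"),
    ("/faq.html", "Questions & Answers"),
    ("/contact.html", "Contact"),
]

sitemap_html = build_sitemap(pages)
print(sitemap_html)
```

Save the output as sitemap.html and link to it from your main page with a normal text link.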
Some of your tags are misplaced or missing, as here:
<table width="780" border="0" align="center" cellpadding="0" cellspacing="0">
<!--DWLayoutTable-->
<tr>
<td height="68" colspan="6" valign="top"><table width="780" border="0" cellpadding="0" cellspacing="0">
<!--DWLayoutTable-->
<tr>
<td width="717" height="68" valign="top" class="banner"><p><img src="banner.gif" alt="CD Burner Help Online Technical Support" width="338" height="68" /></p></td>
<td width="11"> </td>
</tr>
</table></td>
</tr>
<tr>
<td height="200" colspan="6" valign="top"><table width="728" border="0" cellpadding="0" cellspacing="0">
<!--DWLayoutTable-->
<tr>
Note the stray </td></tr> after the closing </table>. That is likely going to create problems for interpreting the code that follows.
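Nested layout tables are hard to check by eye, so a small stack-based check can help find where the open and close tags stop pairing up. This is only a sketch using Python's standard `html.parser`; the `TableBalance` class and `check` function are names made up for the example, and it only tracks table, tr and td.

```python
from html.parser import HTMLParser

TRACKED = {"table", "tr", "td"}

class TableBalance(HTMLParser):
    """Track table/tr/td nesting and record closing tags that do not pair up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line number)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closed tags such as <img ... /> never go on the stack

    def handle_endtag(self, tag):
        if tag not in TRACKED:
            return
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()
        else:
            line = self.getpos()[0]
            self.problems.append(f"line {line}: </{tag}> does not match the open tag")

def check(html):
    """Return mismatched closing tags followed by tags that were never closed."""
    parser = TableBalance()
    parser.feed(html)
    parser.close()
    unclosed = [f"line {line}: <{tag}> never closed" for tag, line in parser.open_tags]
    return parser.problems + unclosed

nested = "<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>"
stray = "<table><tr><td>x</td></tr></table></td></tr>"

print(check(nested))
print(check(stray))
```

On the fragment as quoted, the </td></tr> after </table> pairs with the outer cell and row that hold the nested table, so it is worth running a check like this over the whole page source to see where the nesting really breaks.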
Your page doesn't have to validate for W3C (spiders don't give a damn about that) but it does need to be error free or spiders won't be able to decipher it, even though IE is pretty forgiving.
This tool will help simulate what Googlebot will see: http://gritechnologies.com/tools/spider.go?q=www.cd-burner-help.com
This looks like a Dreamweaver template. Use the code validator in Dreamweaver to find errors.
Try these changes for starters...
cormac
Oct 14th 2005, 9:44 pm
Thank you both for replying, I appreciate this.
Minstrel you are correct that I am using templates in Dreamweaver - The code of the page is something new to me so I have been burning lots of hours trying to learn it - your tips are extremely helpful.
As for the refresh tag there is no reason why that is there, I may have placed it on with one of the tools I use - it will be removed.
The robots follow tag must be again from another tool, they will be removed too.
Copyright tag will be removed as this is on the page as you suggested.
The revisit tag is something I misunderstood and I did play around with the revisit rate, I'll remove it.
The sitemap is something new to me and I did have it in html before xml - I'll change this round again.
I'll go over the code and try and fix anything out of place.
Thanks for the link I'll read over it fully and let you know how it all works out.
Thanks very much.