Hmm, this seems very interesting... a nice way to get those hard-to-find pages indexed. Does anybody have any data that shows whether submitting a site that's already in the index will get a higher ranking after doing a sitemap for it? I.e., would submitting a site that's already fully indexed be helped at all? Josh
I can't see how it would make any significant difference at all, Josh... once Googlebot has found you, devoured all your pages, and started returning daily for more (often in groups), what more can you expect a sitemap to do?
And when it has already spidered, indexed, and reindexed all your pages through several updates, then what? I do realize that for sites like this forum or for a storefront, there are always new pages being created -- it may make a difference there when the number of pages reaches a threshold. For an average information site or ecommerce site, I don't see it making a lot of difference.
Yeah, my site has several pages added daily, and a lot of them update as well. I'm still a PR0 -- I missed the last update -- so Googlebot wasn't paying me much attention. After putting the sitemap up, I think it has indexed every page on there, and cached almost all of them!
For anyone who needs it, I wrote a small tutorial on how to create a Google sitemap for a large site using Xenu's Link Sleuth and MS Excel. Check it out: http://www.ethangiffin.com/archives/2005/06/07/08/45/59/ Regards, Opie
Minstrel, the one advantage I see of this is for when you have a homepage and a few other pages that have not been updated in quite some time, and Google has stopped visiting your site (this has happened to me), but you have added additional internal pages. You just add them to a new sitemap and submit it. Google has now crawled and cached those new pages, and it has returned and updated the cache on the homepage as well.
Yes and no -- with the priority and the rest, it is a little more than that... Remember, Min, this is beta. I am sure these guys are not going to tip their hand at the start as to the true reason behind this. If we know the guys at G (and we don't), this will probably turn out to be something pretty cool and useful... it already has been for me...
The thing is, as I said on another forum, Google's track record doesn't suggest that they generally go out of their way to make it easy for webmasters, given that from the viewpoint of relevant search results webmasters might often be considered to be the enemy. I think it was noppid who wondered publicly what the catch was, and that bears some thinking about... I look at the features (including priority and last-modified date) and I'm just waiting for the first bright light bulb to start experimenting with scamming this -- for example, has anyone listed a page yet as anything less than priority 1.0? How useful is that really going to be for Google? If Google is looking at this as a way to find pages it doesn't know about, I can see why it might be seen as helpful to them. But I think there is little doubt that webmasters are going to at least try to exploit this one way or another, and I wonder how Google is going to deal with that. I have always been impressed by the intelligence in the Google group. I'm waiting with bated breath to see how this one turns out.
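For anyone who hasn't looked at the format yet, a single entry looks something like this, as I read the beta spec (the URL and the values here are made-up examples, and I'm going from memory on the namespace):

<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/page.html</loc>
    <lastmod>2005-06-07</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>

Nothing stops you from claiming daily changes and priority 1.0 on every single page -- which is exactly the kind of gaming I mean.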
Maybe a bit off-topic, but if people are having trouble with the XML, the FAQ says you can submit just a list of links, divided by newlines (\n in most languages). That's a very easy way to do it. It's hard to tell if it has improved its crawl coverage, since it already had most of me indexed. I believe it has added more because of the sitemap, but I cannot be certain, so it could just be wishful thinking.
No space, just:

http://www.site.com/page.html\nhttp://www.site.com/page2.html\nhttp://www.site.com/page3.html\n

Which, when viewed through a browser as, say, sitemap.txt, looks like:

http://www.site.com/page.html
http://www.site.com/page2.html
http://www.site.com/page3.html
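If anyone wants to script that, here's a minimal sketch in Python (the file name and URLs are placeholders, obviously -- swap in however you collect your own links):

# Write a plain-text sitemap: one URL per line, newline-separated.
urls = [
    "http://www.site.com/page.html",
    "http://www.site.com/page2.html",
    "http://www.site.com/page3.html",
]
with open("sitemap.txt", "w") as f:
    f.write("\n".join(urls) + "\n")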
Further update: Google is all over my site and has cached basically every page. Google now downloads my sitemap about 2 hours after it is posted every night at 4 am. I'm a fan.
What if you were using an .html extension for the same page -- any difference? Is the \n still recognized?
Google downloaded my sitemap and is basically doing nothing much. It crawls just like it has been doing for the past weeks... crawling like the robot is dying. Is it because my site is new (~3 weeks old) that the bot has been reluctant to be aggressive even though I have submitted my sitemap? Is it the sandbox effect again? Duh! Why can't they just get rid of this sandbox for sites with a significant number of pages (like 50 and above)?

Anyway, for people who use Linux/Unix hosting, you should just try the sitemap generator recommended by Google. It's a snap to create a sitemap using it. The beauty is also in the fact that you do not need to recreate it and send it to Google when the pages change -- use a cron job to auto-magically do it. I love this tool.

For people using a Windows server... hmm... either do the XML pages by hand (a bad idea for big sites) or use the third-party tools mentioned here. The problem is you need to do it again (create, submit) every time you add pages. A hassle...

Oh! Linux -- hard to learn but easy to use... thank god I use a Linux server.
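For example, the crontab entry can be something like this (the paths and file names are just placeholders for wherever you put things -- Google's generator is a Python script that reads a config file, and as I understand it, it also notifies Google for you when it finishes):

# Rebuild the sitemap every night at 4 am from the generator's config file.
0 4 * * * python /home/myuser/sitemap_gen.py --config=/home/myuser/config.xml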
I think it does help very much! I got all the pages of my sites indexed after Google downloaded my Sitemaps.
Iskandar, I didn't try that, but I think so. Googlebot should just see it as a bunch of URLs separated by newlines. It looks different in the browser when it has a .php extension -- I tried that -- but I think it should work as well. It looks like a bunch of URLs with a space in between them, but that's just how the browser renders \n in .php (and probably .html); the newlines are still in the source. At any rate, if it doesn't work, Google will tell you. =)
That's what people have been saying. I think the sitemap is valuable for sites that have been "there" for quite some time. I think the sandbox effect still applies -- even after you send the sitemap. Anyway, I think it's better to send the sitemap than to do nothing at all... For those who have succeeded... congrats!