Sitemap Size

Discussion in 'Google Sitemaps' started by nevetS, Apr 14, 2005.

  1. #1
    I just built a sitemap, and it's 1.8 MB. I've heard you should keep it under a certain file size, but I can't remember if it's 100K, 150K or what. I'm assuming that the thing to do would be to split it up into several files and cross-link those pages.

    What size do you recommend?
    nevetS, Apr 14, 2005 IP
  2. dkalweit

    dkalweit Well-Known Member

    Messages:
    521
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    150
    #2
    Google's official webmaster guidelines said no more than 100 links per page, last I checked...


    --
    Derek
     
    dkalweit, Apr 14, 2005 IP
  3. nevetS

    nevetS Evolving Dragon

    Messages:
    2,544
    Likes Received:
    211
    Best Answers:
    0
    Trophy Points:
    135
    #3
    I checked and it does say that, but I'm not sure that applies to sitemaps. With 3000 pages, that puts me at a 31-page sitemap, which I think is a little too large. Does anyone else have any input?
     
    nevetS, Apr 14, 2005 IP
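
For anyone checking the arithmetic above, here is a minimal Python sketch. It assumes the 31st page is an index page linking to the 30 content pages; the post does not say so, and `sitemap_pages` is a made-up helper name.

```python
import math


def sitemap_pages(total_links: int, links_per_page: int = 100) -> int:
    """Count sitemap pages: content pages plus one index page (assumed)."""
    content_pages = math.ceil(total_links / links_per_page)
    # A sitemap that fits on a single page needs no separate index.
    if content_pages <= 1:
        return content_pages
    return content_pages + 1


print(sitemap_pages(3000))  # 30 content pages + 1 index page
```

Raising `links_per_page` is the only knob here, so the page count drops quickly if the 100-link guideline turns out not to apply to sitemaps.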
  4. kyle422

    kyle422 Peon

    Messages:
    290
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Make more than one sitemap. Keep each one under 100k. Even though I've read that Google all but ignores anything after 100k, I've seen sites ranking very high on Google with pages well above 100k.
     
    kyle422, Apr 14, 2005 IP
  5. WhatiFind

    WhatiFind offline

    Messages:
    1,789
    Likes Received:
    257
    Best Answers:
    0
    Trophy Points:
    180
    #5
    With big sites I've seen many sitemaps that are divided by number or letter, like 1, 2, 3, 4, or A, B, C, D...

    Just a thought: if you have your sitemap in HTML, import it into Excel, sort the cells so all the URLs are in alphabetical order, then copy the links for each letter back into HTML as separate pages named A, B, C, D and so on. Maybe this will work for you.
     
    WhatiFind, Apr 14, 2005 IP
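
The spreadsheet approach above could also be scripted. This is only a rough Python sketch: it assumes the "letter" is the first character of the URL path (the post does not say whether to sort by URL or by title), and `group_by_letter` and the "other" bucket are names invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_letter(urls):
    """Sort URLs by path and bucket them by the path's first letter."""
    groups = defaultdict(list)
    for url in sorted(urls, key=lambda u: urlparse(u).path.lower()):
        first = urlparse(url).path.lstrip("/")[:1].upper()
        # Digits, punctuation and empty paths share one catch-all bucket.
        key = first if first.isalpha() else "other"
        groups[key].append(url)
    return dict(groups)


if __name__ == "__main__":
    demo = [
        "http://example.com/banana.html",
        "http://example.com/apple.html",
        "http://example.com/1st.html",
    ]
    for letter, links in group_by_letter(demo).items():
        print(letter, links)
```

Each bucket would then become one page (A, B, C, ...), with an index page linking to all of them.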
  6. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #6
    The suggested limits are indeed 100 links and 101kb -- the 101kb, however, is text of course since spiders don't read images -- beyond 101kb of text, I wouldn't count on the spiders reading anything.

    As for the site map, anything that huge isn't being designed for humans, obviously, and although Google isn't clear on the issue I wouldn't want to bet that Googlebot will follow 3000 links. My recommendation would definitely be to organize it into categories of no more than 100 links per category -- another advantage of this is that each page links back to the home page and each page, if correctly categorized, has an opportunity to acquire PR from other pages.
     
    minstrel, Apr 14, 2005 IP
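
As a rough illustration of the "no more than 100 links per category page, each linking back to home" idea, here is a Python sketch. The function name, the HTML layout and the `home_url` default are all assumptions made for the example, not anything prescribed in this thread.

```python
from html import escape


def render_category_pages(category, links, home_url="/", max_links=100):
    """Split one category's (url, title) pairs into HTML pages of at most max_links."""
    pages = []
    for start in range(0, len(links), max_links):
        chunk = links[start:start + max_links]
        items = "\n".join(
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
            for url, title in chunk
        )
        # Every page carries a link back to the home page.
        pages.append(
            f"<h1>{escape(category)}</h1>\n"
            f'<p><a href="{escape(home_url, quote=True)}">Home</a></p>\n'
            f"<ul>\n{items}\n</ul>\n"
        )
    return pages


if __name__ == "__main__":
    demo_links = [(f"/p{i}.html", f"Page {i}") for i in range(250)]
    print(len(render_category_pages("Widgets", demo_links)))
```

A category with more than `max_links` entries simply spills onto further pages; whether Googlebot actually follows more than 100 links per page is the open question in this thread, so the cap is kept as a parameter.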
  7. spdude

    spdude Guest

    Messages:
    1,315
    Likes Received:
    86
    Best Answers:
    0
    Trophy Points:
    0
    #7
    I've seen site maps with over 3,000 links on them helping in quickly getting a site indexed in a couple of weeks. Even the links at the bottom of the site-map got crawled. I've seen this many times.
     
    spdude, Apr 14, 2005 IP
  8. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #8
    Really? Can you show me one?
     
    minstrel, Apr 14, 2005 IP
  9. spdude

    spdude Guest

    Messages:
    1,315
    Likes Received:
    86
    Best Answers:
    0
    Trophy Points:
    0
    #9
    spdude, Apr 14, 2005 IP
  10. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #10
    Okay. I see a page with a lot of links with slow load time. I also see that Google has cached the entire page. I also see that the page (text only) weighs in at a whopping 615 kb.

    The fact that it's cached does rather suggest that the conventional wisdom of Googlebot stopping at 101 kb is no longer true. That alone is a bit surprising...

    However, even if every page listed on this one has been indexed by Google, whether or not every page linked from this page was spidered FROM this page is another question -- one that can't be answered based on this information alone.... so the question of how many links Googlebot will follow from a single page remains open.
     
    minstrel, Apr 14, 2005 IP
  11. spdude

    spdude Guest

    Messages:
    1,315
    Likes Received:
    86
    Best Answers:
    0
    Trophy Points:
    0
    #11
    I would say the 101 kb thing is a myth. It all depends on how much PR you point at the site map. If you point a few PR5 or PR6 links at it, the bot will crawl all the links, even on a huge page like the one I posted above.

    I can show you an example of an Amazon site which got 25,000 pages indexed in two weeks from a site map created on another site which was a high PR6. The site map on the other site drove the bot to deep crawl the Amazon site in days. It was insane how fast it happened.

    The links on the site map were the only links pointing to those pages on the Amazon site, so it would answer your second question.

    Since it is a spammy move for the PR6 site to do this, I won't post the link here in the forum. I can PM you the URL if you want to pursue this for research purposes.
     
    spdude, Apr 14, 2005 IP
  12. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #12
    How do you know this?

    I would like to pursue this... thanks!
     
    minstrel, Apr 14, 2005 IP
  13. spdude

    spdude Guest

    Messages:
    1,315
    Likes Received:
    86
    Best Answers:
    0
    Trophy Points:
    0
    #13
    PM sent with a few URLs and some background info on those sites.
     
    spdude, Apr 14, 2005 IP
  14. Michael

    Michael Raider

    Messages:
    677
    Likes Received:
    92
    Best Answers:
    0
    Trophy Points:
    150
    #14
    It has been for a long time but like all search engine mythology it doesn't stop the blind from leading the blind.

    Take this search in Google for example "zouave zounds zulu zwischen zygote zymotic" (include the quotation marks).

    One hit, 355k Cached and Google is indexing the last 6 words.

    - Michael

     
    Michael, Apr 14, 2005 IP
  15. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #15
    Thanks, spdude.
     
    minstrel, Apr 15, 2005 IP
  16. honey

    honey Prominent Member

    Messages:
    15,556
    Likes Received:
    712
    Best Answers:
    0
    Trophy Points:
    325
    #16
    Google's limit according to me right now is 513k, earlier it used to be 101k. Show me one page above 513k, and I am wrong. I will send you $20 via paypal if I am wrong :).
    Offer open for 7 days from today. So even if you can get a 1000k page indexed, go ahead. $20 waiting for you here.
     
    honey, Apr 15, 2005 IP
  17. dkalweit

    dkalweit Well-Known Member

    Messages:
    521
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    150
    #17
    The 600k+ page posted above shows 513k in Google, so I'd say you're probably right. 512/513k is a byte barrier where Google engineers would probably cut off, too... Nice info, Honey. :)


    --
    Derek
     
    dkalweit, Apr 15, 2005 IP
  18. Michael

    Michael Raider

    Messages:
    677
    Likes Received:
    92
    Best Answers:
    0
    Trophy Points:
    150
    #18
    Search Google for "ninny pandir zozoter zozote- habis! (oh)"

    One hit, 520k Cached and the last 6 words indexed.

    :)

    - Michael

     
    Michael, Apr 15, 2005 IP
  19. uca

    uca Well-Known Member

    Messages:
    2,242
    Likes Received:
    69
    Best Answers:
    0
    Trophy Points:
    155
    #19
    This thread appears to be very interesting.

    I knew that the 101k limit had been in doubt recently; I can see now that it's no longer valid!

    Back to the start of the thread: a sitemap should be for users too, so it should not be too large. Instead, it should be organized alphabetically or by subject.
    In other words, a directory: a directory within the site, covering the site's own pages.
     
    uca, Apr 16, 2005 IP
  20. nevetS

    nevetS Evolving Dragon

    Messages:
    2,544
    Likes Received:
    211
    Best Answers:
    0
    Trophy Points:
    135
    #20
    Ideally, I'd like a script that spiders the site, sorts pages into their appropriate categories, and grabs meta information for anchor text. It would lay everything out in a configurable layout (e.g. number of links per page, how many pages), be search engine friendly, and be able to use some sort of HTML template.

    I have found several paid scripts that almost meet my needs, but nothing that is worth it so far.

    I have several spidering scripts that I can modify, but it's not a priority for me at the moment. For now, it's manual. I used sitemapper.pl in order to build this sitemap, but it ended up using a ton of memory with such a large site. (no writing to file until it is all done). I'm tempted to link this extraordinarily large sitemap now, but I'm worried about the page size being a problem - not just for spiders, but for end users as well. My pages are pretty well cross linked, so I think I'm going to wait until I at least have something reasonable before "making it live".

    I appreciate the input. I think 512K is a nice target to shoot for with such a large site. I think I'm going to go a little conservative with maybe 300K being the goal, with something in the way of a lighter option for my dial up visitors.

    300K isn't a lot of links once you add title text and an excerpt to each one.
     
    nevetS, Apr 16, 2005 IP
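
One way around the memory problem described above (nothing written to file until the whole run is done) is to stream entries to disk and start a new file whenever a size budget is reached. This is a minimal Python sketch, not a replacement for sitemapper.pl: the function name, the `sitemap-N.html` naming and the bare `<ul>` layout are assumptions, and the 300K default is the target mentioned in the post.

```python
from html import escape
from pathlib import Path

HEADER = b"<ul>\n"
FOOTER = b"</ul>\n"


def write_sitemap_pages(entries, out_dir, max_bytes=300 * 1024):
    """Stream (url, title, excerpt) entries into sitemap-N.html files.

    A new file is started when the next entry would push the current
    file past max_bytes, so only one entry is held in memory at a time.
    A single entry larger than the budget still gets written on its own.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    size = 0
    for url, title, excerpt in entries:
        item = (
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a> '
            f"{escape(excerpt)}</li>\n"
        ).encode("utf-8")
        if handle is not None and size + len(item) + len(FOOTER) > max_bytes:
            handle.write(FOOTER)
            handle.close()
            handle = None
        if handle is None:
            path = out_dir / f"sitemap-{len(paths) + 1}.html"
            handle = path.open("wb")
            handle.write(HEADER)
            size = len(HEADER)
            paths.append(path)
        handle.write(item)
        size += len(item)
    if handle is not None:
        handle.write(FOOTER)
        handle.close()
    return paths
```

Because `entries` can be a generator fed by the spider, the page budget is enforced in bytes rather than link count, which matters once each link carries title text and an excerpt.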