Problems listing a very large site on Google, please read

Discussion in 'Google Sitemaps' started by webbedfeet, Feb 16, 2009.

  1. #1
    Hello,

    In Autumn 2008 we released a series of classified websites for our client. The main site is

    http://www.ThatsExactlyWhatWeWant.co.uk

    The idea is that there are hundreds of smaller sites (currently around 600), each with its own subdomain, for example http://Peugeot206.ThatsExactlyWhatWeWant.co.uk

    Each of these sites should be treated as independent by Google, since each has its own subdomain, its own Google Sitemap and so on, and this does seem to be the case. The idea is that each site targets and benefits a small section of people, and because the content of each site is specific, it should eventually rank quite well in search engines: a Peugeot 206 website should rank well for Peugeot 206 searches and, when visited, should show the user only relevant listings.

    There are around 40 million pages that could be listed (according to our sitemaps), yet nowhere near that many are indexed. We were typing "site:thatsexactlywhatwewant.co.uk" into Google and watching the count rise, but several weeks ago it froze at 501,000 pages. However, if we run a similar query for each sub-site and total the results, we get over 1.1 million pages indexed, which seems more accurate. That total is growing at around 22,000 pages per day, with some sites falling and some rising. At this rate it will take a year or two to reach even 10 million pages, which is too long.

    We're getting, on average, one unique visitor per day for every 1,000 pages indexed, so 10 million pages would bring us 10,000 visitors per day. That's the plan, anyway.

    So we need to find a way of speeding up indexing. I have logged into Google's control panel and can adjust the crawl rate, which I will do gradually.

    However, as each site is treated as separate, and there are 600 of them with so many pages, the Google Sitemap files are large and, more importantly, the queries needed to produce them take a lot of processing power. This slows the website down so much that it crashes. Google seems to request the sitemaps constantly (URLs such as http://Peugeot206.ThatsExactlyWhatWeWant.co.uk/sitemap.php).

    So I set up a cache, so that the sitemaps are simply sent out by the web server each time there is a request, which greatly reduces the database load. However, even with a 14-day cache we have issues. I could set it to 28 days, or even 56; that would make things better, but it is still a lot of processing power.
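    To illustrate the direction I'm leaning with the caching: pre-generate each subdomain's sitemap into static, gzipped files off-peak and let the web server hand those files straight to Googlebot, so a sitemap request never touches MySQL at all. Below is a rough sketch of the idea in Python; the listings table and its subdomain/url columns are invented for the example, and the real sitemap.php and schema look different.

    # build_sitemaps.py -- a sketch: pre-generate static, gzipped sitemap files
    # for each subdomain so the web server can serve them as flat files and
    # Googlebot's sitemap requests never hit the database. Run off-peak.
    import gzip
    from collections import defaultdict

    import pymysql  # any MySQL client library would do; this one is an example

    MAX_URLS_PER_FILE = 50000          # the sitemap protocol's per-file limit
    OUT_DIR = "/var/www/sitemaps"      # hypothetical output directory

    def write_subdomain_sitemaps(subdomain, urls):
        """Split one subdomain's URLs into <=50,000-URL gzipped sitemap files."""
        for part, start in enumerate(range(0, len(urls), MAX_URLS_PER_FILE), 1):
            chunk = urls[start:start + MAX_URLS_PER_FILE]
            body = "".join("  <url><loc>%s</loc></url>\n" % u for u in chunk)
            xml = (
                '<?xml version="1.0" encoding="UTF-8"?>\n'
                '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                + body + "</urlset>\n"
            )
            path = "%s/%s-sitemap-%d.xml.gz" % (OUT_DIR, subdomain, part)
            with gzip.open(path, "wt", encoding="utf-8") as f:
                f.write(xml)

    def main():
        # Invented schema: one row per listing, keyed by subdomain. A real job
        # over 40 million rows would page through the table rather than pull
        # everything into memory at once.
        db = pymysql.connect(host="localhost", user="web", password="secret",
                             database="classifieds")
        urls_by_subdomain = defaultdict(list)
        with db.cursor() as cur:
            cur.execute("SELECT subdomain, url FROM listings")
            for subdomain, url in cur.fetchall():
                urls_by_subdomain[subdomain].append(url)
        for subdomain, urls in urls_by_subdomain.items():
            write_subdomain_sitemaps(subdomain, urls)

    if __name__ == "__main__":
        main()

    Each subdomain's sitemap URL would then simply point at (or be rewritten to) its pre-built files, so the per-request cost is just a static file read.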

    Bear in mind that the pages on the sites themselves are very navigable by search engines: find one page and you can click through to any other within a few clicks, especially when navigating from the home page. As for sold classifieds, which we are keeping, these are all reachable through an archive page, so they stay indexed.

    QUESTION 1

    So my question: are there any adverse effects to caching the sitemap files for two months? Google could use the old files to locate all the sold items, and the website itself for the daily updates. In fact, let's look at it another way: are sitemaps necessary at all? If I could get rid of the sitemaps and still have the pages indexed, it would be a great relief. The vast majority of our MySQL processing time comes from Google, and the sitemaps are the bulk of that.

    QUESTION 2

    Apart from adjusting the crawl rate, is there any other way of speeding up indexing? Can we tell Google, for example, to spend less time revisiting existing pages and more time looking for new ones?

    QUESTION 3

    Any other ideas on how to get these sites listed quickly?


    Any help appreciated - thanks for reading
     
    webbedfeet, Feb 16, 2009 IP
  2. #2
    Sitemaps help the spider read the site easily; they are a relief for it, so it doesn't have to fall back on its worst-case approach of discovering every page of the site through the slowest crawling method it has.

    As for speeding up indexing and revisits, I think there is a meta tag intended for that, but it isn't really respected by the spiders, so in other words, just wait for Google to crawl you. A subdomain doesn't inherit the 'page importance' of the main domain, so it won't be crawled immediately.
     
    bleuken, Feb 16, 2009 IP
  3. #3
    Thanks for the information and the quick response.
     
    webbedfeet, Feb 16, 2009 IP
  4. #4
    Crawl rate: you can adjust the minimum time between requests, but you cannot force the Google spiders to come more often. I think Google states that the setting only controls how often the robot may visit your site within a certain time slot. Make sure the bot doesn't run into difficulties, in particular slow server responses.

    The size of that time slot (and the number of page requests) depends on other factors, mainly quality backlinks and domain trust. The bot will be more eager to look at your site(s) if it expects valuable content.

    To ease your load problems I'd suggest creating static sitemaps, generated once a day or even just once a week. By the way, think of the sitemap as a road map: the bot will eventually drive along all of its streets, but it will also check those streets for further unmapped streets and drive along them as well.
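    To make the static approach concrete, here is a rough sketch of the layer that would sit on top of pre-built per-subdomain sitemap files (in the spirit of the sketch in the first post): each subdomain gets a small sitemap index listing its gzipped files, regenerated by a daily or weekly cron job. The directory, the file naming and the single placeholder subdomain are all invented for the example.

    # build_sitemap_indexes.py -- a sketch: write a small sitemap index per
    # subdomain listing its pre-built, gzipped sitemap files. Meant to run
    # right after those files are generated, e.g. from a nightly or weekly
    # cron job.
    import glob
    import os
    from datetime import date

    OUT_DIR = "/var/www/sitemaps"          # same hypothetical directory as above
    BASE = "ThatsExactlyWhatWeWant.co.uk"  # domain taken from the thread

    def write_index(subdomain):
        files = sorted(glob.glob("%s/%s-sitemap-*.xml.gz" % (OUT_DIR, subdomain)))
        today = date.today().isoformat()
        entries = "".join(
            "  <sitemap>\n"
            "    <loc>http://%s.%s/sitemaps/%s</loc>\n"
            "    <lastmod>%s</lastmod>\n"
            "  </sitemap>\n" % (subdomain, BASE, os.path.basename(f), today)
            for f in files
        )
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "</sitemapindex>\n"
        )
        with open("%s/%s-sitemap-index.xml" % (OUT_DIR, subdomain), "w",
                  encoding="utf-8") as f:
            f.write(xml)

    if __name__ == "__main__":
        # One index per subdomain; in practice the list would come from the
        # database. "peugeot206" is only a placeholder.
        for sub in ["peugeot206"]:
            write_index(sub)

    Each subdomain could then point its robots.txt Sitemap: line (or its Webmaster Tools entry) at the index file instead of the dynamic sitemap.php.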
     
    martmart, Feb 16, 2009 IP