Our members have made a total of 156,905 posts, but the sitemap has only 4,000 links?

Discussion in 'Google Sitemaps' started by ly2, Apr 29, 2006.

  1. #1
    Our members have made a total of 156,905 posts.
    We have 3,753 registered members.

    But my sitemap only came out to 4,000 URLs. Is that right? Shouldn't there be at least one URL for each topic? Just wondering.
     
    ly2, Apr 29, 2006 IP
  2. soj

    soj Well-Known Member

    Messages:
    233
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    138
    #2
Actually, no: you can have 10 posts on one page, or more depending on what you have it set to, so there are fewer URLs than posts. Then again, it should still come to more than 4,000 URLs. Is the software you are using limited to 4,000 URLs?
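
    As a quick sanity check on the numbers (a rough sketch; 10 posts per page is just the usual default, your forum may differ):

    <?php
    // Rough lower bound on thread-page URLs at 10 posts per page.
    $posts = 156905;
    $postsPerPage = 10;
    echo ceil($posts / $postsPerPage); // 15691 pages, well above 4,000
    ?>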
     
    soj, Apr 29, 2006 IP
  3. ly2

    ly2 Notable Member

    Messages:
    4,093
    Likes Received:
    222
    Best Answers:
    0
    Trophy Points:
    205
    #3
    No, it's limited to 5,000 URLs.
    I am using freesitemapgenerator.com

    When I try to use a desktop app to generate my sitemap, it goes crazy, with 200k+ URLs. So I always think something is going wrong and stop the software. Is that right? Should it really be hundreds of thousands of URLs?
    I figured maybe Google wouldn't really like such a large sitemap file...
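
    For reference, the Sitemaps protocol itself caps a single file at 50,000 URLs, and big sites are expected to split their URLs across several files tied together by a sitemap index. A rough sketch of the split (all file names here are made up, and all-urls.txt is assumed to hold one URL per line):

    <?php
    // Split a large URL list into sitemap files of at most 50,000 URLs each,
    // then write a sitemap index pointing at them. File names are examples.
    $urls = file('all-urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $index = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
           . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach (array_chunk($urls, 50000) as $i => $chunk) {
        $name = sprintf('sitemap-%d.xml', $i + 1);
        $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
             . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
        foreach ($chunk as $url) {
            $xml .= '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
        }
        $xml .= '</urlset>';
        file_put_contents($name, $xml);
        $index .= '  <sitemap><loc>http://example.com/' . $name . '</loc></sitemap>' . "\n";
    }
    file_put_contents('sitemap-index.xml', $index . '</sitemapindex>');
    ?>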
     
    ly2, Apr 29, 2006 IP
  4. soj

    soj Well-Known Member

    Messages:
    233
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    138
    #4
Yeah, I'm not really sure. I have yet to find a good sitemap program. I think I might have to make one; I'll keep you posted if it eventuates.
     
    soj, Apr 30, 2006 IP
  5. tonyinabox

    tonyinabox Peon

    Messages:
    1,988
    Likes Received:
    42
    Best Answers:
    0
    Trophy Points:
    0
    #5
    phpSitemap is a good one.
     
    tonyinabox, Apr 30, 2006 IP
  6. ronmojohny

    ronmojohny Active Member

    Messages:
    729
    Likes Received:
    20
    Best Answers:
    0
    Trophy Points:
    68
    #6
    ronmojohny, Apr 30, 2006 IP
  7. ly2

    ly2 Notable Member

    Messages:
    4,093
    Likes Received:
    222
    Best Answers:
    0
    Trophy Points:
    205
    #7
    I am using "softplus Gsite Crawler"
    I am just gonna let it ride, as of now after almost 24 hours it's at 81,000 crawled and 108,000 waiting to be crawled. Note, the "wait to be crawled" keeps going up lol

    I dunno how many url's this thing is gonna have when done =x
     
    ly2, Apr 30, 2006 IP
  8. websitetools

    websitetools Well-Known Member

    Messages:
    1,513
    Likes Received:
    25
    Best Answers:
    4
    Trophy Points:
    170
    #8
The problem might be memory usage. Even if you have loads of RAM, 32-bit apps are still limited to a 2 GB memory address space (3 GB if using the /3GB boot switch). With ~200k pages, trouble can occur quickly if the program also stores lots of other data per URL (e.g. redirects, links to and from each page, titles, response codes, you name it).
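
    As rough, back-of-the-envelope arithmetic (the 5 KB-per-URL figure below is purely an assumption for titles, headers and link data, not a measured number):

    <?php
    // Estimate a crawler's working-set size; 5 KB per URL is an assumed figure.
    $urls = 200000;
    $bytesPerUrl = 5 * 1024;
    printf("~%.2f GB for %d URLs\n", ($urls * $bytesPerUrl) / (1024 * 1024 * 1024), $urls);
    // ~0.95 GB, already nearly half of a 32-bit process's 2 GB address space.
    ?>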
     
    websitetools, May 16, 2006 IP
  9. MaxPowers

    MaxPowers Peon

    Messages:
    261
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #9
Session IDs may be causing problems in the URLs. Unless a spider is built to handle cookies, your forum sees each request as a new visitor and appends a session ID to the URL (PHP will also do this unless you tell it not to). This can lead to hundreds of copies of the same page, each with a different ID. In PHP you can force cookie-only sessions:

    ini_set('session.use_only_cookies', '1'); // keep the session ID out of URLs

Another issue may be a poorly coded website or forum. Spiders that rely on HTML parsing can be seriously confused by unclosed tags and poor markup. The validator at http://validator.w3.org ought to shed some light on this. While most sites unfortunately have some errors, serious ones can hinder a spider or even render differently across browsers.

Of course, this could be the generator's fault: not handling relative links correctly, or being unable to find links in <a> tags whose attributes are 'out of sequence' from the norm. Some tools look at a tag that puts class="" before href="" and fail to recognize it. Generally, any sitemap tool 'works' up to a point, but it may not handle a particular website's quirks.
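
    A small sketch of that failure mode (the regex is deliberately naive, the kind a quick-and-dirty crawler might use):

    <?php
    // A fragile extractor that expects href to come right after "<a ":
    $html = '<a class="topic" href="/showthread.php?t=42">Thread 42</a>';
    preg_match_all('/<a href="([^"]+)"/i', $html, $m);
    var_dump($m[1]); // empty: the class attribute before href breaks the match

    // A real HTML parser doesn't care about attribute order:
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ suppresses warnings on sloppy markup
    foreach ($doc->getElementsByTagName('a') as $a) {
        echo $a->getAttribute('href'), "\n"; // prints /showthread.php?t=42
    }
    ?>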

One solution is to use a web-based sitemap service that offers you a 'web bug', a small image on your page. The image is actually a script on the service's website that captures your URL, inserts it into the service's database of your site's pages, and then returns an image. This way, when it's time to create your sitemaps, all of the obscure URLs get added.
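
    A minimal sketch of such a web bug, assuming a plain log file instead of a real service's database (webbug.php and sitemap-urls.log are made-up names):

    <?php
    // webbug.php: record the page that embedded this image, then serve a 1x1 GIF.
    // Pages embed it as: <img src="http://example.com/webbug.php" width="1" height="1">
    $page = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    if ($page !== '') {
        // Append the URL for the sitemap builder to pick up later.
        file_put_contents('sitemap-urls.log', $page . "\n", FILE_APPEND | LOCK_EX);
    }
    header('Content-Type: image/gif');
    echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'); // transparent 1x1 GIF
    ?>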

    AutoMapIt was built to handle cookies after problems I had with this early on. I do use an HTML parser so that I can report on various SEO factors and for other reasons, but I also offer the web bug images if you need to go that route. If you can't get Gsite to work, I'll be happy to help you out. I used to use Gsite myself until I wanted something more ;)
     
    MaxPowers, May 18, 2006 IP
  10. softplus

    softplus Peon

    Messages:
    79
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #10
Most default forum setups are a killer for crawlers: they generate upwards of 10 URLs for the same content. This is not a problem with the sitemap generators or crawlers; it's a problem with the way the forums are set up. To a crawler, every new URL is a new page.

I have seen forums with 1,000 postings generate over 100k URLs!

So what do you do? One possibility would be to trim the URLs with the sitemap tool and only submit the ones you like (my GSiteCrawler will let you do that). However, that is only a small band-aid on the real problem: the search engines will find all those URLs anyway and then have to guess which ones you prefer. The only clean solution is to go through your forum and make sure it generates only one crawlable, indexable URL per block of content.
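
    As a sketch of what that can look like in practice (showthread.php and the parameter names are just typical forum examples, not taken from this thread):

    <?php
    // At the top of a thread script: collapse URL variants that show the same
    // content onto one canonical URL via a 301 redirect, so crawlers only ever
    // see a single address per thread. The 'noise' parameters are assumptions.
    $noise = array('highlight', 'sid', 'goto');
    $clean = $_GET;
    foreach ($noise as $p) {
        unset($clean[$p]);
    }
    if (count($clean) !== count($_GET)) {
        $query = http_build_query($clean);
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: /showthread.php' . ($query !== '' ? '?' . $query : ''));
        exit;
    }
    ?>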

It will take some time for the index to get cleaned out once you have it online like that, but it's well worth it in the end. Would you rather have 100 URLs with the same content ranking at positions 100+, or one URL at position 10? ;-)
     
    softplus, May 23, 2006 IP