Improving site crawlability

Discussion in 'Search Engine Optimization' started by PaulJones, Dec 12, 2005.

  1. #1
    I'm working on a few sites right now and am doing Google Sitemaps for them. I've noticed that the site crawlers I've turned loose on a couple of the sites (for the purpose of creating the XML files for Google) have choked a few times while going through them. I'm wondering what can be done in general to improve site crawlability? I can't reveal the site names as they are for clients, so I guess I'm just looking for general suggestions. What can we do to make it easier on those poor spiders? :)

    (By the way, Matt Cutts posted the following in his blog a few days ago, which is another reason crawlability has been on my mind: "Truthfully, much of the best SEO is common-sense: making sure that a site’s architecture is crawlable, coming up with useful content or services that has the words that people search for, and looking for smart marketing angles so that people find out about your site (without trying to take shortcuts).")
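
    For what it's worth, here's a rough sketch of how the XML file could be generated once a crawl has produced a URL list. The URLs and the output filename are placeholders, and the schema URI is the generic sitemap namespace, so double-check it against whatever Google's documentation currently says:

        # Minimal sketch: write a sitemap XML file from a list of crawled URLs.
        import xml.etree.ElementTree as ET

        urls = [
            "http://www.example.com/",
            "http://www.example.com/articles/something-interesting.html",
        ]

        urlset = ET.Element("urlset",
                            xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
        for u in urls:
            page = ET.SubElement(urlset, "url")
            ET.SubElement(page, "loc").text = u

        ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8",
                                     xml_declaration=True)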
     
    PaulJones, Dec 12, 2005 IP
  2. Windows-Update-Advisor

    Windows-Update-Advisor Well-Known Member

    Messages:
    171
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    158
    #2
    Some factors can make your site more difficult to crawl:

    1. Session IDs
    2. Number of directory levels
    3. robots.txt (see the quick check sketched below this list)
    4. Sites that require login
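
    For point 3, Python's robot parser gives a quick sanity check that you aren't blocking pages you actually want crawled. This is only a sketch; example.com and the paths stand in for your own site:

        # Sketch: check whether robots.txt blocks pages you want crawled.
        from urllib.robotparser import RobotFileParser

        rp = RobotFileParser("http://www.example.com/robots.txt")
        rp.read()  # fetch and parse the live robots.txt

        for path in ["/", "/articles/", "/admin/"]:
            url = "http://www.example.com" + path
            print(path, "allowed" if rp.can_fetch("Googlebot", url) else "BLOCKED")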
     
    Windows-Update-Advisor, Dec 14, 2005 IP
  3. PaulJones

    PaulJones Peon

    Messages:
    59
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Thanks WUA.

    1. No session IDs while browsing the site.
    2. Not quite sure what you mean here. Do you mean how deep the site goes?
    3. Robots.txt. We have this to keep the spiders out of our admin area and a few places in vBulletin. But that's a good thing, right?
    4. We have a login system but don't require people to log in to access any part of the site.
     
    PaulJones, Dec 14, 2005 IP
  4. sarahk

    sarahk iTamer Staff

    Messages:
    28,810
    Likes Received:
    4,535
    Best Answers:
    123
    Trophy Points:
    665
    #4
    Have you used Poodle Predictor and Xenu to see if they give any feedback? Xenu can show exactly what the URLs end up looking like. Sometimes I see things like mysite.com/info/../articles/something-interesting.html

    and while it's OK in the browser it's soooo dodgy, and the site owners don't even realise what they're doing.
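
    If you want to tidy those up in a crawl report, collapsing the "/../" segments is straightforward. A rough sketch (the URL is the example from above):

        # Sketch: collapse "dir/../" segments so each link maps to one URL.
        import posixpath
        from urllib.parse import urlsplit, urlunsplit

        def normalize(url):
            parts = urlsplit(url)
            path = posixpath.normpath(parts.path)
            if parts.path.endswith("/") and path != "/":
                path += "/"  # normpath drops trailing slashes
            return urlunsplit((parts.scheme, parts.netloc, path,
                               parts.query, parts.fragment))

        print(normalize("http://mysite.com/info/../articles/something-interesting.html"))
        # -> http://mysite.com/articles/something-interesting.html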
     
    sarahk, Dec 14, 2005 IP
  5. frankm

    frankm Active Member

    Messages:
    915
    Likes Received:
    63
    Best Answers:
    0
    Trophy Points:
    83
    #5
    What I have seen is that some sites require cookies because their CMS demands it. Set your Internet Explorer to reject all cookies, try to navigate the site again, and see if you can get to all the pages.
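
    For anyone who'd rather script that check than click around, something along these lines works, since urllib sends no cookies unless you wire up a cookie handler. The URLs are placeholders for your own key pages:

        # Sketch: fetch key pages the way a cookie-less spider would.
        import urllib.error
        import urllib.request

        pages = ["http://www.example.com/",
                 "http://www.example.com/products/",
                 "http://www.example.com/articles/"]

        for url in pages:
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    # geturl() exposes any redirect, e.g. to a login page
                    print(resp.status, resp.geturl())
            except urllib.error.HTTPError as e:
                print(e.code, url)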
     
    frankm, Dec 14, 2005 IP
  6. PaulJones

    PaulJones Peon

    Messages:
    59
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Good tips. I did notice that when I turned off cookies, session IDs get appended to the URLs (my guess is that has something to do with our e-commerce functionality). I was able to browse the site just fine, but I'm wondering if that could be causing any problems?
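
    A rough check along these lines will flag URLs carrying the usual session-ID parameters. The parameter names are just the common suspects (e.g. osCsid from osCommerce), not an exhaustive list:

        # Sketch: flag URLs that carry likely session-ID parameters.
        import re
        from urllib.parse import parse_qs, urlsplit

        SESSION_PARAMS = {"phpsessid", "sid", "sessionid", "jsessionid", "oscsid"}

        def has_session_id(url):
            # Catch path-style IDs like ";jsessionid=..." as well.
            if re.search(r";jsessionid=", url, re.IGNORECASE):
                return True
            params = {k.lower() for k in parse_qs(urlsplit(url).query)}
            return bool(params & SESSION_PARAMS)

        print(has_session_id("http://example.com/shop?PHPSESSID=abc123"))  # True
        print(has_session_id("http://example.com/shop?page=2"))            # False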
     
    PaulJones, Dec 15, 2005 IP
  7. dave487

    dave487 Peon

    Messages:
    701
    Likes Received:
    20
    Best Answers:
    0
    Trophy Points:
    0
    #7
    dave487, Dec 15, 2005 IP
  8. apblake

    apblake Peon

    Messages:
    123
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #8
    This is kind of obvious, but if possible, keep all your pages no more than 3 levels deep: home page (1), category (2), category sub-page (3). Making a site easy for people to navigate also makes it easy for spiders to crawl.
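
    If you want to measure that instead of eyeballing it, a breadth-first walk from the home page gives you each page's click depth. This is only a sketch: the regex link extraction is crude, the start URL is a placeholder, and a real crawl should honour robots.txt. Anything listed at depth 4 breaks the 3-level rule:

        # Sketch: breadth-first crawl that records click depth per page.
        import re
        import urllib.request
        from collections import deque
        from urllib.parse import urljoin, urlsplit

        START = "http://www.example.com/"
        HOST = urlsplit(START).netloc

        depth = {START: 0}
        queue = deque([START])
        while queue:
            url = queue.popleft()
            if depth[url] > 3:  # expand only to depth 3; depth-4 finds suffice
                continue
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    html = resp.read().decode("utf-8", "ignore")
            except Exception:
                continue
            for href in re.findall(r'href="([^"#]+)"', html):
                link = urljoin(url, href)
                if urlsplit(link).netloc == HOST and link not in depth:
                    depth[link] = depth[url] + 1
                    queue.append(link)

        for url, d in sorted(depth.items(), key=lambda kv: kv[1]):
            print(d, url)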
     
    apblake, Dec 15, 2005 IP
  9. sarahk

    sarahk iTamer Staff

    Messages:
    28,810
    Likes Received:
    4,535
    Best Answers:
    123
    Trophy Points:
    665
    #9
    Yahoo remain behind the 8-ball. No feedback on how often they'll revisit the list (though I do like that they accept standard RSS), no ability to ping, and no account management so you can see if there were issues. Yahoo prove, again, why they are the poor relation to Google.

    And for smaller sites this should be totally achievable. But remember you can "deeplink" into your site in any number of creative ways, and natural linking is frequently a deeplink. This means the search engines are told that pages more than 3 levels deep are important too.

    FYI: WordPress lets old posts go much more than 3 levels deep. The navigation is sound, so it works despite that.
     
    sarahk, Dec 15, 2005 IP