I'm working on a few sites right now and am doing Google Sitemaps for them. I'm noticing that the site crawlers I've turned loose on a couple of the sites (for the purposes of creating the XML file for Google) have choked a few times while going through the site. I'm wondering what can be done in general to improve site crawlability? I can't reveal the site names as they are for clients, so I guess I'm just looking for any general suggestions. What can we do to make it easier on those poor spiders? (By the way, Matt Cutts posted the following in his blog a few days ago, which is another reason crawlability has been on my mind: "Truthfully, much of the best SEO is common-sense: making sure that a site’s architecture is crawlable, coming up with useful content or services that has the words that people search for, and looking for smart marketing angles so that people find out about your site (without trying to take shortcuts).")
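For what it's worth, if the crawler output is usable, the XML file itself is simple to generate. Here's a minimal sketch (Python, with placeholder URLs) of writing a basic sitemap.xml in the standard sitemap protocol format that Google accepts:

```python
# Minimal sketch: write a basic sitemap.xml from a list of URLs.
# The URLs here are placeholders -- swap in the pages your crawler found.
from xml.sax.saxutils import escape

urls = [
    "http://www.example.com/",
    "http://www.example.com/articles/something-interesting.html",
]

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        f.write("  <url>\n")
        f.write("    <loc>%s</loc>\n" % escape(url))
        f.write("  </url>\n")
    f.write("</urlset>\n")
```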
Some factors may make your site more difficult to crawl:
1. Session IDs
2. Number of directory levels
3. robots.txt
4. Sites that require login
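You can spot the first two of these mechanically. Here's a rough sketch (Python, example URL and parameter names only) that flags URLs carrying a session ID and pages sitting more than a few directory levels deep:

```python
# Quick check for two of the factors above: session IDs in the URL
# and how many directory levels deep a page sits.
from urllib.parse import urlparse, parse_qs

# Common session parameter names -- adjust for whatever your CMS uses.
SESSION_PARAMS = {"phpsessid", "sessionid", "sid", "jsessionid"}

def crawl_warnings(url):
    parsed = urlparse(url)
    warnings = []
    params = {k.lower() for k in parse_qs(parsed.query)}
    if params & SESSION_PARAMS:
        warnings.append("session ID in URL")
    depth = len([p for p in parsed.path.split("/") if p])
    if depth > 3:
        warnings.append("more than 3 directory levels deep (%d)" % depth)
    return warnings

print(crawl_warnings("http://www.example.com/shop/cat/sub/page.html?PHPSESSID=abc123"))
```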
Thanks WUA.
1. No session IDs while browsing the site.
2. Not quite sure what you mean here. Do you mean how deep the site goes?
3. Robots.txt. We have this to keep the spiders out of our admin and a few places in vBulletin. But that's a good thing, right?
4. We have a login system but don't require people to log in to access any parts of the site.
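Yes, blocking the admin area is fine, as long as robots.txt isn't accidentally blocking content too. A quick way to check is to run a few of your own paths through Python's robotparser and confirm the spiders are only shut out where you intend (paths below are just examples):

```python
# Sanity-check robots.txt: the admin area should be blocked,
# but ordinary content pages should still be fetchable by Googlebot.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

for path in ["/admin/", "/forum/admincp/", "/articles/some-article.html", "/"]:
    allowed = rp.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, "->", "allowed" if allowed else "blocked")
```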
Have you used Poodle Predictor and Xenu to see if they give any feedback? Xenu can show just what the URLs end up looking like. Sometimes I see things like mysite.com/info/../articles/something-interesting.html, and while it's OK in the browser it's so dodgy, and the site owners don't even realise what they're doing.
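If you already have a list of crawled URLs, you can catch those "../" leftovers without eyeballing them one by one. A small sketch (example URL only) that flags and normalises them:

```python
# Detect relative-path leftovers like /info/../articles/page.html.
# Browsers resolve these fine, but they create messy, duplicate-looking URLs.
from urllib.parse import urlparse, urlunparse
import posixpath

def normalize(url):
    parts = urlparse(url)
    clean_path = posixpath.normpath(parts.path)
    if parts.path.endswith("/") and clean_path != "/":
        clean_path += "/"
    return urlunparse(parts._replace(path=clean_path))

url = "http://mysite.com/info/../articles/something-interesting.html"
if ".." in urlparse(url).path:
    print("dodgy URL:", url, "->", normalize(url))
```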
What I have seen is that some sites require cookies because their CMS requires it. Set your Internet Explorer to not accept any cookies at all, then try to navigate the site again and see if you can get to all the pages.
Good tips. I did notice that when I turned off cookies, session IDs do get appended to the URLs (my guess is that has something to do with our e-commerce functionality). I was able to browse the site just fine, but I'm wondering if that could be causing any problems?
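It can, because spiders don't accept cookies either, so they may see a different session ID in every link and end up indexing duplicates. One way to check what a cookie-less crawler gets back is a fetch like this (the URL and parameter names are examples, adjust for your shop):

```python
# Fetch a page the way a cookie-less crawler would (urllib sends no cookies
# by default) and see whether the links it returns carry session IDs.
import re
import urllib.request

html = urllib.request.urlopen("http://www.example.com/").read().decode("utf-8", "replace")
links = re.findall(r'href="([^"]+)"', html)
suspect = [l for l in links if re.search(r'(PHPSESSID|sessionid|sid)=', l, re.I)]

print("%d of %d links contain a session ID" % (len(suspect), len(links)))
```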
This is kind of obvious, but if possible, keep all your pages no more than 3 levels deep (home page (1), category (2), category sub-page (3)). Making a site easy for people to navigate also makes it easy for spiders to crawl.
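If you want to measure that rather than guess, a rough breadth-first crawl from the home page will tell you how many clicks away each page really is. A sketch under obvious assumptions (placeholder start URL, no politeness delays, no robots.txt handling, capped crawl size):

```python
# Rough check of click depth from the home page: breadth-first crawl
# of internal links and report anything more than 3 clicks away.
import re
import urllib.request
from collections import deque
from urllib.parse import urljoin, urlparse

START = "http://www.example.com/"
host = urlparse(START).netloc

depths = {START: 0}
queue = deque([START])
while queue and len(depths) < 200:          # cap the crawl for the sketch
    url = queue.popleft()
    try:
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    except Exception:
        continue
    for href in re.findall(r'href="([^"#]+)"', html):
        link = urljoin(url, href)
        if urlparse(link).netloc == host and link not in depths:
            depths[link] = depths[url] + 1
            queue.append(link)

for link, depth in depths.items():
    if depth > 3:
        print(depth, link)
```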
Yahoo remain behind the 8 ball. No feedback on how often they'll revisit the list (but I like that they do standard RSS), no ability to ping, no account management so you can see if there were issues. Yahoo prove, again, why they are the poor relation to Google.

As for keeping pages within 3 levels, for smaller sites this should be totally achievable. But remember you can "deeplink" into your site in any number of creative ways, and natural linking is frequently a deeplink. This means the search engines are told that those pages which are more than 3 levels deep are also really important. FYI: WordPress lets old posts go much more than 3 levels deep; the navigation is sound, so it works despite that.
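The ping side is trivially scriptable with Google. A minimal sketch, assuming the ping URL Google documented for the Sitemaps program and a placeholder sitemap address (check the current docs before relying on it):

```python
# Ping Google when the sitemap changes. The endpoint below is the one
# documented for Google Sitemaps; the sitemap URL is a placeholder.
import urllib.parse
import urllib.request

sitemap_url = "http://www.example.com/sitemap.xml"
ping = ("http://www.google.com/webmasters/sitemaps/ping?sitemap="
        + urllib.parse.quote(sitemap_url, safe=""))
print(urllib.request.urlopen(ping).getcode())
```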