Google's Ability To Spider 'Difficult' URLs

Discussion in 'Google' started by T0PS3O, Oct 15, 2004.

  1. #1
    Hi all,

    Not sure whether it's down to the latest new bot they have running around, but it seems to me there's no mountain too big to climb when it comes to URL spidering. Look at this one with multiple variables, most of them named with the 'definite no-no' letters "ID".

    http://www.argos.co.uk/webapp/wcs/stores/servlet/ArgosBrowseCounts?storeId=10001&catalogId=2501&langId=-1&categoryId=16419

    It's indexed perfectly fine and ranks really well for a particular, fairly competitive, keyword.

    Any reason left for us database programmers to be careful with variables in URLs?
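
    To make the point about the variable names concrete, here is a minimal Python sketch that pulls the query variables out of the URL above and separates names that are exactly "id" from names that merely contain those letters. The helper names (query_variables, id_style_names) are made up for illustration; only the URL itself comes from the post.

```python
from urllib.parse import urlsplit, parse_qsl

# The URL quoted in the post above.
URL = ("http://www.argos.co.uk/webapp/wcs/stores/servlet/ArgosBrowseCounts"
       "?storeId=10001&catalogId=2501&langId=-1&categoryId=16419")


def query_variables(url):
    """Return the query-string variables of a URL as (name, value) pairs."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def id_style_names(url):
    """Split variable names into exact 'id' and names that only contain 'id'."""
    names = [name for name, _ in query_variables(url)]
    exact = [n for n in names if n.lower() == "id"]
    containing = [n for n in names if "id" in n.lower() and n.lower() != "id"]
    return exact, containing


variables = query_variables(URL)
exact, containing = id_style_names(URL)
print(len(variables), "variables")
print("exactly 'id':", exact)
print("containing 'id':", containing)
```

    For this URL the sketch finds four variables, none of them named exactly "id", which is the distinction the replies below turn on.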
     
    T0PS3O, Oct 15, 2004 IP
  2. mopacfan

    #2
    Just better safe than sorry, I think. I have several variables in my URLs, but I make sure that none are "id". Since yours are not just "ID", you're probably fine.
     
    mopacfan, Oct 15, 2004 IP
  3. hans

    #3
    YES
    there are many reasons to be careful in presenting URLs and pages
    facts - real facts
    look at your own access_log stats and look at the % for each bot
    G - quite sophisticated bot
    Y - bot fairly OK
    MSN - VERY much still a "baby" bot, far from ideal and not mature yet

    and then look at what % of traffic these SEs bring into your site
    and look at all other bots / SEs combined

    most NON-Google bots are still extremely fragile in their crawling "strength"

    sufficient reason to keep on serving nice URLs and nicely presented ( validated ) pages in clean style
     
    hans, Oct 15, 2004 IP
  4. webcertain

    #4
    You're right mopacfan, no ID and no 2-kilometre URLs.. Might not help but won't hurt ;)
     
    webcertain, Oct 15, 2004 IP
  5. T0PS3O

    #5
    I wish Argos was my company :) but it isn't, and that site isn't mine.

    Hans, you are right in saying that Google isn't the only one out there and of course "better safe than sorry" still counts as well.
     
    T0PS3O, Oct 18, 2004 IP
  6. daamsie

    #6
    Hey tops, I've never really found that the letters 'id' are a no-no in a variable. What I have found is that if the variable is called exactly ID (so not forumid or some variant), the page has a very hard time getting indexed. We've been using a lot of variables on our site like 'forumid' or 'threadid' for the last couple of years without a problem.

    I have noticed that we have started to get pages indexed with three url variables though - something new in Google.

    Mind you, at the same time, we've just started flattening our important URLs; mainly to ensure we are prepared for the other SEs.
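
    As a rough illustration of what "flattening" a URL can mean, the Python sketch below moves the query variables into the path. It is only one possible scheme, not necessarily the one used on the site described above; the function name flatten_url and the example.com address are hypothetical, and the web server would still need a rewrite rule to map the flat path back to the real script.

```python
from urllib.parse import urlsplit, parse_qsl, quote


def flatten_url(url):
    """Turn script.php?name=value&... into script/name/value/... (illustrative only)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if not pairs:
        # Nothing to flatten, so leave the URL untouched.
        return url

    # Drop a file extension such as ".php" from the last path segment.
    script = parts.path
    last_segment = script.rsplit("/", 1)[-1]
    if "." in last_segment:
        script = script[: script.rfind(".")]

    # Append each variable as two path segments: its name, then its value.
    segments = []
    for name, value in pairs:
        segments.append(quote(name, safe=""))
        segments.append(quote(value, safe=""))

    flat_path = "/".join([script.rstrip("/")] + segments)
    return f"{parts.scheme}://{parts.netloc}{flat_path}"


# Hypothetical forum URL using the variable names mentioned in the post.
print(flatten_url("http://www.example.com/showthread.php?forumid=12&threadid=345"))
```

    The flattened form carries the same information as the original query string, so nothing is lost for the application; the only change is that crawlers see a path with no "?" or "&" in it.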
     
    daamsie, Oct 21, 2004 IP
  7. 2003m2003

    #7
    found here
    http://www.google.com/webmasters/guidelines.html
     
    2003m2003, Oct 22, 2004 IP
  8. daamsie

    #8
    daamsie, Oct 22, 2004 IP
  9. skanxalot

    #9
    I know of a site that launched about 3 weeks ago that has session IDs in the URL, among other variables. The site is completely indexed, session IDs and all, in Google, Yahoo and MSN. Of course I would never recommend launching a site like that, but I found it very interesting.

    -Tyson
     
    skanxalot, Oct 22, 2004 IP
  10. webcertain

    #10
    Well, there it is: by trying to index twice as many pages, there will be some serious garbage in Google's index now, but as long as the ratio of garbage to valid URLs stays steady... (http://www.google.com/googleblog/)
     
    webcertain, Nov 11, 2004 IP
  11. TLDTrader.com

    #11
    TLDTrader.com, Nov 22, 2004 IP
  12. Mia

    #12
    Mia, Nov 22, 2004 IP
  13. digitalpoint

    #13
    Mmm... I know of some phpBB boards that can't be spidered by Google because of session IDs. Best bet (if you must run phpBB... hehe) would be to apply a hack to dump the session IDs.
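
    For anyone wondering what "dumping the session IDs" amounts to at the URL level, here is a small Python sketch that removes session-style variables from a link. It is not the phpBB hack itself, which has to be applied inside the forum software; the function name strip_session_ids, the list of parameter names and the example.com address are assumptions made for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of session variables; adjust to whatever the software emits.
SESSION_PARAMS = {"sid", "phpsessid", "sessionid"}


def strip_session_ids(url, session_params=SESSION_PARAMS):
    """Return the URL with any session-style query variables removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in session_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Hypothetical forum link with a session ID appended.
print(strip_session_ids("http://www.example.com/viewtopic.php?t=42&sid=0123456789abcdef"))
```

    With the session variable gone, every visit by a crawler sees the same URL for the same topic instead of a fresh one per session.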
     
    digitalpoint, Nov 22, 2004 IP
  14. Mia

    #14
    Sounds good.. I think I will go that route.. I use phpBB 'cause it's free :)
     
    Mia, Nov 22, 2004 IP
  15. misohoni

    #15
    Yep, I hacked up my phpBB site good (is this even a proper sentence?). Anyway, I found that when I removed session IDs, users had difficulty logging in.

    I was a die-hard phpBB addict, but now I have to face reality and recommend vBulletin or Invision instead...
     
    misohoni, Nov 22, 2004 IP
  16. TLDTrader.com

    #16
    TLDTrader.com, Nov 22, 2004 IP
  17. misohoni

    #17
    That's the mod I used. It affects user logins and cookies. Not recommended, sorry!
     
    misohoni, Nov 23, 2004 IP
  18. Tazzam

    #18
    Newbie here everyone, how is everyone ('s rank?)

    If this is so then I hate Google even more! I just rebuilt one of our entire sites (5000 pages) with an HTML catalog to combat this problem.

    Once again Googlefied.
     
    Tazzam, Nov 23, 2004 IP
  19. darksat

    #19
    The phpBB hack to remove session IDs is piss easy to use, it's great.

    As for variables in the URL: for every variable, the PR of the targeted phrase generally drops by 1 automatically.

    (Yes, I know there are exceptions if you deep link externally to the page, but generally, if you get all the PR off your front page or site pages, it's true.)
     
    darksat, Nov 24, 2004 IP