
Google changed its algorithm by adding one more factor -- file name, folder name

Discussion in 'Google' started by hans, Mar 29, 2004.

  1. #1
    Google changed its algorithm by adding one more factor -- file name / folder name / path (URL)

    ( today - maybe days earlier, and i may have missed it ) google started to integrate strings found IN the URL ( this was seen @ Yahoo weeks earlier ),
    making file names and folder names important as a relevancy factor

    proof:

    http://www.google.com/search?num=10...e=off&client=googlet&q=kriya+yoga&btnG=Search

    shows domain name and folder/file names highlighted - bold

    a careful study of all results down to 100 or so also shows the importance of having LINKS on link pages or directories relevant to the topic/keyword, as
    in most cases ( few exceptions seen ) the full keyword-relevant part of the path/folder/file name of the link page is highlighted in bold,
    which can most likely be taken as proof of its use in determining the result rank on the page displayed.

    while Yahoo uses search terms found in directory-names ---> highlighting of directory- and subdirectory-names but at present NO strings in actual URL

    proof:

    http://search.yahoo.com/search?p=kriya+yoga&ei=UTF-8&fr=fp-tab-web-t&n=100&fl=0&x=wrt

    which shows the importance of being in keyword-relevant directory @ Yahoo

    ( i thought i had seen this string behavior at Yahoo some weeks ago - the same as the one NEW @ Google now )


    Essence:

    it will pay off to have relevant folder-/file names instead of neutral file names / folder names
    it also may make it useful to clearly restrict ONE page to ONE main-topic/keyword group that is contained in path- and file-name
    for best SEO.

    advantage of new behavior:

    quality sites and topic relevant directories are preferred over link farms and simple link exchange pages or irrelevant sites just happening to have the keyword in their text.
    for all smaller companies who really care - high ranking may be easier as it becomes mainly a function of quality in content and web design and details in site structure and presentation.
    while the latter factors are under easy, direct and instant control of every small business - large hierarchy-oriented/managed businesses might be too inflexible to fully adapt to the needs of search engines.
     
    hans, Mar 29, 2004 IP
  2. digitalpoint

    digitalpoint Overlord of no one Staff

    #2
    As far as I know, keywords within the URL have been a factor for some time now. It's just now that Google has decided to bold keywords within the URL.

    - Shawn
     
    digitalpoint, Mar 29, 2004 IP
  3. hulkster

    hulkster Peon

    #3
    I wonder if the folder/file name has always (or at least for a while) been used in the calculation of SERPs ... but this new "boldness" just wasn't displayed beforehand. Seems like an obvious thing to use as a factor - I've been doing this for a while, so if it really IS a change in terms of determining SERPs, I'm all for it! ;-)

    alek
     
    hulkster, Mar 29, 2004 IP
  4. hans

    hans Well-Known Member

    #4
    may be .. :)
    because the results/ranks appear to be the same as weeks ago ..

    but even when now highlighted - there still is a difference in HOW the highlighted string is found. NOT every string has the same value for ranking. see the two samples

    http://www.google.com/search?num=10...e=off&client=googlet&q=kriya+yoga&btnG=Search
    vs
    http://www.google.com/search?num=10...fe=off&client=googlet&q=kriyayoga&btnG=Search

    hence domain name selection or folder/file name may be one significant factor - i.e. having a domain name separated by - or _ when it consists of a combination of words.
    the same applies to folder_names and file_names
     
    hans, Mar 29, 2004 IP
  5. compar

    compar Peon

    #5
    That's a very interesting finding. I always thought that you needed to put "-" between the words for Google to be able to recognize the individual words within the URL. But your kriya yoga search shows them recognizing each word as part of a string search.

    The other interesting thing to me is that this keyword recognition is case sensitive at this level. Those URLs using either Kriya or Yoga didn't have the capitalized words highlighted. However, when I did a search using the capitalized form "Kriya Yoga" it made no difference in the results. Google still only highlighted the non-capitalized words.

    So the message would seem to be -- do not use capital letters in URL, file or folder names.
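    A minimal Python sketch of that advice, for anyone who wants to clean up names in bulk. The function name `normalize_segment` and the default hyphen separator are assumptions made for illustration; the thread has not settled whether "-" or "_" is the better separator, so it is left as a parameter.

```python
import re


def normalize_segment(name: str, sep: str = "-") -> str:
    """Lowercase a folder or file name and join its words with `sep`."""
    lowered = name.strip().lower()
    # Treat runs of spaces, underscores and hyphens as a single word break.
    joined = re.sub(r"[\s_\-]+", sep, lowered)
    # Drop anything that is not a lowercase letter, digit, dot, "_" or "-".
    cleaned = re.sub(r"[^a-z0-9._\-]", "", joined)
    return cleaned.strip(sep)


print(normalize_segment("Kriya_Yoga Techniques.html"))
print(normalize_segment("Kriya Yoga", sep="_"))
```

    The first call gives a lowercase, hyphenated file name; the second shows the same helper producing the underscore style instead.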
     
    compar, Mar 29, 2004 IP
  6. Mr T

    Mr T Guest

    #6
    I agree with Shawn, I thought Google had been doing this for ages. If a page has Kriya Yoga in the path, there's a good chance it will be about Kriya Yoga. But yeah, not sure how long they have been extracting keywords from longer words..
     
    Mr T, Mar 29, 2004 IP
  7. hans

    hans Well-Known Member

    #7
    yes there are many interesting things to look at
    and more importantly to apply when creating new folder structures and file names
    if necessary in a field with HIGH competition such as recreation of everyday items in life
    it may even be worth fully restructuring/renaming the whole site in order to convert it to a
    future-proof style of data presentation.

    with smart learning PCs and spell check built in ( partially ) in Google ( and others as well one day )
    it becomes more and more important to present data in a human-thinking PC format for highest efficiency.

    another point i have been considering for many months is making pages google-wap-proxy conform and smart-phone conform ( since i myself had a nokia 3650, surfing regular web content directly AND via google ).

    i DO have wap visitors - from Google

    very useful in regard to strings in single words vs separate words: it is helpful to see the search-query behavior of surfers with the "Keyword Suggestion Tool"
    and see how many surfers use THEIR personal syntax

    many use a personal syntax different from webster .. hence it may even be sometimes a realistic target group to consciously target ONE kind of spelling if there are sufficient surfers - OR to have 2 identical pages - optimized for each individual spelling provided the alternate spelling has a large user group.

    just imagine the present-day word crucifixion - and its commonly used public variants crucification and crucifiction ...
    or snow mobile vs snowmobile - both have users .. latter the majority

    and many other word and key sentences of daily use and life ..

    NORMAL spelling for my domain would be separate words - in 1997 there was NO SEO site around and no one told me i could separate the words somehow - i was the first in my field, with zero competition.

    NOW people should select their domain name very carefully and know what they really want in life so they can build up everything on that basis - the domain name.
    the same applies to the selection of directories at DMOZ / yahoo and others .. or when creating your own link pages -
    when doing so in a folder relevant to the topic of your own site and including other sites relevant to your own topic AND only quality sites with PR 4 or higher or new sites with HIGH standards

    then you are becoming "member" of a group of relevant topic sites enhancing each others listings

    i think in Google that plays already some role NOW
     
    hans, Mar 29, 2004 IP
  8. yonnermark

    yonnermark Peon

    #8
    Re: extracting keywords from longer words:
    the google toolbar has always been able to extract words from within longer words. So maybe the algo is still blind to words-within-other-words, but when it comes to displaying the SERPs, the page behaves like the toolbar "find on page" tool
     
    yonnermark, Mar 30, 2004 IP
  9. dkalweit

    dkalweit Well-Known Member

    #9
    Back in late 2000, I re-designed http://www.nesfiles.com/ with this in mind. Every game page has a URL such as: http://www.nesfiles.com/NES/Super_Mario_Bros__Duck_Hunt/Super_Mario_Bros__Duck_Hunt.asp. This got my main keyword ("NES") in there twice (URL and path), and then got the name of the game in twice. All the game pages are really just stubs that do an SSI of 'game.asp' with the appropriate iID in the database... I used VB Scripts to automate the creation of the directory structure and stub files from the database -- I was surprised at how little time it took, and the HUGE payoff ranking-wise I got...
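    For readers who want to try the same idea, here is a rough Python sketch of how such paths could be generated from a list of titles. It is not the VB Script described above: the function name `stub_path`, the rule "replace every non-alphanumeric character with an underscore", and the sample title are all assumptions, chosen only because they reproduce the URL pattern quoted in the post.

```python
import re


def stub_path(section: str, title: str, ext: str = ".asp") -> str:
    """Build /<section>/<name>/<name><ext> from a human-readable title."""
    # Assumed rule: every character that is not a letter or digit
    # becomes one underscore, so "./" turns into a double underscore.
    safe = re.sub(r"[^A-Za-z0-9]", "_", title)
    return f"/{section}/{safe}/{safe}{ext}"


# Hypothetical title; the real database value is not given in the post.
print(stub_path("NES", "Super Mario Bros./Duck Hunt"))
```

    Writing the actual stub files would be one more loop over the database rows; that part is left out because the post does not show what a stub contains beyond the include of 'game.asp'.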

    I recently tried with http://www.sensiblesoftware.com/, and made a directory "network_monitoring_software", but that didn't seem to help my rankings much... :(


    --
    Derek
     
    dkalweit, Mar 30, 2004 IP
  10. compar

    compar Peon

    #10
    This is a very interesting thread, because I was always of the opinion that the only way Google would see individual words in a domain name, or the full URL, was if you separated the words with a "-".

    It was always my understanding that "_" or continuous runtogetherwords would not be seen for their individual parts.

    Dkalweit's experience seems to refute this theory, but I still wonder if Google is only doing a string search for highlighting purposes and not really parsing these phrases for relevance ranking.

    One of the reasons I ask is that they are not highlighting the capitalized versions of the words in the URLs. This makes me suspicious, because Google doesn't differentiate between capitalized and non capitalized words as search terms. So if they are parsing and using all the words that they highlight as part of the ranking within the SERP, why wouldn't they highlight all forms of the word, or words, used in the search?
     
    compar, Mar 30, 2004 IP
  11. dkalweit

    dkalweit Well-Known Member

    #11
    I kinda wonder if Google uses the same algorithm for highlighting as it does for searching -- I really tend to doubt it. I did a quick test on searching for a game or two, and google doesn't seem to be treating "_" as a word separator -- which honestly, I can't understand -- an underscore has been used as a word separator in MANY circles for a LONG time... Who knows -- could have just been my added content that drove extra traffic. Maybe I should use a quick VBScript to change the "_" separators to "-". :)


    --
    Derek
     
    dkalweit, Mar 30, 2004 IP
  12. digitalpoint

    digitalpoint Overlord of no one Staff

    #12
    I did some testing about six months ago that seemed to suggest Google was able to break apart words at that time. I created a page within digitalpoint.com with one inbound link (with the anchor text of "test"). Once that page was spidered, I was able to get it to come up in a search for "digital" and/or "point" (those words were not on the page itself).
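    How Google breaks words apart is not known from this test. Purely as an illustration of the general idea, a run-together name can be split against a word list with a small dynamic-programming routine like the Python sketch below. The function name `segment` and the tiny vocabulary are assumptions for the example, not anything Google has documented.

```python
from typing import List, Optional, Set


def segment(text: str, vocabulary: Set[str]) -> Optional[List[str]]:
    """Split `text` into vocabulary words, or return None if impossible."""
    # best[i] holds one valid split of text[:i], or None if there is none.
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            prefix = best[start]
            word = text[start:end]
            if prefix is not None and word in vocabulary:
                best[end] = prefix + [word]
                break
    return best[len(text)]


# Hypothetical word list, just large enough for the examples in this thread.
VOCAB = {"digital", "point", "kriya", "yoga"}

print(segment("digitalpoint", VOCAB))
print(segment("kriyayoga", VOCAB))
print(segment("nesfiles", VOCAB))
```

    With this word list the first two names split cleanly into two words each, while the third returns None because neither part is in the vocabulary.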

    - Shawn
     
    digitalpoint, Mar 30, 2004 IP
  13. compar

    compar Peon

    #13
    Wow! Thanks to Hans I think we are doing some ground breaking work in this discussion.

    I have never seen this discussed or acknowledged on any other forum or discussion group. The only thing that has ever been acknowledged is that Google would see the individual words in hyphenated domains.

    However, many SEO purists, Jill Whalen prime among them, claimed that the practice was spammy and that if they were Google they would give any page using them extra scrutiny, because only scummy webmasters would use them.

    These same people claimed that one of the factors in the Florida update was the demotion of sites with hyphenated domain names. I did a lot of testing at the time and I didn't ever think that was true.

    If what your test indicates is correct, Shawn, then there is no need to register hyphenated domains for any reason other than to get a word combination that has already been registered in a runtogetherform.
     
    compar, Mar 30, 2004 IP
  14. expat

    expat Stranger from a far land

    #14
    Stemming may well help in the process of dissecting long words.
    I think keywords in domains, structure and file names were weighing a bit more at the time when "click here" was a valid anchor text, and it may still be used for those.

    It does help on the AdSense ads, where the display URL need not be the click URL; having the search terms highlighted there is a nice feature that helps pull punters in and effectively gives more characters to play with.

    So I think it's just a nice feature with the marginal effect that it reads closer to the actual search a user has performed.
    M
     
    expat, Mar 30, 2004 IP
  15. compar

    compar Peon

    #15
    I cannot see how stemming is involved. In fact I would say exactly the opposite from the result Hans has shown.

    Stemming involves reading one form of a word and deciding that the site is also relevant for other forms of the word. The simplest example is the use of a singular word in the title or page content and Google ranking the site also for a search on the plural version of the word.

    In fact I have a site where I have been optimizing like hell for a particular singular term and to my chagrin Google ranks the site higher for the plural form of the term. That is thanks to stemming.

    But if stemming was involved in Hans's results we would have seen variations of the search phrase highlighted. And in fact we did not.
     
    compar, Mar 30, 2004 IP
  16. digitalpoint

    digitalpoint Overlord of no one Staff

    #16
    Truthfully, I thought that was common knowledge... maybe I just forgot to tell people. :)

    - Shawn
     
    digitalpoint, Mar 30, 2004 IP
  17. Such Great Heights

    Such Great Heights Peon

    #17
    Such Great Heights, Mar 30, 2004 IP
  18. Foxy

    Foxy Chief Natural Foodie

    #18
    Likewise - I have always used "-" but this morning I just happened to look at McDar's tool for an allinurl phrase and noticed just what is being talked about here!

    If it is true, then I'm a happy bunny, as all my directories/folders and indeed the URL on each level have been carrying the "theme" for over a year now, albeit mostly in hyphenated form, but I have some that, like Shawn's test, are runtogethers just like this one

    www.ski-jungle.com/ links/rentalskiingaccommodationfrance.html

    the search phrase was "ski apartments france" and the highlights are

    1. the first ski
    2. the ski in skiing
    3. france at the end

    :)
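    The three highlights listed above are what a plain substring match would produce. A short Python sketch of that behaviour follows; it is only a guess at the display logic, not Google's actual code. The function name `highlight` and the `<b>` markers are assumptions, the space after ".com/" in the URL is dropped, and the match is kept case sensitive to mirror compar's earlier observation that capitalized words in URLs were not bolded.

```python
import re


def highlight(url: str, query: str) -> str:
    """Wrap every literal occurrence of a query term in <b>...</b>."""
    terms = [re.escape(term) for term in query.lower().split()]
    if not terms:
        return url
    # Try longer terms first so overlapping terms prefer the longest match.
    terms.sort(key=len, reverse=True)
    pattern = re.compile("|".join(terms))
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", url)


url = "www.ski-jungle.com/links/rentalskiingaccommodationfrance.html"
print(highlight(url, "ski apartments france"))
```

    Running it marks the first "ski", the "ski" inside "skiing" and "france" at the end, and nothing for "apartments", which does not occur in the URL.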
     
    Foxy, Mar 31, 2004 IP
  19. rhastman

    rhastman Peon

    #19
    Generally speaking, are underscores or dashes preferable for folder and file names?
     
    rhastman, Apr 12, 2004 IP
  20. mcdar

    mcdar Peon

    #20
    I just ran an allinurl: for a main competitor of mine. Their url is w*w.sleepingbagsandtents.com

    The results for allinurl:sleeping bags and tents yielded ONLY 16 results!
    - my competitor was NOT among them.

    This is not to say that the main results use the same search technique but in the allinurl: search, Google was not able to pick out the individual words.
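    One simple model that would explain this result is whole-token matching: the URL is cut at punctuation and each query word must equal one of the pieces. The Python sketch below shows that model only; whether allinurl: really works this way is an assumption, and the hyphenated host name is a made-up comparison case.

```python
import re
from typing import Set


def url_tokens(url: str) -> Set[str]:
    """Split a URL into lowercase tokens at every non-alphanumeric character."""
    return {token for token in re.split(r"[^a-z0-9]+", url.lower()) if token}


def matches_all_in_url(url: str, query: str) -> bool:
    """True only if every query word is a whole token of the URL."""
    tokens = url_tokens(url)
    return all(term in tokens for term in query.lower().split())


query = "sleeping bags and tents"
print(matches_all_in_url("sleepingbagsandtents.com", query))
# Hypothetical hyphenated variant, used only for comparison.
print(matches_all_in_url("sleeping-bags-and-tents.example", query))
```

    Under this model the run-together name is a single token and fails to match, while the hyphenated variant matches all four words.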
     
    mcdar, Apr 12, 2004 IP