Automatically sort all words in MySQL and rank them

Discussion in 'MySQL' started by electroze, Nov 17, 2007.

  1. #1
    I have an online bible in mysql and I'm trying to automatically scan all of the words in the entire database, count and sort them by most occurring (like the top 1000 or all the words sorted), and keep this info in mysql or export to Excel csv. I'm sure it will be resource-intensive, but the book won't change, so it can be a one-time process.

    MySQL/PHP output Example:

    Top 1000 keywords
    word / count
    ------------------
    the - 28,023
    and - 24,000
    God - 4,118
    Moses - 784
    Ark - 40
    etc - appears x times

    Has anyone ever done this?
     
    electroze, Nov 17, 2007 IP
  2. RaginBajin

    RaginBajin Peon

    Messages:
    87
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #2
    use Order by <column names>
     
    RaginBajin, Nov 17, 2007 IP
  3. electroze

    electroze Active Member

    Messages:
    179
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    60
    #3
    The MySQL database is setup like this with each verse on a new row.

    Book bookno chapter verse text
    -----------------------------------------
    Genesis 1 1 1 In the beginning God created the heaven and the earth.

    Genesis 1 1 2 And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of God moved upon the face of the waters.

    Genesis 1 1 3 And God said, Let there be light: and there was light.


    I really hope it's that simple to just use 'order by' in a MySQL query, but I don't know how that would produce such a ranking. I basically want a keyword cloud, except for no cloud, just a huge list ranking the occurance of each of the words.
     
    electroze, Nov 17, 2007 IP
  4. electroze

    electroze Active Member

    Messages:
    179
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    60
    #4
    Any other ideas?
     
    electroze, Nov 18, 2007 IP
  5. Kuldeep1952

    Kuldeep1952 Active Member

    Messages:
    290
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    60
    #5
    No single query will give you the required answer.

    You have to read all the rows and build the table of word count.

    Your steps could be:

    1) Read a row of text. Remove punctuations etc, and extract the words.
    You could perhaps use split function to do this.

    2) Increment the counter corresponding to the words.
     
    Kuldeep1952, Nov 19, 2007 IP
  6. Petey

    Petey Peon

    Messages:
    68
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Hi electroze

    This article might help you:

    http://www.onlamp.com/pub/a/php/2002/10/24/simplesearchengine.html?page=1

    Petey
     
    Petey, Nov 20, 2007 IP
  7. Jamie18

    Jamie18 Peon

    Messages:
    201
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #7
    you want this for every single different word in the entire text?

    i'm not sure how you would make the list of every word in the text.. something like.

    i don't know php.. so this will be pseudo
    
    variable word_list
    
    loop over all books
      loop over all bookno's
        loop over all chapters
          loop over all verses
            query name=text_q
              SELECT text
              FROM bible
              WHERE  book = book_list[b_index]
                         and bookno = bookno_list[bn_index]
                         and chapter = chapter_list[c_index]
                         and verse = verse[v_index]
            end query
            word_list = word_list UNION toList(text_q.text)
          end loop
        end loop
      end loop
    end loop
    Code (markup):
    now i'm not entirely sure if you have predefined functions to do a union, or to make delimited list of the output from the query.. so hopefully i'm not just wasting time here..

    after you have this list of every word.. (which might take years to run)

    we will run yet another incredibly long query

    
    loop over all words in word_list
      query name word_count
        SELECT count(*) as thecount
        FROM bible
        WHERE text like '(word boundary)word_list[w_index](word boundary)'
      end query
      query
        INSERT
        INTO word_count_table (word, count)
        VALUES (word_list[w_index], word_count.thecount)
      end query
    end loop
    
    Code (markup):
    well that will probably take a lot of work to transfer into php.. but i hope it's helpful anyways.. good luck
     
    Jamie18, Nov 20, 2007 IP