Advice on a complex MySQL script?

Discussion in 'MySQL' started by electroze, Sep 3, 2013.

  1. #1
    Hello-

    I'm trying to figure out a way to count and order by rank all words of an entire book I have stored in a MySQL table.

    The entire book is stored in 1 field and is 36,000 rows. Each row has a paragraph of text (a couple sentences). I'm trying to make a query that takes ALL words in the entire book and ranks them by word and count. So far, I only know how to compare and rank exact paragraphs, but wonder if there's a way to explode the words out of each sentence (while disregarding punctuation) and count them all? Maybe it needs regex and a php loop?

    SELECT COUNT( * ) AS
    ROWS , `words`
    FROM `book`
    GROUP BY `words`
    ORDER BY `Rows` DESC
    Code (markup):

    For example: 'Johnny was a good, good boy; that good-ole boy was raised well.'

    boy | 2
    good | 2
    was | 2
    good-ole | 1
    Johnny | 1
    etc. (but instead of one row, it totals the count for the entire book).

    Does anyone know if this is possible?
     
    electroze, Sep 3, 2013 IP
  2. sarahk

    sarahk iTamer Staff

    Messages:
    28,832
    Likes Received:
    4,541
    Best Answers:
    123
    Trophy Points:
    665
    #2
    Don't be afraid of huge database tables...

    I'd store the book in the text field that you currently have but I'd also "tag" the book - at the time it is first saved (ie inserted) I'd explode it to get all the individual words and then check to see if the word already exists in the word table and add it if it doesn't. Then I'd create a link record between the two.

    For example

    Book Table
    • id
    • title
    • text
    Word Table
    • id
    • word
    Link Table
    • id
    • book_id
    • word_id
    • count
    Then when you want to know how many times any word has been used you just query like this

    select sum(link.count)
    from word left join link on word.id = link.word_id
    where word.word = 'subject';

    and if you want to know which books have a particular word you query like this

    select book.id, book.title
    from book
    left join link on book.id = link.book_id
    left join word on link.word_id = word.id
    where word.word = 'subject';

    does that make sense?
     
    sarahk, Sep 3, 2013 IP