I have an online bible in mysql and I'm trying to automatically scan all of the words in the entire database, count and sort them by most occurring (like the top 1000 or all the words sorted), and keep this info in mysql or export to Excel csv. I'm sure it will be resource-intensive, but the book won't change, so it can be a one-time process. MySQL/PHP output Example: Top 1000 keywords word / count ------------------ the - 28,023 and - 24,000 God - 4,118 Moses - 784 Ark - 40 etc - appears x times Has anyone ever done this?
The MySQL database is setup like this with each verse on a new row. Book bookno chapter verse text ----------------------------------------- Genesis 1 1 1 In the beginning God created the heaven and the earth. Genesis 1 1 2 And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of God moved upon the face of the waters. Genesis 1 1 3 And God said, Let there be light: and there was light. I really hope it's that simple to just use 'order by' in a MySQL query, but I don't know how that would produce such a ranking. I basically want a keyword cloud, except for no cloud, just a huge list ranking the occurance of each of the words.
No single query will give you the required answer. You have to read all the rows and build the table of word count. Your steps could be: 1) Read a row of text. Remove punctuations etc, and extract the words. You could perhaps use split function to do this. 2) Increment the counter corresponding to the words.
Hi electroze This article might help you: http://www.onlamp.com/pub/a/php/2002/10/24/simplesearchengine.html?page=1 Petey
you want this for every single different word in the entire text? i'm not sure how you would make the list of every word in the text.. something like. i don't know php.. so this will be pseudo variable word_list loop over all books loop over all bookno's loop over all chapters loop over all verses query name=text_q SELECT text FROM bible WHERE book = book_list[b_index] and bookno = bookno_list[bn_index] and chapter = chapter_list[c_index] and verse = verse[v_index] end query word_list = word_list UNION toList(text_q.text) end loop end loop end loop end loop Code (markup): now i'm not entirely sure if you have predefined functions to do a union, or to make delimited list of the output from the query.. so hopefully i'm not just wasting time here.. after you have this list of every word.. (which might take years to run) we will run yet another incredibly long query loop over all words in word_list query name word_count SELECT count(*) as thecount FROM bible WHERE text like '(word boundary)word_list[w_index](word boundary)' end query query INSERT INTO word_count_table (word, count) VALUES (word_list[w_index], word_count.thecount) end query end loop Code (markup): well that will probably take a lot of work to transfer into php.. but i hope it's helpful anyways.. good luck