Hello, I am trying to create some trending out of my mysql data. I have a table, lets call it "table" i also have a column, lets call it column. The column has many rows (thousands) lets call them rows. I want to find out which words are used the most. eg. "Morning", "Computers", "Tim". The only thing is, this data is already in my database, I don't want to keep track of the count like a search query would by inserting or updating a number,.. at least i don't think it would work out since the data already exists ina column. I want to search a specific column and find out the most popular words so i can print them out.. any ideas?...thanks!
I think the solution to this will be achieved programatically not via the database. Here's the general way I would solve this: ~set up 2 arrays: $master_words, $master_counts ~execute a simple query to extract all the data from the column you want to analyze ~loop through each row of query results and explode the column into an array of words $tmp_words ~loop through each $tmp_word element to see if that particular word exists ~~if it does -increment its corresponding $master_counts element ~~if not add it to $master_words and set its corresponding $master_counts element to 1 ~when all query results have been cataloged loop through the $master_counts array to find out which element has the most ~use that index to retrieve the corresponding word from $master_words
If I understand correctly you have a table that looks like: Tags -------- apple orange peach orange orange And the expected output would look like: Count | Tag ---------+------- 3 | orange 1 | apple 1 | peach The SQL would be (untested) SELECT COUNT(tag) as count, tag FROM tags GROUP BY tag ORDER BY count; If I've mis-understood the problem, perhaps you could give me an example.
Thanks guys I will try some things out from these answers. - Ohteddy: curently the colomn I need to find similar words are sentences. e,g. 1. This is the first sentence i want to compare 2. how about we try another sentence? out of this - the word sentence would be marked as having a higher weight or count. hmmm
I see, this is quite a bit different from what I thought. The easiest solution I can think of is to build a histogram. I'm not sure how I would do it in SQL with out defining a custom function or using a temporary table. You will probably want to write it in a language you're comfortable with. Also, depending on the size of your data set, you will may want to cache the results to disk. If you're not familiar with histograms checkout: http://en.wikipedia.org/wiki/Histogram If you have any problems implementing this let me know and I'll see if I can help.