Count Popular Words [Tag Cloud-ish]

Discussion in 'Databases' started by timallard, Sep 28, 2009.

  1. #1
    Hello,

    I am trying to create some trending out of my mysql data.
    I have a table, lets call it "table" i also have a column, lets call it column.
    The column has many rows (thousands) lets call them rows.

    I want to find out which words are used the most. eg. "Morning", "Computers", "Tim".

    The only thing is, this data is already in my database, I don't want to keep track of the count like a search query would by inserting or updating a number,.. at least i don't think it would work out since the data already exists ina column.

    I want to search a specific column and find out the most popular words so i can print them out..



    any ideas?...thanks!
     
    timallard, Sep 28, 2009 IP
  2. timallard

    timallard Well-Known Member

    Messages:
    1,634
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    158
    #2
    any idea on this? thanks!
     
    timallard, Sep 29, 2009 IP
  3. plog

    plog Peon

    Messages:
    298
    Likes Received:
    11
    Best Answers:
    1
    Trophy Points:
    0
    #3
    I think the solution to this will be achieved programatically not via the database. Here's the general way I would solve this:

    ~set up 2 arrays: $master_words, $master_counts
    ~execute a simple query to extract all the data from the column you want to analyze
    ~loop through each row of query results and explode the column into an array of words $tmp_words
    ~loop through each $tmp_word element to see if that particular word exists
    ~~if it does -increment its corresponding $master_counts element
    ~~if not add it to $master_words and set its corresponding $master_counts element to 1
    ~when all query results have been cataloged loop through the $master_counts array to find out which element has the most
    ~use that index to retrieve the corresponding word from $master_words
     
    plog, Sep 29, 2009 IP
  4. ohteddy

    ohteddy Member

    Messages:
    128
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    28
    #4
    If I understand correctly you have a table that looks like:

    Tags
    --------
    apple
    orange
    peach
    orange
    orange

    And the expected output would look like:

    Count | Tag
    ---------+-------
    3 | orange
    1 | apple
    1 | peach

    The SQL would be (untested)

    SELECT COUNT(tag) as count, tag FROM tags GROUP BY tag ORDER BY count;

    If I've mis-understood the problem, perhaps you could give me an example.
     
    ohteddy, Sep 29, 2009 IP
  5. timallard

    timallard Well-Known Member

    Messages:
    1,634
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    158
    #5
    Thanks guys I will try some things out from these answers.

    - Ohteddy: curently the colomn I need to find similar words are sentences. e,g.

    1. This is the first sentence i want to compare
    2. how about we try another sentence?

    out of this - the word sentence would be marked as having a higher weight or count.

    hmmm
     
    timallard, Oct 1, 2009 IP
  6. ohteddy

    ohteddy Member

    Messages:
    128
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    28
    #6
    I see, this is quite a bit different from what I thought.

    The easiest solution I can think of is to build a histogram.
    I'm not sure how I would do it in SQL with out defining a
    custom function or using a temporary table. You will
    probably want to write it in a language you're comfortable
    with. Also, depending on the size of your data set, you
    will may want to cache the results to disk.

    If you're not familiar with histograms checkout: http://en.wikipedia.org/wiki/Histogram
    If you have any problems implementing this let me know and I'll see
    if I can help.
     
    Last edited: Oct 1, 2009
    ohteddy, Oct 1, 2009 IP