Data mining on a MySQL database

Discussion in 'Programming' started by Yazari, Mar 31, 2010.

  1. #1
    Hello,

    I am just getting started with text mining.
    I have two database tables containing thousands of rows.

    a table for "skills" and a table for "skill categories"

    - every "skill" belongs to a skill category.
    - a "skill" is, physically, a varchar(200) field in the database containing some text that describes the skill.

    Here are some skills extracted from the skills table:

    "PHP (good level), Java (intermediaite), C++"
    "PHP5"
    "project management and quality management"
    "begining Javascript"
    "water engineering"
    "dfsdf zerze rzer"
    "cibling customers"



    What I want to do is extract knowledge from those fields: keep only the actual skills and ignore the useless text around them.
    For the example above, I want to get just an array with:

    "PHP"
    "Java"
    "C++"
    "PHP5"
    "project management"
    "quality management"
    "Javascript"
    "water engineering"
    "cibling customers"

    How should I go about extracting the skills from such a large amount of data?
    Do you know of specific algorithms for this, e.g. k-means?

    Thanks in advance.
     
    Yazari, Mar 31, 2010 IP
  2. NeoCambell

    #2
    You can use the SQL LIKE operator directly for this.

    For example, to get the list of people who entered C++ somewhere in the text:

    SELECT person_ID FROM table1 WHERE skill_field LIKE '%C++%'
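
A minimal sketch of running this query from Python. It assumes an in-memory SQLite database as a stand-in for MySQL; the table and column names (`table1`, `person_ID`, `skill_field`) come from the query above, while the sample rows and the `find_people_with_skill` helper are hypothetical and only there for illustration.

```python
import sqlite3


def find_people_with_skill(conn, keyword):
    """Return the person_IDs whose skill_field contains `keyword` as a substring."""
    # Pass the pattern as a bound parameter instead of building the SQL by hand.
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT person_ID FROM table1 WHERE skill_field LIKE ? ORDER BY person_ID",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data, loosely based on the skills quoted in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (person_ID INTEGER, skill_field VARCHAR(200))")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "PHP (good level), Java (intermediaite), C++"),
        (2, "PHP5"),
        (3, "I have started C++ in 2005"),
    ],
)

print(find_people_with_skill(conn, "C++"))  # [1, 3]
```

Note that LIKE only tells you which rows mention a keyword; it does not pull the keyword out of the surrounding text.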
     
    NeoCambell, Apr 2, 2010 IP
  3. Yazari

    #3
    Hello, thanks for your reply, but that is not my objective.
    I want to build a new database from the old one.
    In the new database I want only well-categorized skills; the old database contains skills plus useless text around them.

    Example (this is a skills field in the old database):
    "I have started C++ in 2005 and i'm know mastering it and STL librairies"

    The desired output would be: C++, STL librairies.
     
    Yazari, Apr 5, 2010 IP
  4. NeoCambell

    #4
    Okay. I think I understand what you need now.
    I'll add pseudo code.

    
    Define an array of known strings = {C++, STL, Java, C#, etc...}
    
    Execute "SELECT * FROM old_table"
    While (there are more records) {
    
         1. Read the string field into a variable str and reset result to empty.
             for each element in the array of known strings
                   a. Search for the element within str (you can use the "strstr" function if you use C)
                         - if found, append the element to result
             next
         
          2. Insert a new record into the new table with the user ID and the result string
    }
    You can implement this as a function in a programming language, or in SQL if your DBMS supports user-defined functions.
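
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under several assumptions: SQLite in memory stands in for MySQL, the table and column names (`old_table`, `new_table`, `user_id`, `skill_text`, `skills`) are hypothetical, and the skill list is a tiny made-up dictionary. It also swaps the plain substring search for a regular expression with lookarounds, so that "Java" is not reported inside "Javascript".

```python
import re
import sqlite3

# Hypothetical dictionary of known skills; a real one would be much larger.
KNOWN_SKILLS = ["C++", "STL", "Java", "C#", "PHP", "Javascript"]


def build_pattern(skill):
    """Compile a regex that matches `skill` only when it is not part of a longer token."""
    # Lookarounds are used instead of \b because \b does not behave well
    # next to symbols such as the "++" in "C++" or the "#" in "C#".
    return re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(skill) + r"(?![A-Za-z0-9+#])",
        re.IGNORECASE,
    )


PATTERNS = [(skill, build_pattern(skill)) for skill in KNOWN_SKILLS]


def extract_skills(text):
    """Return the known skills found in `text`, in dictionary order."""
    return [skill for skill, pattern in PATTERNS if pattern.search(text)]


def migrate(conn):
    """Copy the recognized skills of every old record into new_table."""
    rows = conn.execute("SELECT user_id, skill_text FROM old_table").fetchall()
    for user_id, text in rows:
        found = extract_skills(text)
        if found:  # design choice of this sketch: skip rows with no known skill
            conn.execute(
                "INSERT INTO new_table VALUES (?, ?)", (user_id, ", ".join(found))
            )
    conn.commit()


# Hypothetical sample rows taken from the examples quoted in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (user_id INTEGER, skill_text VARCHAR(200))")
conn.execute("CREATE TABLE new_table (user_id INTEGER, skills TEXT)")
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [
        (1, "I have started C++ in 2005 and i'm know mastering it and STL librairies"),
        (2, "PHP (good level), Java (intermediaite), C++"),
        (3, "dfsdf zerze rzer"),
    ],
)
migrate(conn)
print(conn.execute("SELECT * FROM new_table ORDER BY user_id").fetchall())
# [(1, 'C++, STL'), (2, 'C++, Java, PHP')]
```

With this approach only skills already present in the dictionary are found. For instance "PHP5" is not reported unless it is added as its own entry, and free-form skills such as "water engineering" would need to be listed as well.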
     
    NeoCambell, Apr 5, 2010 IP