Hello, I am just getting started with text mining. I have two database tables with thousands of rows: a table for "skills" and a table for "skill categories".

- Every "skill" belongs to a skill category.
- A "skill" is, physically, a varchar(200) field in the database, containing some free text that describes the skill.

Here are some skills extracted from the skills table:

"PHP (good level), Java (intermediaite), C++"
"PHP5"
"project management and quality management"
"begining Javascript"
"water engineering"
"dfsdf zerze rzer"
"cibling customers"

What I want to do is extract knowledge from those fields, i.e. keep only the actual skill and ignore the useless text around it. For the example above I want to get only an array with:

"PHP"
"Java"
"C++"
"PHP5"
"project management"
"quality management"
"Javascript"
"water engineering"
"cibling customers"

What should I do to extract the skills from this much data? Do you know of specific algorithms for this (e.g. k-means)? Thanks in advance.
You can use the SQL LIKE operator directly for this. For example, if you want to get the list of people who entered C++ somewhere in the text:

SELECT person_ID FROM table1 WHERE skill_field LIKE '%C++%'
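If you would rather run that query from a script, here is a minimal sketch in Python. It assumes an SQLite database (the thread does not say which DBMS is in use) and reuses the table1, person_ID and skill_field names from the query above; the sample rows and the helper names make_demo_db and people_with_skill are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a few sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (person_ID INTEGER, skill_field VARCHAR(200))")
    conn.executemany(
        "INSERT INTO table1 (person_ID, skill_field) VALUES (?, ?)",
        [
            (1, "PHP (good level), Java (intermediaite), C++"),
            (2, "PHP5"),
            (3, "I have started C++ in 2005"),
        ],
    )
    return conn


def people_with_skill(conn, keyword):
    """Return the person_ID of every row whose skill_field contains keyword."""
    # Escape LIKE wildcards so a keyword containing % or _ is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT person_ID FROM table1 "
        "WHERE skill_field LIKE ? ESCAPE '\\' "
        "ORDER BY person_ID",
        ("%" + escaped + "%",),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_demo_db()
    print(people_with_skill(conn, "C++"))
```

Passing the keyword as a bound parameter instead of pasting it into the SQL string avoids quoting problems with user-supplied text.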
Hello, thanks for your reply, but this is not my objective. I want to build a new database from the old one. In the new database I want to have only skills, properly categorized. The old database contains the skills plus useless text around them.

Example (this is a skills field in the old database): "I have started C++ in 2005 and i'm know mastering it and STL librairies"

The desired output would be: C++, STL librairies.
Okay, I think I understand what you need now. Here is some pseudocode.

Define an array of known strings = {C++, STL, Java, C#, etc.}
Execute "SELECT * FROM old_table"
While (there are more records) {
    1. Read the string field into a variable str and reset result to empty.
       For each element in the array of known strings:
           a. Search for the element within str (you can use strstr if you are working in C).
              If it is found, append the element to result.
       Next
    2. Insert a new record into the new table with the user ID and the result string.
}

You can write this as a function in a programming language, or in SQL if your DBMS supports functions.
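For what it's worth, the matching step of that loop could be sketched in Python roughly like this. The KNOWN_SKILLS list is a made-up starting vocabulary, and the function and variable names are hypothetical. One deliberate difference from a plain strstr search: this sketch uses a regular expression with boundary checks, so that "Java" is not reported when the text only says "Javascript".

```python
import re

# Hypothetical vocabulary; a real one would be curated from the data.
KNOWN_SKILLS = ["C++", "STL", "Java", "Javascript", "PHP", "C#"]

# Map lowercased spellings back to the canonical spelling.
CANONICAL = {skill.lower(): skill for skill in KNOWN_SKILLS}

# Try longer names first so "Javascript" is preferred over "Java".
_ordered = sorted(KNOWN_SKILLS, key=len, reverse=True)
_alternatives = "|".join(re.escape(skill) for skill in _ordered)

# \b does not work next to "+" or "#", so use explicit lookarounds:
# no letter/digit right before the match, no letter/digit/+/# right after.
PATTERN = re.compile(
    r"(?<![A-Za-z0-9])(?:" + _alternatives + r")(?![A-Za-z0-9+#])",
    re.IGNORECASE,
)


def extract_skills(text):
    """Return the known skills found in text, in order of first appearance."""
    found = []
    for match in PATTERN.finditer(text):
        name = CANONICAL[match.group(0).lower()]
        if name not in found:  # skip repeats of the same skill
            found.append(name)
    return found


if __name__ == "__main__":
    sample = "I have started C++ in 2005 and i'm know mastering it and STL librairies"
    print(extract_skills(sample))
```

Each row of the old table would be passed through extract_skills and the result written to the new table. Note that with these boundary checks "PHP5" does not count as "PHP"; if you want it, it needs its own entry in the list. The approach only finds skills that are already in the vocabulary, so multi-word skills such as "water engineering" would have to be added to the list as well.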