Match Against Type Functions in PHP

Discussion in 'PHP' started by tflight, Nov 21, 2005.

  1. #1
    Let's say I have a string in PHP that contains a paragraph of text. I also have an array of possible categories. I want to use something similar to MySQL's 'match against' functionality to figure out which category is the best match for the paragraph of text.

    Here is what might be the tough part though... Each category name can be more than one word, and that category (string) might not have any exact match in the paragraph text.

    $paragraph_text = 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Quisque ultricies arcu eget libero. Lorem arcu eget';
    
    $possible_categories = array('arcu eget', 'sit amet', 'ipsum sit', 'foo', 'bar');
    
    So given the above string and array, I'm looking for suggestions on how to figure out that 'arcu eget' is probably the best match because there is more than one exact match. 'sit amet' might be the next best match because there is one exact match, and 'ipsum sit' might be the third best match because each word is contained within the string. The remaining categories wouldn't match at all.

    Right now the only way I can think to do this is to use some regular expressions to see if there is more than one exact match, then use another regular expression to see if there is one exact match, then break up the individual array elements into individual words to see if there are any matches to words... if so are all words matched or only some.... yada, yada, yada.

    Currently only the $paragraph_text is stored in a database; the array of categories just exists in PHP. Does PHP have any matching that provides "scoring" like MySQL's MATCH ... AGAINST functions? Or should I stick the array of categories into MySQL and match the paragraph text against the categories?
     
    tflight, Nov 21, 2005
  2. Michau

    #2
    No, PHP does not have such a scoring match function. You need to code it yourself. You can use ereg_replace to replace your matched string with an empty string, and repeat that until there is no match; the number of replacements tells you the number of matches for that phrase. It looks like a tedious task.
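
    The counting step described above can also be done in one call instead of a replace-until-no-match loop. A minimal sketch, assuming a reasonably current PHP where preg_match_all is available (ereg_replace has since been removed from PHP); the helper name count_phrase is hypothetical and only for illustration:

```php
<?php
// Hypothetical helper: count non-overlapping, case-insensitive occurrences
// of a phrase, matched on word boundaries.
function count_phrase(string $text, string $phrase): int
{
    // preg_quote escapes any regex metacharacters in the category name.
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
    $count = preg_match_all($pattern, $text);
    return $count === false ? 0 : $count;
}

$text = 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. '
      . 'Quisque ultricies arcu eget libero. Lorem arcu eget';

echo count_phrase($text, 'arcu eget'), "\n";  // 2
echo count_phrase($text, 'sit amet'), "\n";   // 1
echo count_phrase($text, 'ipsum sit'), "\n";  // 0
```

    This only counts exact phrase matches; it does not yet handle the case where the words of a category appear separately in the text.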

    I think you rather need to move this processing to MySQL.
     
    Michau, Dec 5, 2005
  3. tflight

    #3
    Thanks, Michau. I started to move this over to MySQL, but then I remembered something: ft_min_word_len=4 by default. Since many of my category names contain critical two-letter words, I would need to make a global change to MySQL so that full-text matches consider words with a minimum of two characters.

    I could make that change and it would work, but I have other scripts that perform full-text matching, and they didn't work well when I tried them with ft_min_word_len set to two.

    So it looks like I'm going to do this the long PHP way. Oh well. Thanks for the trick of replacing with an empty string. I think I'm just going to use substr_count on the full category names, then explode the category names into words and do it again.
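
    A rough sketch of what that substr_count plus explode approach might look like. The function name score_category and the weights (1000 per exact phrase hit, up to 100 for the share of individual words found) are assumptions chosen for illustration, not anything PHP or MySQL prescribes:

```php
<?php
// Hypothetical scoring: exact phrase hits dominate, matched words break ties.
function score_category(string $text, string $category): int
{
    $haystack = strtolower($text);
    $needle = strtolower(trim($category));
    if ($needle === '') {
        return 0; // substr_count rejects an empty needle
    }

    // How many times the whole category name appears verbatim.
    $phraseHits = substr_count($haystack, $needle);

    // How many of the category's individual words appear anywhere.
    $total = 0;
    $found = 0;
    foreach (explode(' ', $needle) as $word) {
        if ($word === '') {
            continue; // skip gaps left by repeated spaces
        }
        $total++;
        if (substr_count($haystack, $word) > 0) {
            $found++;
        }
    }

    // Assumed weights: 1000 per phrase hit, 0-100 for word coverage.
    return $phraseHits * 1000 + (int) round(100 * $found / $total);
}

$paragraph_text = 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. '
                . 'Quisque ultricies arcu eget libero. Lorem arcu eget';
$possible_categories = array('arcu eget', 'sit amet', 'ipsum sit', 'foo', 'bar');

$scores = array();
foreach ($possible_categories as $category) {
    $scores[$category] = score_category($paragraph_text, $category);
}
arsort($scores); // highest score first, keys preserved
print_r($scores);
```

    With the sample data this ranks 'arcu eget' first (2100), then 'sit amet' (1100), then 'ipsum sit' (100), with 'foo' and 'bar' at 0. Note that substr_count matches substrings, so a short word such as 'sit' would also be counted inside a longer word like 'visit'; a word-boundary regular expression would be stricter.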
     
    tflight, Dec 5, 2005