Maybe I'm completely wrong about how I should be doing this, but I have used preg_match, preg_match_all, etc. I'm looking to take a block of text, strip out words like 'and', 'the', etc., and then loop over the leftover words so they can be placed into 3 different tables. Think of a basic keyword density tool, if that helps explain what I'm looking for.
Use str_replace to remove the words you want. Then just explode the remaining text and loop through it. Peace,
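A rough sketch of that idea, for what it's worth. One caveat: plain str_replace also matches inside longer words (removing 'and' would chew up 'sandwich'), so this sketch swaps in preg_replace with \b word boundaries and preg_split instead of explode. The sample text and the stopword list are made up for illustration.

```php
<?php
// Hypothetical sample input and stopword list, for illustration only.
$text = 'The cat and the dog sat on the sandwich';
$stopwords = array('and', 'the', 'on');

// Escape each stopword so it is safe inside a '#'-delimited pattern.
$quoted = array_map(function ($w) {
    return preg_quote($w, '#');
}, $stopwords);

// Build one pattern like #\b(?:and|the|on)\b#i so only whole words match.
$pattern = '#\b(?:' . implode('|', $quoted) . ')\b#i';

// Replace each stopword with a space, then split on runs of whitespace,
// dropping the empty pieces left behind by the removed words.
$stripped = preg_replace($pattern, ' ', $text);
$words = preg_split('#\s+#', $stripped, -1, PREG_SPLIT_NO_EMPTY);

// Loop through what is left; this is where each word would be handled.
foreach ($words as $word) {
    echo strtolower($word), "\n";
}
```

Note that 'sandwich' survives even though it contains 'and', which is the reason for the word boundaries.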
<?php
// Get article
$text = file_get_contents('test.txt');

// Replace non-word characters with whitespace. You can thank "w00t" for the \d in the pattern...
$text = preg_replace('#[^a-z\d\s]+#i', ' ', $text);

// Collapse every run of whitespace (single newlines and tabs included) into a single space
$text = preg_replace('#\s+#', ' ', $text);

// Reserve a place for words
$words = array();

// Split article into words
$text = explode(' ', trim($text));

// Turn $words into an associative array with words as the keys & their counts as the values
foreach ($text as $word) {
    // explode() returns one empty string for an empty article, skip it
    if ($word === '') {
        continue;
    }

    // Make sure "The" and "the" are counted the same
    $word = strtolower($word);

    // If this word already has an entry, just increment its counter, otherwise register the word
    if (isset($words[$word])) {
        $words[$word]++;
    } else {
        $words[$word] = 1;
    }
}

// Don't need $text anymore
unset($text);

// Get the list of stopwords "the", "an", "and", etc. Each stopword is on its own line.
$stopwords = file('stopwords.txt');

// Loop through the $stopwords, if there's an entry for a $stopword in $words, get rid of that entry.
foreach ($stopwords as $stopword) {
    // Trim the fat (line endings) and match the lowercased keys in $words
    $stopword = strtolower(trim($stopword));

    // Found & removed
    if (isset($words[$stopword])) {
        unset($words[$stopword]);
    }
}

// Don't need these anymore
unset($stopwords);

// Sort the array with highest count first,
// use "arsort" so the word keys aren't replaced with numeric keys, which would defeat the entire purpose.
arsort($words);

echo '<pre>', print_r($words, true), '</pre>';
?>
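Since the question mentions a keyword density tool, here is a small sketch of turning counts like these into percentages. The $counts array is hypothetical sample data shaped like the $words array above, and measuring density against the words left after stopword removal (rather than against the full article) is an assumption, not something the thread specifies.

```php
<?php
// Hypothetical word counts, shaped like the $words array built above.
$counts = array('php' => 3, 'array' => 2, 'loop' => 1);

// Assumption: the total is the number of words that survived stopword removal.
$total = array_sum($counts);

// Density of each word as a percentage of the total, rounded to one decimal.
$density = array();
foreach ($counts as $word => $count) {
    $density[$word] = $total > 0 ? round($count / $total * 100, 1) : 0.0;
}

print_r($density);
```

If the density should be relative to the whole article instead, count the words before removing the stopwords and use that number as $total.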