Grr.. Need help with a preg_match problem

lggmaster Peon

Messages: 233

Likes Received: 1

Best Answers: 0

Trophy Points: 0

#1

Maybe I'm going about this completely wrong, but I have tried preg_match, preg_match_all, etc.

I'm looking to take a block of text, strip out words like 'and', 'the', etc., and then loop over the leftover words so they can be placed into 3 different tables.

Think of a basic keyword density tool, if that helps you understand what I'm looking for.

lggmaster, Dec 29, 2007 IP

Barti1987 Well-Known Member

Messages: 2,703

Likes Received: 115

Best Answers: 0

Trophy Points: 185

#2

Use str_replace to remove the words you want. Then just explode the remaining text and loop through it.
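
A minimal sketch of that idea might look like the following. The sample text and stopword list are made up for illustration, and it assumes the text is already reduced to words separated by single spaces. One thing to watch for: str_replace matches substrings, so a bare 'the' would also be cut out of 'other'. The sketch works around that by wrapping every word and every stopword in spaces, which is just one possible workaround.

```php
<?php
// Hypothetical sample input and stopword list, for illustration only.
$text      = 'The cat and the dog chased the other cat';
$stopwords = array('and', 'the');

// Lowercase, double every space, and pad both ends so each word
// gets its own leading and trailing space.
$padded = ' ' . str_replace(' ', '  ', strtolower($text)) . ' ';

// Wrap each stopword in spaces so only whole words match
// ("the" must not be cut out of "other").
$needles = array();
foreach ($stopwords as $stopword) {
    $needles[] = ' ' . $stopword . ' ';
}
$stripped = str_replace($needles, ' ', $padded);

// Explode on spaces and drop the empty strings left by the padding.
$words = array_values(array_filter(explode(' ', $stripped), 'strlen'));

// Loop through whatever is left.
foreach ($words as $word) {
    echo $word, "\n";
}
```

Doubling the spaces first matters when two stopwords sit next to each other, since otherwise the second one would lose its leading space to the first match and be skipped.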

Peace,

Barti1987, Dec 30, 2007 IP

joebert Well-Known Member

Messages: 2,150

Likes Received: 88

Best Answers: 0

Trophy Points: 145

#3

<?php
	// Get article
	$text = file_get_contents('test.txt');

	// Replace non-word characters with whitespace. You can thank "w00t" for the \d in the pattern...
	$text = preg_replace('#[^a-z\d\s]+#i', ' ', $text);

	// Collapse runs of consecutive whitespace into a single space
	$text = preg_replace('#\s{2,}#', ' ', $text);

	// Reserve a place for words
	$words = array();

	// Split article into words
	$text = explode(' ', trim($text));

	// Turn $words into an associative array with words as the keys & their counts as the values
	foreach($text as $word)
	{
		// Skip the empty string explode() returns when the article has no words
		if($word === '')
		{
			continue;
		}
		// Make sure "The" and "the" are counted the same
		$word = strtolower($word);
		// If this word already has an entry, just increment its counter, otherwise register the word
		isset($words[$word]) ? $words[$word]++ : ($words[$word] = 1);
	}
	// Don't need $text anymore
	unset($text);

	// Get the list of stopwords "the", "an", "and", etc. Each stopword is on its own line.
	$stopwords = file('stopwords.txt');

	// Loop through the $stopwords, if there's an entry for a $stopword in $words, get rid of that entry.
	foreach($stopwords as $word)
	{
		// Trim the fat & lowercase it so it matches the lowercased keys in $words
		$word = strtolower(trim($word));
		// Found & removed
		if(isset($words[$word]))
		{
			unset($words[$word]);
		}
	}
	
	// Don't need these anymore
	unset($stopwords);
	
	// Sort the array with highest count first,
	// use "arsort" so the word keys aren't replaced with numeric keys, which would defeat the entire purpose.
	arsort($words);
	echo '<pre>', print_r($words, true), '</pre>';
?>

PHP:
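
Since the original question mentions a keyword density tool, here is a rough sketch of how a word => count array like $words could be turned into percentages. The function name keyword_density and the sample counts are hypothetical, and "density" is assumed here to mean a word's count divided by the total number of words in the article, which is only one possible definition.

```php
<?php
/**
 * Hypothetical helper (not part of the script above): turns a
 * word => count array into word => percentage of $totalWords.
 */
function keyword_density(array $counts, $totalWords)
{
    $density = array();
    if ($totalWords <= 0) {
        return $density; // nothing to divide by
    }
    foreach ($counts as $word => $count) {
        $density[$word] = round($count / $totalWords * 100, 2);
    }
    // Highest density first, keeping the word keys.
    arsort($density);
    return $density;
}

// Made-up counts: 3 + 1 kept words out of 8 words in the article.
$counts  = array('dog' => 1, 'cat' => 3);
$density = keyword_density($counts, 8);
print_r($density);
```

In the script above, the total could be taken from count($text) right after the explode, before $text is unset. Whether stopwords should be included in that total is a judgment call.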

joebert, Dec 30, 2007 IP
