Pulling Data from a Text File in a PHP Script?

Discussion in 'PHP' started by patrich, Oct 4, 2009.

  1. #1
    I have a little PHP script that pulls SERP data and displays it on my website. It searches MSN for a keyword that I enter and then fetches "x" listings. I would like the script to use a random keyword from a .txt file as the search query. How can I do that?

    I have tried file_get_contents() and several other functions without any luck. I also tried using a simple random-quote PHP script, to no avail. Any ideas?

    $keyword = $paramsA['my_keyword'];
    $keyword = preg_replace('/\.s*html*$/i', '', $keyword);
    $keyword = preg_replace('/-|_|\.|%20| /', '+', $keyword);
    $keyword_phrase_array = preg_split('/\+/', $keyword);
    
    $limit = $paramsA['limit']; if (!$limit) { $limit = 0; }
    I would like to replace 'my_keyword' in the snippet above with a random keyword/search string, but I can't seem to get it to work. Any help would be appreciated.
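    The clean-up in the snippet above (strip a trailing .html, turn separators into +, split into parts) can be wrapped in a small function so that a keyword from any source, whether a query parameter, a text file or a hard-coded list, goes through the same steps. This is a minimal sketch: normalizeKeyword() is a hypothetical helper name, and because eregi_replace() was removed in PHP 7 it uses preg_replace() instead.

```php
<?php
// Sketch: the snippet's keyword clean-up as a reusable function.
// normalizeKeyword() is a hypothetical helper, not part of the original script.
// eregi_replace() no longer exists in PHP 7+, so preg_replace() is used.
function normalizeKeyword($raw)
{
    // Drop a trailing ".html", ".htm" or ".shtml" (case-insensitive).
    $keyword = preg_replace('/\.s*html*$/i', '', $raw);
    // Turn "-", "_", ".", "%20" and spaces into "+" for the search URL.
    $keyword = preg_replace('/-|_|\.|%20| /', '+', $keyword);
    // Split into keyword parts; PREG_SPLIT_NO_EMPTY skips empty parts
    // left behind by doubled separators such as "seo--tips".
    $parts = preg_split('/\+/', $keyword, -1, PREG_SPLIT_NO_EMPTY);
    return array($keyword, $parts);
}

list($keyword, $keyword_phrase_array) = normalizeKeyword('search engine marketing');
echo $keyword, "\n"; // search+engine+marketing
```

    The first return value is what goes into the feed URL; the second is the array the keyword-density code loops over. Characters such as & or # are not escaped by this clean-up; running each part through urlencode() would be one way to handle them.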

    Thanks
     
    patrich, Oct 4, 2009
  2. goliath

    #2
    a) Load the text file into memory. Since you need to select one item, you want an array:

    file_get_contents() reads the file into a single string, which you can then explode() on the newline to get an array of lines (file() with the FILE_IGNORE_NEW_LINES flag does both in one step).

    b) Determine the number of elements in the array:

    count() will do this for you.

    c) Select a random element index from the available elements in the array:

    rand(0, count($array) - 1) will do this for you, since array indexes start at 0.

    d) Insert the randomly selected entry into your query just like you would any other string.

    There's no need for you to use fopen() or any of that to achieve this. What problem did you have with file_get_contents() before?
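    A minimal sketch of steps a) to d), assuming the keywords live one per line in a text file next to the script. The file name keywords.txt and the function name randomKeywordFromFile() are hypothetical; step a) uses file_get_contents() plus explode(), as described above.

```php
<?php
// Sketch of steps a) to d). keywords.txt and randomKeywordFromFile()
// are hypothetical names; the file holds one keyword phrase per line.
function randomKeywordFromFile($path)
{
    if (!is_readable($path)) {
        return null; // missing or unreadable file
    }
    // a) Load the file into a string, then split it into an array of lines.
    //    trim() also removes the "\r" left by Windows line endings,
    //    and array_filter(..., 'strlen') drops blank lines.
    $contents = file_get_contents($path);
    $lines = array_values(array_filter(array_map('trim', explode("\n", $contents)), 'strlen'));

    // b) Determine the number of elements.
    $count = count($lines);
    if ($count === 0) {
        return null; // no usable keywords in the file
    }

    // c) Pick a random index; valid indexes run from 0 to $count - 1.
    $index = rand(0, $count - 1);

    // d) Return the entry so it can go into the query like any other string.
    return $lines[$index];
}

$keyword = randomKeywordFromFile(__DIR__ . '/keywords.txt');
```

    The returned phrase can stand in for the my_keyword query parameter and then go through the script's existing clean-up, which turns the spaces in a phrase like "search engine marketing" into +. Returning null rather than calling die() leaves it to the caller to fall back to something like $defaultKeyword.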
     
    goliath, Oct 4, 2009
  3. patrich

    #3
    file_get_contents() was breaking the script and the page wouldn't load, though I may not have been using it correctly, and I didn't save the code snippet I was using :(

    I have included the complete script below. Basically, I want to search on several keywords at random rather than just the one 'my_keyword'. These could be taken from a text file in the same directory or even coded within the script itself; I just haven't been able to find a solution that works. Ideas?
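    For the second option, keywords coded within the script itself, a minimal sketch might keep the phrases in an array and pick one with array_rand(), while still letting a my_keyword query parameter take precedence when one is supplied. The phrases and the function name pickKeyword() are hypothetical placeholders.

```php
<?php
// Sketch: keywords coded within the script itself. The phrases below and
// pickKeyword() are hypothetical placeholders.
$keywordList = array(
    'search engine marketing',
    'keyword research',
    'link building',
);

// Prefer the my_keyword query parameter when one is given;
// otherwise pick a random phrase from the list.
function pickKeyword(array $params, array $keywordList)
{
    if (isset($params['my_keyword']) && $params['my_keyword'] !== '') {
        return $params['my_keyword'];
    }
    // array_rand() returns one random key from a non-empty array.
    return $keywordList[array_rand($keywordList)];
}

$paramsA = array(); // e.g. no my_keyword in the query string
$keyword = pickKeyword($paramsA, $keywordList);
```

    Because the script's clean-up already turns spaces into +, multi-word phrases can be written in the list with ordinary spaces. With a keyword always available, the "Error: no keyword" branch would only trigger if the list were empty.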

    <?php
    # get the parameters
    $qs = getenv("QUERY_STRING");
    $qsA = explode("&",$qs);
    $thi = current($qsA);
    while ($thi){
        $tempA = explode("=",$thi);
        $paramsA[$tempA[0]] = $tempA[1];
        $thi =  next($qsA);
    }
    # $keyword is the keyword phrase used to perform the search
    $keyword = $paramsA['my_keyword'];
    $keyword = preg_replace('/\.s*html*$/i', '', $keyword);
    $keyword = preg_replace('/-|_|\.|%20| /', '+', $keyword);
    $keyword_phrase_array = preg_split('/\+/',$keyword);
    
    # $limit is the max number of items to echo
    $limit = $paramsA['limit'];if (!$limit) { $limit = 0; }
    
    # $link is the switch used to link the items to their URLs
    $link = $paramsA['link'];if (!$link) { $link = 0; }
    
    # $randomize is the switch used to shuffle the results
    $randomize = $paramsA['randomize'];if ($randomize == ""){ $randomize = 0;}
    
    # $format is the switch used to format for neat output.
    # Otherwise, the output is just a solid block.
    $format = $paramsA['format'];if ($format == ""){ $format = 1;}
    
    # $cache is the switch used to turn off/on caching.  Default is 1 (on).
    $cache = $paramsA['cache'];if ($cache == ""){ $cache = 1;}
    
    # $cachelimit is the number of seconds to cache the RSS feeds.  Default is one day.
    $cachelimit = $paramsA['cachelimit'];if ($cachelimit == ""){ $cachelimit = 86400;}
    
    # $kwd is the target keyword density, in percentage points
    $kwd = $paramsA['kwd']; if ($kwd == '') { $kwd = 0; }
    
    # disable this line for testing
    error_reporting(E_ALL ^ E_NOTICE);
    ########################
    # You can edit these variables
    #
    # Pull feed from MSN's SERPs
    $feed = 'http://search.msn.com/results.aspx?q='.$keyword.'&format=rss&FORM=R0RE';
    #
    # or pull feed from Yahoo's SERPs
    # $feed = 'http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&query='.$keyword.'&adult_ok=1';
    #
    # Name the cache directory
    $cacheDir = 'MyCache';
    $defaultKeyword = 'Search Engine Marketing';
    
    if (!$keyword){
        die ('Error: no keyword');
    }
    
    $cacheDir = './'.$cacheDir;
    // create lastRSS object
    $rss = new lastRSS; 
    
    if ($cache){
        // setup transparent cache
        $rss->cache_dir = $cacheDir; 
        $rss->cache_time = $cachelimit;
    }
    $rss->items_limit = $limit;
    
    $textArray = array();
    $allWords = array();
    // grab the RSS file
    if ($rs = $rss->get($feed)) {
        $count = 0;
        $totalKWD = 0;
        $totalWordCount = 0;
        $totalKWDCount = 0;
        if ((count($rs['items'])<1) && ($defaultKeyword)){
            $keyword = preg_replace('/\+/',' ',$keyword);
            $feed = preg_replace("/$keyword/",$defaultKeyword,$feed);
            $feed = preg_replace('/ /','+',$feed);
            $rs = $rss->get($feed);
        } elseif ((count($rs['items'])<1) && (!$defaultKeyword)){
            die ('No results');
        }
        foreach($rs['items'] as $item) {
            $words = array();
            $item['description'] = preg_replace('/<!\[CDATA\[/i', '', $item['description']);
            $item['description'] = preg_replace('/\]\]>/', '', $item['description']);
            $item['description'] = unhtmlentities($item['description']);
            $item['title'] = preg_replace('/<!\[CDATA\[/i', '', $item['title']);
            $item['title'] = preg_replace('/\]\]>/', '', $item['title']);
            $item['title'] = unhtmlentities($item['title']);
            if ($format) { $textArray[$count] = "<p>";}
            if ($link){
                $textArray[$count] .= "<a href=\"".$item['link']."\">";
            }
            $textArray[$count] .= $item['title'];
            if ($link){
                $textArray[$count] .= "</a>";
            }
            if ($format) { $textArray[$count] .= "<br />";}
            $textArray[$count] .= $item['description'];
            if ($format) { $textArray[$count] .= "</p>\n"; }
            if ($limit != 0){
                $limc = $count + 1;
                if ($limc <= $limit){
                    $textArray[$count] = utf8_decode($textArray[$count]);
                } else {
                    $textArray[$count] = "";
                }
            } else {
                $textArray[$count] = utf8_decode($textArray[$count]);
            }
            
            if ($kwd){
                # calculate the keyword density for this item and store it in $itemKWD
                $tempWord = $textArray[$count];
                $tempWord = strtolower($tempWord);
                $itemKWordCount = 0;
                $preg_remove = "/<[a-zA-Z\/][^>]*>/";                       # regex to remove html tags
                $tempWord = preg_replace($preg_remove,' ',$tempWord);  # replace html tags with spaces
    
                $pregWord = '/\w+/';                                   # regex to identify words
                preg_match_all($pregWord,$tempWord,$wordsx,PREG_PATTERN_ORDER);  # put words in array $words
                $words = $wordsx[0];
                $wordCount = array_count_values($words);               # create associative array indexing frequency of each word
                
                reset($keyword_phrase_array);                          # prepare the array holding the keywords
                $currentWord = current($keyword_phrase_array);
                $itemKWordCount = 0;
                while ($currentWord){                                                # iterate through each keyword and
                    $itemKWordCount = $itemKWordCount + $wordCount[$currentWord];    # add its frequency to $itemKWordCount
                    $totalKWDCount = $totalKWDCount + $wordCount[$currentWord];
                    $currentWord = next($keyword_phrase_array);
                }
                $itemKWD = ($itemKWordCount / count($keyword_phrase_array)) / count($words);  # divide $itemKWordCount by number of keywords
                                                                                              # to get an average count, then divide by number
                                                                                              # of words to get keyword density for this item.
            
                                                                                              
                # echo "Item $count KWD = $itemKWD<br>\n";
                # echo "Item $count words=".count($words).", keywords=$itemKWordCount<br>\n";
                $totalKWD = $totalKWD + $itemKWD;                      # add this items keyword density to total keyword density aggregate
                $totalWordCount = $totalWordCount + count($words);
                
                # echo "running keyword density $totalKWD<br><br>";
            }
            $count++;
        } 
    
        if (($kwd) && ($count)){
            $finalKWD = $totalKWD / $count;                 # divide total keyword density aggregate by number of items to get
                                                                       # final keyword density, stored in variable $finalKWD
            $finalKWDagg = ($totalKWDCount / count($keyword_phrase_array)) / $totalWordCount;
            
            $necessaryKWDcount = ($kwd/100) * $totalWordCount * count($keyword_phrase_array);
            $KWDtoInsert = round(($necessaryKWDcount - $totalKWDCount),0);
            
            if ($KWDtoInsert>0){
                srand($totalWordCount); # set the random number seed equal to the number of words in the document
                for ($a=0;$a<$KWDtoInsert;$a++){
                    $kwtouse = rand(0,count($keyword_phrase_array)-1);  # randomly choose a keyword part to insert
                    $itemToInsert = rand(0,count($textArray)-1);        # randomly choose an Item to work with
                    
                    $tempWord = $textArray[$itemToInsert];
                    $tempWord = strip_tags($tempWord);
                    $preg_remove = '/<[a-zA-Z\/][^>]*>/';                       # regex to remove html tags
                    $tempWord = preg_replace($preg_remove,' ',$tempWord);  # replace html tags with spaces
                    $pregWord = '/\w+/';                                   # regex to identify words
                    preg_match_all($pregWord,$tempWord,$wordqz,PREG_PATTERN_ORDER);  # put words in array $words
                    $wordq = $wordqz[0];
                    $selection = rand(0,count($wordq)-1);               # randomly select a word to replace
                    $ok = 0;
                    $tries = 1;
                    while (!$ok){
                        $match = 0;    # reset the flag for each candidate word
                        for ($d=0;$d<count($keyword_phrase_array);$d++){
                            if (preg_match("/$keyword_phrase_array[$d]/i",$wordq[$selection])){
                                $match = 1;
                            } 
                        }
                        if ($match){
                            $selection = rand(0,count($wordq)-1);
                        } elseif (preg_match('/\&.+;/',$wordq[$selection])){
                            $selection = rand(0,count($wordq)-1);
                        } else {
                            $ok = 1;
                        }
                        if ($tries > count($wordq)){
                            # none of the words are suitable for replacement
                            break;
                        }
                        $tries++;
                    }
                    if ($tries > count($wordq)){
                        $a--;
                        continue;
                    }
                    
                    #echo "replace '$wordq[$selection]' ($selection) with '$keyword_phrase_array[$kwtouse]' in item $itemToInsert<br>\n";
                    # break the $textArray[$itemToInsert] into two sections to avoid replacing a link by accident
                    if (preg_match("/<a [^>]*>/",$textArray[$itemToInsert])){ # if there is a link
                        $po = strpos($textArray[$itemToInsert],'">');
                        $firstpart = substr($textArray[$itemToInsert], 0, $po + 2);
                        #echo $firstpart."<br>\n";
                        $secondpart = substr($textArray[$itemToInsert], $po + 2, strlen($textArray[$itemToInsert]));
                        $secondpart = preg_replace("/(\W)[^<\/>]$wordq[$selection](\W)/i", "\\1$keyword_phrase_array[$kwtouse]\\2", $secondpart, 1);
                        $textArray[$itemToInsert] = $firstpart . $secondpart;
                    } else {
                        $textArray[$itemToInsert] = preg_replace("/(\W)$wordq[$selection](\W)/i", "\\1$keyword_phrase_array[$kwtouse]\\2", $textArray[$itemToInsert], 1);
                    }
                                    
                }
            } elseif ($KWDtoInsert<0){
                srand($totalWordCount); # set the random number seed equal to the number of words in the document
                for ($a=0;$a>$KWDtoInsert;$a--){
                    $kwtouse = rand(0,count($keyword_phrase_array)-1);  # randomly choose a keyword part to remove
                    $itemToInsert = rand(0,count($textArray)-1);        # randomly choose an Item to work with
                    if (!stristr($textArray[$itemToInsert],$keyword_phrase_array[$kwtouse])){ # if keyword part not found in this item, choose another
                        $a++;
                        continue;
                    }
                    $tempWord = $textArray[$itemToInsert];
                    $preg_remove = "/<[a-zA-Z\/][^>]*>/";                      # regex to remove html tags
                    $tempWord = preg_replace($preg_remove,' ',$tempWord);  # replace html tags with spaces
                    $pregWord = '/\w+/';                                   # regex to identify words
                    preg_match_all($pregWord,$tempWord,$wordqz,PREG_PATTERN_ORDER);  # put words in array $words
                    $wordq = $wordqz[0];
                    $repwordpointer = rand(0,count($wordq)-1);             # randomly select a word to use as a replacement
                    $repword = $wordq[$repwordpointer];
                    $ok = 0;
                    while (!$ok){
                        for ($e=0;$e<count($keyword_phrase_array);$e++){
                            if (!preg_match("/$keyword_phrase_array[$e]/i",$repword)){ # word is not a keyword part, it is suitable
                                $ok = 1;
                            } else {
                                $repwordpointer = rand(0,count($wordq)-1); # word is a keyword part and is unsuitable. pick another one
                                $repword = $wordq[$repwordpointer];
                            }
                        }
                    }
                    
                    # find out how many times this keyword part is used
                    preg_match_all("/$keyword_phrase_array[$kwtouse]/i",$textArray[$itemToInsert],$instx,PREG_PATTERN_ORDER);
                    $inst = $instx[0];                                  # contains number of times keyword part is used
                    $pointer = rand(1,count($inst)) - 1;                # randomly choose which keyword to delete
                    if ($pointer == 0){
                        $match = "/ *$keyword_phrase_array[$kwtouse] */i"; # regex for first keyword
                        $replacement = " $repword ";
                    } else {
                                                                        # regex for other keywords
                        $match = "/(.*$keyword_phrase_array[$kwtouse].*){{$pointer}} *$keyword_phrase_array[$kwtouse] */i";
                                                                        # $pointer in this case tells the script how many other keyword
                                                                        # parts to overlook before doing the replacement.
                        $replacement = "\\1 $repword ";                          # the \\1 tells the script to insert the text that 
                                                                        # was before the replaced keyword part
                    }
                                                                        # perform replacement
                    $textArray[$itemToInsert] = preg_replace($match,$replacement,$textArray[$itemToInsert],1);
                    #echo "Item: $itemToInsert, Keyword: $keyword_phrase_array[$kwtouse]<br>\n";
                }
            }
        }                                                              
         
    
        if ($count){
            if ($randomize){
                shuffle($textArray);
            }
            array_walk($textArray, 'sendOut');
            #echo "<p>Keyword density per item (averaged) = $finalKWD<br />";
            #echo "Keyword density aggregate = $finalKWDagg<br />";
            #echo "totalKWDCount: $totalKWDCount<br>totalWordCount: $totalWordCount<br />";
            #echo "Necessary keyword part count = $necessaryKWDcount<br />";
            #echo "We need to insert $KWDtoInsert keyword parts.<br />";
        } else {
            die ('Error: '.$feed.' not responding with RSS file');
        }
    
    
    
    } else {
        die ('Error: RSS file at '.$feed.' not found...');
    }
    
    function sendOut($text){
        echo $text;
    }
    
    
    class lastRSS { 
        // ------------------------------------------------------------------- 
        // Public properties 
        // ------------------------------------------------------------------- 
        var $default_cp = 'UTF-8'; 
        var $CDATA = 'nochange'; 
        var $cp = ''; 
        var $items_limit = 0; 
        var $stripHTML = False; 
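        var $date_format = ''; 
        var $cache_dir = '';    // cache directory; empty disables caching
        var $cache_time = 0;    // cache lifetime in seconds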
    
        // ------------------------------------------------------------------- 
        // Private variables 
        // ------------------------------------------------------------------- 
        var $channeltags = array ('title', 'link', 'description', 'language', 'copyright', 'managingEditor', 'webMaster', 'lastBuildDate', 'rating', 'docs'); 
        var $itemtags = array('title', 'link', 'description', 'author', 'category', 'comments', 'enclosure', 'guid', 'pubDate', 'source'); 
        var $imagetags = array('title', 'url', 'link', 'width', 'height'); 
        var $textinputtags = array('title', 'description', 'name', 'link'); 
    
        // ------------------------------------------------------------------- 
        // Parse RSS file and returns associative array. 
        // ------------------------------------------------------------------- 
        function Get ($rss_url) { 
            // If CACHE ENABLED 
            if ($this->cache_dir != '') { 
                $cache_file = $this->cache_dir . '/rsscache_' . md5($rss_url); 
                $timedif = @(time() - filemtime($cache_file)); 
                if ($timedif < $this->cache_time) { 
                    // cached file is fresh enough, return cached array 
                    $result = unserialize(join('', file($cache_file))); 
                    // set 'cached' to 1 only if cached file is correct 
                    if ($result) $result['cached'] = 1; 
                } else { 
                    // cached file is too old, create new 
                    $result = $this->Parse($rss_url); 
                    $serialized = serialize($result); 
                    if ($f = @fopen($cache_file, 'w')) { 
                        fwrite ($f, $serialized, strlen($serialized)); 
                        fclose($f); 
                    } 
                    if ($result) $result['cached'] = 0; 
                } 
            } 
            // If CACHE DISABLED >> load and parse the file directly 
            else { 
                $result = $this->Parse($rss_url); 
                if ($result) $result['cached'] = 0; 
            } 
            // return result 
            return $result; 
        } 
         
        // ------------------------------------------------------------------- 
        // Modification of preg_match(); return trimmed field with index 1 
        // from 'classic' preg_match() array output 
        // ------------------------------------------------------------------- 
        function my_preg_match ($pattern, $subject) { 
            // run the regular expression 
            preg_match($pattern, $subject, $out); 
    
            // if there is some result... process it and return it 
            if(isset($out[1])) { 
                // Process CDATA (if present) 
                if ($this->CDATA == 'content') { // Get CDATA content (without CDATA tag) 
                    $out[1] = strtr($out[1], array('<![CDATA['=>'', ']]>'=>'')); 
                } elseif ($this->CDATA == 'strip') { // Strip CDATA 
                    $out[1] = strtr($out[1], array('<![CDATA['=>'', ']]>'=>'')); 
                } 
    
                // If code page is set convert character encoding to required 
                if ($this->cp != '') 
                    //$out[1] = $this->MyConvertEncoding($this->rsscp, $this->cp, $out[1]); 
                    $out[1] = iconv($this->rsscp, $this->cp.'//TRANSLIT', $out[1]); 
                // Return result 
                return trim($out[1]); 
            } else { 
            // if there is NO result, return empty string 
                return ''; 
            } 
        } 
    
        // ------------------------------------------------------------------- 
        // Replace HTML entities &something; by real characters 
        // ------------------------------------------------------------------- 
        function unhtmlentities ($string) { 
            // Get HTML entities table 
            $trans_tbl = get_html_translation_table (HTML_ENTITIES, ENT_QUOTES); 
            // Flip keys<==>values 
            $trans_tbl = array_flip ($trans_tbl); 
            // Add support for &apos; entity (missing in HTML_ENTITIES) 
            $trans_tbl += array('&apos;' => "'"); 
            // Replace entities by values 
            return strtr ($string, $trans_tbl); 
        } 
    
        // ------------------------------------------------------------------- 
        // Parse() is private method used by Get() to load and parse RSS file. 
        // Don't use Parse() in your scripts - use Get($rss_file) instead. 
        // ------------------------------------------------------------------- 
        function Parse ($rss_url) { 
            // Open and load RSS file 
            if ($f = @fopen($rss_url, 'r')) { 
                $rss_content = ''; 
                while (!feof($f)) { 
                    $rss_content .= fgets($f, 4096); 
                } 
                fclose($f); 
    
                // Parse document encoding 
                $result['encoding'] = $this->my_preg_match("'encoding=[\'\"](.*?)[\'\"]'si", $rss_content); 
                // if document codepage is specified, use it 
                if ($result['encoding'] != '') 
                    { $this->rsscp = $result['encoding']; } // This is used in my_preg_match() 
                // otherwise use the default codepage 
                else 
                    { $this->rsscp = $this->default_cp; } // This is used in my_preg_match() 
    
                // Parse CHANNEL info 
                preg_match("'<channel.*?>(.*?)</channel>'si", $rss_content, $out_channel); 
                foreach($this->channeltags as $channeltag) 
                { 
                    $temp = $this->my_preg_match("'<$channeltag.*?>(.*?)</$channeltag>'si", $out_channel[1]); 
                    if ($temp != '') $result[$channeltag] = $temp; // Set only if not empty 
                } 
                // If date_format is specified and lastBuildDate is valid 
            if ($this->date_format != '' && ($timestamp = strtotime($result['lastBuildDate'])) !== false) { 
                            // convert lastBuildDate to specified date format 
                            $result['lastBuildDate'] = date($this->date_format, $timestamp); 
                } 
    
                // Parse TEXTINPUT info 
                preg_match("'<textinput(|[^>]*[^/])>(.*?)</textinput>'si", $rss_content, $out_textinfo); 
                    // This slightly strange regexp means: 
                    // Look for tag <textinput> with or without any attributes, but skip the self-closing version <textinput /> (it's not a beginning tag) 
                if (isset($out_textinfo[2])) { 
                    foreach($this->textinputtags as $textinputtag) { 
                        $temp = $this->my_preg_match("'<$textinputtag.*?>(.*?)</$textinputtag>'si", $out_textinfo[2]); 
                        if ($temp != '') $result['textinput_'.$textinputtag] = $temp; // Set only if not empty 
                    } 
                } 
                // Parse IMAGE info 
                preg_match("'<image.*?>(.*?)</image>'si", $rss_content, $out_imageinfo); 
                if (isset($out_imageinfo[1])) { 
                    foreach($this->imagetags as $imagetag) { 
                        $temp = $this->my_preg_match("'<$imagetag.*?>(.*?)</$imagetag>'si", $out_imageinfo[1]); 
                        if ($temp != '') $result['image_'.$imagetag] = $temp; // Set only if not empty 
                    } 
                } 
                // Parse ITEMS 
                preg_match_all("'<item(| .*?)>(.*?)</item>'si", $rss_content, $items); 
                $rss_items = $items[2]; 
                $i = 0; 
                $result['items'] = array(); // create array even if there are no items 
                foreach($rss_items as $rss_item) { 
                    // If number of items is lower than the limit: parse one item 
                    if ($i < $this->items_limit || $this->items_limit == 0) { 
                        foreach($this->itemtags as $itemtag) { 
                            $temp = $this->my_preg_match("'<$itemtag.*?>(.*?)</$itemtag>'si", $rss_item); 
                            if ($temp != '') $result['items'][$i][$itemtag] = $temp; // Set only if not empty 
                        } 
                        // Strip HTML tags and other bullshit from DESCRIPTION 
                        if ($this->stripHTML && $result['items'][$i]['description']) 
                            $result['items'][$i]['description'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['description']))); 
                        // Strip HTML tags and other bullshit from TITLE 
                        if ($this->stripHTML && $result['items'][$i]['title']) 
                            $result['items'][$i]['title'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['title']))); 
                        // If date_format is specified and pubDate is valid 
                        if ($this->date_format != '' && ($timestamp = strtotime($result['items'][$i]['pubDate'])) !== false) { 
                            // convert pubDate to specified date format 
                            $result['items'][$i]['pubDate'] = date($this->date_format, $timestamp); 
                        } 
                        // Item counter 
                        $i++; 
                    } 
                } 
    
                $result['items_count'] = $i; 
                return $result; 
            } 
            else // Error in opening return False 
            { 
                return False; 
            } 
        } 
    } 
    
    function unhtmlentities ($string) { 
        // Get HTML entities table 
        $trans_tbl = get_html_translation_table (HTML_ENTITIES, ENT_QUOTES); 
        // Flip keys<==>values 
        $trans_tbl = array_flip ($trans_tbl); 
        // Add support for &apos; entity (missing in HTML_ENTITIES) 
        $trans_tbl += array('&apos;' => "'"); 
        // Replace entities by values 
        return strtr ($string, $trans_tbl); 
    } 
    
    ?>
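    The while() loop at the top of the script splits QUERY_STRING on & and = by hand, and the block that follows fills in defaults. A minimal sketch of the same job using PHP's built-in parse_str() is below; readParams() is a hypothetical helper name. One difference worth knowing: parse_str() URL-decodes values, so + and %20 in my_keyword arrive as spaces, which the script's keyword clean-up then turns back into +. Under a web server, $_GET already holds the same decoded values.

```php
<?php
// Sketch: the while() loop over getenv("QUERY_STRING") rewritten with
// PHP's built-in parse_str(). readParams() is a hypothetical helper name.
function readParams($queryString)
{
    $params = array();
    // Splits on & and =, and URL-decodes each value.
    parse_str($queryString, $params);

    // Defaults copied from the script's parameter checks.
    $defaults = array(
        'limit' => 0, 'link' => 0, 'randomize' => 0, 'format' => 1,
        'cache' => 1, 'cachelimit' => 86400, 'kwd' => 0,
    );
    foreach ($defaults as $name => $value) {
        if (!isset($params[$name]) || $params[$name] === '') {
            $params[$name] = $value;
        }
    }
    return $params;
}

// getenv() returns false when the variable is unset, so cast to string.
$paramsA = readParams((string) getenv('QUERY_STRING'));
```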
     
    patrich, Oct 4, 2009
  4. goliath

    #4
    Well, file_get_contents() is exactly what you need if your list of strings is in a separate text file. Seeing all of the code just confirms that.

    I posted step-by-step instructions; are you saying you need the code?
     
    goliath, Oct 4, 2009