Help: how can I get each word from a page with cURL?

Discussion in 'PHP' started by CLMMafia, Jul 8, 2012.

  1. #1
    Hello,

    I'm trying to write a script that fetches a web page, extracts each word from it, and puts the words in an array.

    So far I have this:
    <?php
    // Note: the mysql_* functions used here were removed in PHP 7; this assumes an older PHP.
    require_once $_SERVER['DOCUMENT_ROOT'].'/includes/config.inc.php';

    // Fetch one URL that has not been scanned yet.
    $sql = mysql_query("SELECT * FROM md5_spider WHERE scaned = '' LIMIT 1");
    while($row = mysql_fetch_array($sql)){
        $ch = curl_init();
        curl_setopt($ch,CURLOPT_URL, $row['url']);
        curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch,CURLOPT_CONNECTTIMEOUT, 5);
        $data = curl_exec($ch);
        curl_close($ch);

        // Escape every value before it goes into a query; raw HTML is full of quotes.
        $id = mysql_real_escape_string($row['id']);
        $url = mysql_real_escape_string($row['url']);
        $html = mysql_real_escape_string($data);
        mysql_query("UPDATE md5_spider SET data = '{$html}' WHERE id = '{$id}' AND url = '{$url}'") or die(mysql_error());

        // Split the raw response on spaces (the pieces still contain HTML tags).
        $data1 = explode(' ', $data);

        foreach($data1 as $word){
            $safe = mysql_real_escape_string($word);
            $num = mysql_num_rows(mysql_query("SELECT * FROM wordlist WHERE word = '{$safe}'"));
            if($num == 0){
                echo htmlspecialchars($word).'<br/>';
                $md5 = md5($word);
                mysql_query("INSERT INTO wordlist (word, md5) VALUES ('{$safe}', '{$md5}')") or die(mysql_error());
            }
        }

        // Mark the URL as scanned by storing the current timestamp.
        mysql_query("UPDATE md5_spider SET scaned = '".time()."' WHERE id = '{$id}' AND url = '{$url}'") or die(mysql_error());
    }

    PHP:
    The purpose of the file is to scan pages and add their words to a database.

    The only problem is that I'm having trouble getting the words out of the pages.
    I was using a bunch of str_replace calls to remove useless things, but I was still getting errors.

    Thanks,
    CLM
     
    CLMMafia, Jul 8, 2012
  2. EricBruggema

    #2
    Check this page; it's handy to know:
    www.php.net/manual/en/function.curl-exec.php

    Here is a function from that page:

    
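    function curl_get($url, array $get = array(), array $options = array())
    {
        // Append the query string, using '?' or '&' depending on whether $url already has one.
        if ($get) {
            $url .= (strpos($url, '?') === FALSE ? '?' : '&') . http_build_query($get);
        }
        $defaults = array(
            CURLOPT_URL => $url,
            CURLOPT_HEADER => 0,
            CURLOPT_RETURNTRANSFER => TRUE,
            CURLOPT_TIMEOUT => 4
        );

        $ch = curl_init();
        // Caller-supplied options take precedence over the defaults.
        curl_setopt_array($ch, ($options + $defaults));
        $result = curl_exec($ch);
        if ($result === FALSE)
        {
            // Only a real failure returns FALSE; an empty body is not an error.
            trigger_error(curl_error($ch));
        }
        curl_close($ch);
        return $result;
    }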
    
    PHP:
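
    Fetching the page is only half of the problem; splitting the raw response on spaces still leaves tags and markup in the "words". A minimal sketch of one way to get plain words out of the HTML string is shown here. The function name extract_words() is hypothetical (it is not from the manual page or from the original post), and it assumes the page is UTF-8:

```php
<?php
// Hypothetical helper (not from the thread): turn an HTML string into a list of unique words.
// Assumes the input is UTF-8.
function extract_words($html)
{
    // Drop <script> and <style> blocks entirely; their contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', (string) $html);
    // Put a space before every tag so "<p>one</p><p>two</p>" does not become "onetwo".
    $html = str_replace('<', ' <', (string) $html);
    // Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^\\p{L}\\p{N}']+/u", $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return array(); // e.g. the input was not valid UTF-8
    }
    // Lowercase and de-duplicate; array_values() re-indexes the result from 0.
    return array_values(array_unique(array_map('strtolower', $words)));
}

// Example with an inline HTML string instead of a real request.
$html = '<html><head><style>p { color: red; }</style></head>'
      . '<body><p>Hello world</p><p>Hello again &amp; again</p>'
      . '<script>var x = 1;</script></body></html>';
print_r(extract_words($html));
```

    With the sample string this should print the three words hello, world and again. In the spider, the same call could be applied to the string returned by curl_exec() or curl_get() in place of explode(' ', $data). Note that strtolower() is only reliable for ASCII letters; for other alphabets mb_strtolower() would be the likelier choice if the mbstring extension is available, and a real HTML parser such as DOMDocument may cope better with badly formed pages.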
     
    EricBruggema, Jul 11, 2012