Scraping guide?

Discussion in 'PHP' started by dnahosting, Feb 20, 2007.

  1. #1
    I have been looking for a guide to website scraping. I have some of the basics down, but I am having trouble with the start and stop points: the stop point matches the last </div> on the page, and I want it to stop at the first </div> after the starting <div>, if that makes any sense.
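    What is described here is the difference between a greedy and a lazy regex quantifier. Below is a minimal sketch using PHP's preg_match; the sample HTML and the id values are hypothetical, made up only to show the behaviour. (.*) grabs as much as it can, so it runs to the last </div>; (.*?) grabs as little as it can, so it stops at the first </div> after the start tag.

```php
<?php
// Hypothetical sample HTML: two sibling <div>s on one page
$html = '<div id="rank">123</div><div id="other">456</div>';

// Greedy: (.*) runs to the LAST </div> in the string
preg_match('/<div id="rank">(.*)<\/div>/s', $html, $greedy);

// Lazy: (.*?) stops at the FIRST </div> after the start tag
preg_match('/<div id="rank">(.*?)<\/div>/s', $html, $lazy);

echo $greedy[1], "\n"; // 123</div><div id="other">456
echo $lazy[1], "\n";   // 123
```

    One caveat: the lazy version stops at the first </div> it sees, even if that tag closes a nested inner <div>. For nested markup, a real HTML parser such as PHP's DOMDocument is the more robust tool.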
     
    dnahosting, Feb 20, 2007
  2. ErectADirectory

    #2
    Below is a simple function that scrapes a site's Alexa ranking from alexa.com. I hope it points you in the right direction; it should be pretty simple to adapt and hack on.


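    function get_alexa($url){
        // Fetch the Alexa details page for the given site
        $site = fopen('http://www.alexa.com/data/details/main?url='.urlencode($url),'r');
        if ($site === false) {
            return false;
        }
        $total = '';
        while (!feof($site)) {
            $total .= fread($site, 8192);
        }
        fclose($site);
        // U makes (.*) ungreedy: it stops at the FIRST </span></a> after the start marker
        // s lets . match newlines as well
        $match_expression = '/for more information about the Alexa Web Information Service\.-->(.*)<\/span><\/a>/Us';
        if (!preg_match($match_expression, $total, $matches)) {
            return false;
        }
        return strip_tags($matches[1]);
    }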
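    The same "start point, then the next stop point after it" idea can also be done without regex. Below is a minimal sketch using strpos and substr; scrape_between is a hypothetical helper name (not a PHP built-in), and the typed signature assumes PHP 7.1 or later. The key detail is passing the start offset as the third argument to strpos, so the search for the stop marker only begins after the start marker.

```php
<?php
// Hypothetical helper: return the text between $start and the FIRST
// occurrence of $stop that comes after it, or null if either is missing.
function scrape_between(string $html, string $start, string $stop): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);            // skip past the start marker itself
    $to = strpos($html, $stop, $from);  // look for $stop only AFTER $start
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Hypothetical sample HTML
$html = '<div id="rank">123</div><div id="other">456</div>';
echo scrape_between($html, '<div id="rank">', '</div>'), "\n"; // 123
```

    Like the lazy regex, this stops at the first matching stop marker, so it has the same limitation with nested tags.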
     
    ErectADirectory, Feb 20, 2007
  3. dnahosting

    #3
    Thanks EAD, I will try it out.
     
    dnahosting, Feb 20, 2007