cURL with GET parameters

Discussion in 'PHP' started by mbaldwin, Aug 20, 2011.

  1. #1
    Hi,
    I am trying to get info from some pages, but I am not having much luck. It seems the GET parameters are not being passed through my cURL script, so it does not call the right page.

    
    <?php
    $start=microtime(true);
    require_once('db_config.php');
    $url='http://www.somedomain.com/links.php';
    $type='free';
    $page_num=3;
    // Append the GET parameters to the base URL
    $url=$url.'?type='.$type.'&page='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string instead of printing it
    $get_page = curl_exec($ch);
    curl_close($ch);
    // Save the fetched page to a file for inspection
    file_put_contents('page_'.$page_num.'.txt', $get_page);
    $end=microtime(true);
    $total=$end - $start;
    echo '<br>'.$total; // elapsed time in seconds
    ?>
    
    When I echo the $url variable, it looks right on the screen. Maybe I am misunderstanding something about the cURL functions, and I can't just attach the GET parameters to the end of the URL?

    I need to do it this way, because I will need to make a loop to cycle through different parameter values.
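Attaching GET parameters to the end of the URL does work with cURL; what matters in a loop is building each URL from the unchanged base URL. A minimal sketch, assuming the parameter names `type` and `page` from the script above; `build_page_url()` is a hypothetical helper, and PHP's `http_build_query()` joins the pairs with `&` and URL-encodes each value.

```php
<?php
// Hypothetical helper: build a GET URL from a fixed base and an array of parameters.
// http_build_query() joins the pairs with '&' and URL-encodes each value.
function build_page_url(string $base, array $params): string
{
    return $base . '?' . http_build_query($params);
}

$base = 'http://www.somedomain.com/links.php';

// Rebuild the URL from $base on every pass instead of appending to the previous URL.
$page_urls = [];
for ($page_num = 1; $page_num <= 3; $page_num++) {
    $page_urls[] = build_page_url($base, ['type' => 'free', 'page' => $page_num]);
}

print_r($page_urls);
```

Because `$base` never changes, each URL carries exactly one query string.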

    When I type the whole URL out by hand and pass it to cURL in the $url variable, it works fine.
    I also do not get anything if I try to echo the $get_page variable right after the cURL commands.
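An empty `$get_page` means either the request failed or the server sent back an empty body, and the script as written cannot tell which. `curl_exec()` returns `false` on failure, and `curl_error()` and `curl_getinfo()` report why. A minimal sketch, not a drop-in fix: `fetch_page()` is a hypothetical helper, and the demo fetches local `file://` URLs only so it runs without a network connection; against the real site you would pass the `http://` URL.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and report why it failed, if it did.
function fetch_page(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch);                               // false on failure
    $error = curl_error($ch);                             // '' when no error occurred
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);  // HTTP status, 0 if there was none
    curl_close($ch);

    return ['body' => $body, 'error' => $error, 'status' => $status];
}

// Offline demo: one local file that exists and one that does not.
$tmp = tempnam(sys_get_temp_dir(), 'pg');
file_put_contents($tmp, 'hello');

$ok = fetch_page('file://' . $tmp);
$missing = fetch_page('file://' . $tmp . '.does-not-exist');

if ($missing['body'] === false) {
    echo 'Request failed: ' . $missing['error'] . "\n";
}
unlink($tmp);
```

For an HTTP request, also checking `status` distinguishes a failed transfer (`body === false`) from a page that loaded but returned something like 404.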

    Any suggestions?

    Thanks,
    Michael

    mbaldwin, Aug 20, 2011
  2. ssmm987

    #2
    I tested it on my private server, and it seems to work ok.

    Are you sure that no POST variables need to be passed?
     
    ssmm987, Aug 21, 2011
  3. mbaldwin

    #3
    I am positive no POST parameters need to be sent. I can do

    $url='http://www.somesite.com/links.php?type=free&pagenumber=4';

    and it will go to that URL.

    Is it possible the script is moving too fast, and I need to put a sleep() at the end of each loop?

    Thanks,
    Michael
     
    mbaldwin, Aug 21, 2011
  4. iBank ™

    #4
    We won't be able to help you without the actual URL you are working with. Try setting CURLOPT_FOLLOWLOCATION to 1 and see if anything changes.
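With `CURLOPT_FOLLOWLOCATION` enabled, cURL follows `Location:` redirects, so the page you get back may not be the one you asked for. Reading `CURLINFO_EFFECTIVE_URL` after the request shows which URL was actually fetched, which is a quick way to check whether the query string survived. A sketch; `fetch_with_redirects()` is a hypothetical name, the redirect cap of 5 is an arbitrary choice, and the demo uses a local `file://` URL only so it runs offline.

```php
<?php
// Hypothetical helper: fetch a URL, following redirects, and report the final URL.
function fetch_with_redirects(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // follow Location: headers
        CURLOPT_MAXREDIRS      => 5,    // arbitrary cap so a redirect loop cannot run forever
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    // The URL cURL ended up on after any redirects; compare it with $url.
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return ['body' => $body, 'final_url' => $final];
}

// Offline demo with a local file (no redirects, so the final URL equals the request URL).
$tmp = tempnam(sys_get_temp_dir(), 'pg');
file_put_contents($tmp, 'page 3');
$result = fetch_with_redirects('file://' . $tmp);
echo $result['final_url'], "\n";
unlink($tmp);
```

If the final URL printed for a real request has lost its `?type=...&page=...` part, the server redirected somewhere else.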
     
    iBank ™, Aug 21, 2011
  5. mbaldwin

    #5
    Okay, I will try that and post back with the results.

    I didn't know whether I should post the actual URL I am working with, but if your idea doesn't work, I can post the complete script. Maybe I am just missing something else.

    Thanks,
    Michael
     
    mbaldwin, Aug 21, 2011
  6. mbaldwin

    #6
    Hi,
    That didn't help. It still only goes to the first page of the links I am trying to scrape from the site. Below is the complete code. Maybe it is something small I am just not catching; it has happened before.

    
    <?php
    $start=microtime(true);
    set_time_limit(0);
    ignore_user_abort();
    require_once('db_config.php');
    $site='';
    $add='';
    $url='http://www.onewaytextlink.com/links.php';
    $type='free';
    $page_num=1;
    $url=$url.'?type='.$type.'&page='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $scraped = curl_exec($ch);
    curl_close($ch);
    FILE_PUT_CONTENTS('scraped_page_'.$page_num.'.txt', $scraped);
    preg_match_all('/<a href="\/links.php\?type=free\&amp;pagenum=(.*)">(\d)<\/a>/', 
    $scraped, $pages, PREG_SET_ORDER);
    $newArr = array();
    foreach ($pages as $val) {
    $newArr[$val[2]] = $val;
    }
    $pages = array_values($newArr);
    $page_count=count($pages);
    $page_count++;
    for ($row1 = 0; $row1 < $page_count; $row1++) {
    $url=$url.'?type='.$type.'&page='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $scraped = curl_exec($ch);
    curl_close($ch);
    FILE_PUT_CONTENTS('scraped_page_'.$page_num.'.txt', $scraped);
    preg_match_all('/url=(.*)" target="_blank">(.*)<\/a>/i', $scraped, $links, PREG_SET_ORDER);
    $rowcount=count($links);
    for ($row2 = 0; $row2 < $rowcount; $row2++) {
    unset($links[$row2][0]);
    if(!preg_match('/^http:\/\//i', $links[$row2][1])) {
    $links[$row2][1] = 'http://'.$links[$row2][1];
    }
    $links[$row2][1]=preg_replace('/%2f/i', '/', $links[$row2][1]);
    $urls=$links[$row2][1];
    $title=$links[$row2][2];
    $sql="SELECT * FROM website_directory WHERE url='$urls'";
    $q=MYSQLI_QUERY($link, $sql);
    if(MYSQLI_FETCH_ASSOC($q) == 0) {
    $insert="INSERT INTO website_directory (title, url)VALUE('$title', '$urls')";
    $q=MYSQLI_QUERY($link, $insert);
    $add++;
    }
    $site++;
    }
    $page_num++;
    }
    echo $add.' sites have been added to the database.</br>';
    echo $site.' have been scanned.</br>';
    $end=microtime(true);
    $total=$end - $start;
    echo '</br>'.$total;
    ?>
    
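Separately from the GET problem, the page-counting regex above uses `(.*)`, which is greedy: if several pagination links share one line of HTML, a single match runs across all of them, and `(\d)` only matches single-digit page numbers. A minimal sketch of a tighter pattern, assuming the links have exactly the shape the original pattern expects (the real markup isn't shown in the thread); `page_numbers()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: collect the distinct page numbers from pagination links.
// Capturing (\d+) instead of (.*) stops at the closing quote even when several
// links share a line, and allows page numbers with more than one digit.
function page_numbers(string $html): array
{
    preg_match_all(
        '/<a href="\/links\.php\?type=free&amp;pagenum=(\d+)">\d+<\/a>/',
        $html,
        $matches
    );
    $pages = array_map('intval', $matches[1]);
    $pages = array_values(array_unique($pages)); // the same link can appear more than once
    sort($pages);
    return $pages;
}

$html = '<a href="/links.php?type=free&amp;pagenum=2">2</a> '
      . '<a href="/links.php?type=free&amp;pagenum=10">10</a> '
      . '<a href="/links.php?type=free&amp;pagenum=2">2</a>';

print_r(page_numbers($html));
```

Taking `max()` of the result would give the last page number directly, instead of counting links.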
    Thanks for any assistance.

    Michael
     
    mbaldwin, Aug 21, 2011
  7. nonte

    #7
    There you go:

    <?php
    $start=microtime(true);
    set_time_limit(0);
    ignore_user_abort();
    require_once('db_config.php');
    $site='';
    $add='';
    $url='http://www.onewaytextlink.com/links.php';
    $type='free';
    $page_num=1;
    $url=$url.'?type='.$type.'&pagenum='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $scraped = curl_exec($ch);
    curl_close($ch);
    FILE_PUT_CONTENTS('scraped_page_'.$page_num.'.txt', $scraped);
    preg_match_all('/<a href="\/links.php\?type=free\&amp;pagenum=(.*)">(\d)<\/a>/',
    $scraped, $pages, PREG_SET_ORDER);
    $newArr = array();
    foreach ($pages as $val) {
    $newArr[$val[2]] = $val;
    }
    print_r($pages);
    $pages = array_values($newArr);
    $page_count=count($pages);
    $page_count++;
    for ($row1 = 0; $row1 < $page_count; $row1++) {
    $url=$url.'?type='.$type.'&pagenum='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $scraped = curl_exec($ch);
    curl_close($ch);
    FILE_PUT_CONTENTS('scraped_page_'.$page_num.'.txt', $scraped);
    preg_match_all('/url=(.*)" target="_blank">(.*)<\/a>/i', $scraped, $links, PREG_SET_ORDER);
    $rowcount=count($links);
    for ($row2 = 0; $row2 < $rowcount; $row2++) {
    unset($links[$row2][0]);
    if(!preg_match('/^http:\/\//i', $links[$row2][1])) {
    $links[$row2][1] = 'http://'.$links[$row2][1];
    }
    $links[$row2][1]=preg_replace('/%2f/i', '/', $links[$row2][1]);
    $urls=$links[$row2][1];
    $title=$links[$row2][2];
    $sql="SELECT * FROM website_directory WHERE url='$urls'";
    echo $sql."\n";
    $q=MYSQLI_QUERY($link, $sql);
    if(MYSQLI_FETCH_ASSOC($q) == 0) {
    $insert="INSERT INTO website_directory (title, url)VALUE('$title', '$urls')";
    $q=MYSQLI_QUERY($link, $insert);
    $add++;
    }
    
    $site++;
    }
    $page_num++;
    }
    echo $add.' sites have been added to the database.</br>';
    echo $site.' have been scanned.</br>';
    $end=microtime(true);
    $total=$end - $start;
    echo '</br>'.$total;
    ?>
     
    nonte, Aug 21, 2011
  8. ssmm987

    #8
    The problem is fairly simple:

    $url='http://www.onewaytextlink.com/links.php'; // url = http://www.onewaytextlink.com/links.php
    $url=$url.'?type='.$type.'&pagenum='.$page_num; // url = http://www.onewaytextlink.com/links.php?type=free&pagenum=1

    And again, inside the loop:

    $url=$url.'?type='.$type.'&pagenum='.$page_num; // url = http://www.onewaytextlink.com/links.php?type=free&pagenum=1?type=free&pagenum=1

    The GET variables are appended to the GET variables from the previous request instead of replacing them, so the URL grows on every pass.
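The accumulation is easy to see by printing the URL on each pass. A minimal sketch that pulls out only the URL-building lines (no cURL or database calls), using the same `$baseurl`, `$type` and `pagenum` names as the script, and compares the appending version with one that starts from `$baseurl` every time:

```php
<?php
// Reproduces the bug: the loop appends a new query string to the URL from the previous pass.
$baseurl = 'http://www.onewaytextlink.com/links.php';
$type = 'free';

$url = $baseurl;
$buggy = [];
for ($page_num = 1; $page_num <= 2; $page_num++) {
    $url = $url . '?type=' . $type . '&pagenum=' . $page_num; // grows every pass
    $buggy[] = $url;
}

// The fix: start from the unchanged base URL on every pass.
$fixed = [];
for ($page_num = 1; $page_num <= 2; $page_num++) {
    $url = $baseurl . '?type=' . $type . '&pagenum=' . $page_num;
    $fixed[] = $url;
}

print_r($buggy);
print_r($fixed);
```

The second buggy URL ends in `pagenum=1?type=free&pagenum=2`, while the fixed one is simply `...links.php?type=free&pagenum=2`.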

    Simple fix:
    <?php
    $start=microtime(true);
    set_time_limit(0);
    ignore_user_abort(true); // with no argument this only reads the setting, it does not enable it
    require_once('db_config.php'); // assumed to open the mysqli connection $link
    $site=0; // links scanned
    $add=0;  // new sites inserted
    $baseurl='http://www.onewaytextlink.com/links.php';
    $type='free'; // must match the type=free in the pagination regex below
    $page_num=1;
    // Fetch the first page to find out how many pages there are
    $url=$baseurl.'?type='.$type.'&pagenum='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $scraped = curl_exec($ch);
    curl_close($ch);
    file_put_contents('scraped_page_'.$page_num.'.txt', $scraped);
    preg_match_all('/<a href="\/links.php\?type=free\&amp;pagenum=(.*)">(\d)<\/a>/',
    $scraped, $pages, PREG_SET_ORDER);
    // Key by link text so each page number is counted once
    $newArr = array();
    foreach ($pages as $val) {
    $newArr[$val[2]] = $val;
    }
    $pages = array_values($newArr);
    $page_count=count($pages);
    $page_count++;
    for ($row1 = 0; $row1 < $page_count; $row1++) {
    // Rebuild the URL from $baseurl on every pass instead of appending to the last one
    $url=$baseurl.'?type='.$type.'&pagenum='.$page_num;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $scraped = curl_exec($ch);
    curl_close($ch);
    file_put_contents('scraped_page_'.$page_num.'.txt', $scraped);
    preg_match_all('/url=(.*)" target="_blank">(.*)<\/a>/i', $scraped, $links, PREG_SET_ORDER);
    $rowcount=count($links);
    for ($row2 = 0; $row2 < $rowcount; $row2++) {
    unset($links[$row2][0]);
    // Add a scheme if the link has none (http or https), then decode encoded slashes
    if(!preg_match('/^https?:\/\//i', $links[$row2][1])) {
    $links[$row2][1] = 'http://'.$links[$row2][1];
    }
    $links[$row2][1]=preg_replace('/%2f/i', '/', $links[$row2][1]);
    // Escape scraped text before it goes into SQL; a quote in a title would break the query
    $urls=mysqli_real_escape_string($link, $links[$row2][1]);
    $title=mysqli_real_escape_string($link, $links[$row2][2]);
    $sql="SELECT * FROM website_directory WHERE url='$urls'";
    $q=mysqli_query($link, $sql);
    if(mysqli_num_rows($q) == 0) { // only insert URLs that are not stored yet
    $insert="INSERT INTO website_directory (title, url) VALUES ('$title', '$urls')";
    mysqli_query($link, $insert);
    $add++;
    }
    $site++;
    }
    $page_num++;
    }
    echo $add.' sites have been added to the database.<br>';
    echo $site.' sites have been scanned.<br>';
    $end=microtime(true);
    $total=$end - $start;
    echo '<br>'.$total;

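Separately from the URL fix, the script builds its SQL by pasting `$title` and `$urls` straight into the query text, which breaks as soon as a scraped title contains an apostrophe and lets scraped text run as SQL. Prepared statements avoid both. The script uses mysqli; this sketch swaps in PDO with an in-memory SQLite database only so it runs without a MySQL server, and the two-column table is an assumption based on the INSERT above. `add_site()` is a hypothetical helper. With mysqli the same idea uses `mysqli_prepare()` and `mysqli_stmt_bind_param()`.

```php
<?php
// Stand-in database: in-memory SQLite via PDO (the original script uses mysqli/MySQL).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE website_directory (title TEXT, url TEXT)');

$select = $db->prepare('SELECT COUNT(*) FROM website_directory WHERE url = ?');
$insert = $db->prepare('INSERT INTO website_directory (title, url) VALUES (?, ?)');

// Hypothetical helper: insert a site only if its URL is not stored yet.
// Values are bound as parameters, so quotes in $title cannot break the SQL.
function add_site(PDOStatement $select, PDOStatement $insert, string $title, string $url): bool
{
    $select->execute([$url]);
    $exists = (int) $select->fetchColumn() > 0;
    $select->closeCursor();
    if ($exists) {
        return false;
    }
    $insert->execute([$title, $url]);
    return true;
}

$add = 0;
$scraped = [["Bob's Links", 'http://example.com/'], ['Duplicate', 'http://example.com/']];
foreach ($scraped as [$title, $url]) {
    if (add_site($select, $insert, $title, $url)) {
        $add++;
    }
}
echo $add . " sites have been added to the database.\n";
```

The apostrophe in the first title is stored as-is, and the second entry is skipped because its URL is already in the table.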
     
    ssmm987, Aug 22, 2011
  9. mbaldwin

    #9
    Hi,
    Thanks to both of you for finding the problem. It took me a while to figure out what nonte had changed, but once I got it, it just shows how something so simple can be overlooked.
    Thanks, ssmm987, for figuring out that the GET parameters were being appended after the previous ones. I should have caught that mistake too.


    Works great now, thanks again.

    Michael
     
    mbaldwin, Aug 22, 2011
  10. merlinghost

    #10
    If I put a special character in a GET parameter used to authenticate a user, it does not work.

    Example password: Camilie&

    The & in the GET URL causes a big problem.

    Sorry for my English, and thanks for any replies.
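In a query string `&` separates parameters, so a raw `&` inside a value cuts it short: `pass=Camilie&` reaches the server as `pass=Camilie`, with the `&` treated as a separator. The value has to be percent-encoded (`&` becomes `%26`), which `urlencode()` or `http_build_query()` does, and PHP decodes it again when it fills `$_GET`. A minimal sketch; the parameter names `user` and `pass` and the URL are hypothetical.

```php
<?php
// Hypothetical credentials and URL, for illustration only.
$params = ['user' => 'camilie', 'pass' => 'Camilie&'];

// What goes wrong without encoding: the '&' ends the value.
parse_str('user=camilie&pass=Camilie&', $raw);

// http_build_query() percent-encodes each value, so '&' becomes '%26'.
$query = http_build_query($params);
$url = 'http://www.example.com/login.php?' . $query;
echo $url, "\n";

// Encoding a single value by hand gives the same result.
$pass = urlencode('Camilie&');

// On the receiving side PHP decodes the query string when it fills $_GET;
// parse_str() performs the same decoding here.
parse_str($query, $decoded);
```

After decoding, `$decoded['pass']` is the original `Camilie&`, while the unencoded version only yields `Camilie`.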
     
    merlinghost, Dec 6, 2012