How to scrape Google without being banned

Discussion in 'PHP' started by chenka, Jan 10, 2012.

  1. #1
    I'm trying to fetch Google search results with cURL to check a website's keyword rankings,
    because the Google API has important limitations: it returns only the first 8 results for any search, and the number of searches per day is limited too.

    It is not a problem when I run the script on localhost, but on shared hosting I get a 503 response.

    Errors

    Code: 22
    Message: The requested URL returned error: 503
    =============================================

    The cause is that Google blocks an IP when it sends too many requests.
    I researched this problem and found two possible solutions:

    1. Use a proxy when making cURL requests to Google.
    2. Show the captcha to the user and reset the cookie, headers, user-agent and IP.
    For option one, it is hard to find a reliable proxy, and you have to pay for it.

    I'm trying to solve the problem with option two, but I'm having trouble with the code.

    When you get the 503 error and open the same URL in a browser, it shows a captcha to the user, like this:

    [Image: NHaAu.jpg]

    But when I use PHP cURL I can't get the page shown in the picture. I get a boolean FALSE, an empty value and no result.
    This is the code I wrote with CodeIgniter:

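    // Load the CodeIgniter cURL library and request 100 results for "game"
    $this->load->library('curl');
    $this->curl->create('http://www.google.co.th/search?&ie=UTF-8&q=game&num=100');
    $result = $this->curl->execute();  // FALSE when the request fails
    var_dump($result);
    $this->curl->debug();              // prints the response, errors and info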

    The result is the same as above:

    bool(false)
    cURL Test

    =============================================
    Response

    =============================================
    Errors
    Code: 22
    Message: The requested URL returned error: 503
    =============================================
    Info

    Array
    (
    )
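
    A possible reading of the output above: cURL error 22 is CURLE_HTTP_RETURNED_ERROR, which libcurl reports when CURLOPT_FAILONERROR is enabled and the server answers with a status of 400 or higher. In that mode the response body (here, the captcha page) is discarded, which would explain the FALSE result. The sketch below uses plain PHP cURL rather than the CodeIgniter library. The names buildCurlOptions(), isBlocked() and fetchWithBody() and the cookie file path are hypothetical, and it is an assumption that the library in the post turns CURLOPT_FAILONERROR on by default.

```php
<?php
// Sketch only: keep the response body even when Google answers with 503.
// All function names and the cookie file path are hypothetical.

// Build the cURL options; kept separate so they can be inspected without a request.
function buildCurlOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FAILONERROR    => false,       // do not turn HTTP >= 400 into error 22
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects to the captcha page
        CURLOPT_COOKIEJAR      => $cookieFile, // save received cookies when the handle closes
        CURLOPT_COOKIEFILE     => $cookieFile, // send previously saved cookies
        CURLOPT_TIMEOUT        => 30,
    ];
}

// Treat a 503 status as "blocked" (assumption: this is how the block shows up).
function isBlocked(int $status): bool
{
    return $status === 503;
}

// Perform the request and return status, body and cURL error text.
function fetchWithBody(string $url, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $cookieFile));
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);
    curl_close($ch);

    return ['status' => $status, 'body' => $body, 'error' => $error];
}
```

    Calling fetchWithBody() with the search URL from the post should then return the 503 status together with the HTML of the captcha page instead of FALSE. This only retrieves the page; it does not solve the captcha or submit the answer.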

    How can I show the captcha to the user, let them type the letters, send the answer to Google and save the cookie to continue with the requests? I can't work out how to code this, please help me.

    Thank you
     
    chenka, Jan 10, 2012
  2. geofox

    #2
    To scrape Google without getting banned every time, you need a list of proxies (the more proxies, the better).
    You should also set pauses (timeouts) between requests to Google. I would also recommend finding a list of Google's datacenter IPs so you can send requests to different servers.

    Summary: list of proxies + good proxy rotation + timeouts + list of Google's datacenter IPs.
    But you can still get banned :D Parsing Google is delicate work.
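
    The proxy rotation mentioned above could be sketched roughly like this. It is a minimal round-robin example, not a tested anti-ban setup: the proxy addresses are placeholders, and proxyForRequest() and curlOptionsViaProxy() are hypothetical helper names.

```php
<?php
// Sketch only: round-robin proxy rotation for cURL requests.
// The addresses are placeholders; replace them with proxies you control.
$proxies = ['203.0.113.10:8080', '203.0.113.11:8080', '203.0.113.12:3128'];

// Pick the proxy for the n-th request by cycling through the list.
function proxyForRequest(array $proxies, int $requestIndex): string
{
    if (count($proxies) === 0) {
        throw new InvalidArgumentException('The proxy list is empty.');
    }
    return $proxies[$requestIndex % count($proxies)];
}

// Build cURL options that route one request through the given proxy.
function curlOptionsViaProxy(string $url, string $proxy): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy, // "host:port" of the proxy
        CURLOPT_CONNECTTIMEOUT => 10,     // give up on dead proxies quickly
        CURLOPT_TIMEOUT        => 30,
    ];
}
```

    For request number $i, the options from curlOptionsViaProxy($url, proxyForRequest($proxies, $i)) can be passed to curl_setopt_array(). Round-robin is the simplest rotation; picking proxies at random or skipping ones that recently returned 503 are possible variations.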
     
    geofox, Jan 10, 2012
  3. saeedsjaan

    #3
    Use the usleep() function to put a gap between requests; hopefully this will work a bit better.
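
    A minimal sketch of such a gap, with a random length so the requests are not evenly spaced. The 2 to 5 second range is an arbitrary assumption, and delayMicroseconds() and politePause() are hypothetical names. Because usleep() may not support values above one second on every system, the whole seconds go through sleep() and only the remainder through usleep().

```php
<?php
// Sketch only: pause for a random time between two requests.

// Return a random delay in microseconds between the two bounds (inclusive).
function delayMicroseconds(float $minSeconds, float $maxSeconds): int
{
    if ($minSeconds < 0 || $maxSeconds < $minSeconds) {
        throw new InvalidArgumentException('Need 0 <= min <= max.');
    }
    return random_int(
        (int) round($minSeconds * 1000000),
        (int) round($maxSeconds * 1000000)
    );
}

// Sleep for the random delay: whole seconds via sleep(), the rest via usleep().
function politePause(float $minSeconds = 2.0, float $maxSeconds = 5.0): void
{
    $micros = delayMicroseconds($minSeconds, $maxSeconds);
    sleep(intdiv($micros, 1000000));
    usleep($micros % 1000000);
}
```

    Calling politePause() after each request inside the scraping loop spaces the requests out. Longer gaps lower the request rate but do not guarantee that the IP stays unblocked.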
     
    saeedsjaan, Jan 13, 2012
  4. xixilee

    #4
    Google sucks, you can use the Bing API.
     
    xixilee, Feb 8, 2012