What are the limits on scraping search result data from Google/Bing/Yahoo/other sites?

Discussion in 'Search Engine Optimization' started by arvindikchari, Feb 14, 2012.

  1. #1
    Are there some definite limits on scraping search results from the leading search engines?

    E.g., I have heard that a single IP should be used to scrape at most 300 results from Google in a 24-hour period.

    Is this limit correct? What are the limits for Google/Yahoo/Bing/AOL/Ask/Lycos/Excite/Exalead/other search engines?

    Also, is the 300 result limit for only pages on "google.com"? Meaning that in a single day 300 results can be scraped from Google.com, another 300 results from google.co.uk, and so on? Or is the 300 result limit of scraping per day a combined total across all google regional search engines?

    Again, how is this limit applied for other search engines (namely Google/Yahoo/Bing/AOL/Ask/Lycos/Excite/Exalead)?

    How can I prevent an IP from being blacklisted? Also, how do I know that my IP is about to be blacklisted? I will be using a scraper script to scrape results; is there some way to take care of the above points at the programming stage itself? Is there some specific response I will get back when Google (or another search engine) is about to blacklist my IP? How do I find out this exact response for the leading search engines (Google/Yahoo/Bing/AOL/Ask/Lycos/Excite/Exalead/others)?

    Regards,
    Arvind.
     
    Last edited: Feb 14, 2012
    arvindikchari, Feb 14, 2012 IP
  2. nikomaster

    nikomaster Member

    #2
    The limit is 1,000 results from Google and Yahoo; I do not know about Bing. I use software I developed to scrape up to 10,000 results in about a minute, but I use proxies to do it.
     
    nikomaster, Feb 14, 2012 IP
  3. stock_post

    stock_post Prominent Member

    #3
    What kind of data do you scrape and what do you do with it?
     
    stock_post, Feb 14, 2012 IP
  4. nikomaster

    nikomaster Member

    #4
    From search engines I scrape web results. The program I developed not only scrapes data from websites but also analyzes each one. For example, I added a function that analyzes WordPress blogs and determines whether their comments are dofollow. The program performs the analysis on each site retrieved by the scraper. To give an example, from 10,000 results I found 200 dofollow blogs in less than 45 minutes using regular keywords with the footprint "powered by wordpress". I have not yet run this analysis with a proper footprint; I could find more dofollow blogs using the proper keywords.
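
    The dofollow check described above might look roughly like the sketch below. This is not the poster's program, only an illustration in Python under stated assumptions: the names `LinkRelCollector`, `has_wordpress_footprint` and `dofollow_links` are hypothetical, the footprint test is a plain substring match, and every link on the page is inspected rather than only the comment links (comment markup varies by theme, so a real check would have to narrow that down).

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collects (href, is_nofollow) for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def has_wordpress_footprint(html):
    # Assumption: the footprint is the literal phrase quoted in the post.
    return "powered by wordpress" in html.lower()


def dofollow_links(html):
    """Returns the hrefs of all links that do not carry rel="nofollow"."""
    parser = LinkRelCollector()
    parser.feed(html)
    return [href for href, nofollow in parser.links if not nofollow]
```

    A page would count as a candidate when `has_wordpress_footprint(page)` is true and `dofollow_links(page)` returns at least one link from the comment area.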

    Moreover, to make it even more flexible, users can create their own tools to search for specific things in websites at the HTML code level: for example, a resource page validator, a keyword density finder, or a detector for specific CMS platforms that leave their footprints only in the HTML code. All you need is a little experience in coding XML.
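
    As an illustration of one of the tools named above, here is a minimal keyword density sketch. The program in this post defines its tools in XML; the Python below is a stand-in, and the names `TextExtractor` and `keyword_density`, the word-splitting rule, and the definition of density (occurrences divided by total visible words) are all assumptions.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def keyword_density(html, keyword):
    """Fraction of visible words equal to `keyword` (case-insensitive)."""
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

    This only handles single-word keywords; phrases would need a sliding window over `words`.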

    I hope to release this piece of software soon.
    I will let you know when it's ready so you guys can test the demo and tell me what you think about it.
     
    nikomaster, Feb 14, 2012 IP
  5. arvindikchari

    arvindikchari Member

    #5
    I have one query for you: you mentioned that the Google limit is 1,000 results. Does this mean that, in 24 hours from a single IP, 1,000 results can be retrieved from Google.com and another 1,000 from Google.co.uk? Or is 1,000 results the total for all Google sites?

    Also, can you tell me what message you receive in response when Google detects that you are scraping? I have heard that they show some kind of captcha. Is this true? If not, what does Google show or ask when it suspects scraping?
     
    arvindikchari, Feb 14, 2012 IP
  6. nikomaster

    nikomaster Member

    #6
    Google bans you and shows the captcha only when you retrieve results from the same IP with automated software. Automated software usually retrieves data very fast, which looks pretty unnatural. The result limit per search query is 1,000: even if a query reports something like 1,000,000 results, Google will only show 1,000 of them. You can browse those results in no time by turning off Google Instant and setting the number of results per page to 100 in the settings menu.
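
    To make the arithmetic above concrete, here is a small Python sketch. It makes no network requests, and everything in it beyond the 1,000-result cap and the 100-per-page setting is an assumption: the function names are hypothetical, the status codes 429 and 503 and the "captcha" substring are only a guess at what a block response might look like, and the delay values are arbitrary.

```python
import random


def page_offsets(total_reported, per_page=100, visible_cap=1000):
    """Start offsets of the result pages that can actually be browsed."""
    reachable = min(total_reported, visible_cap)
    return list(range(0, reachable, per_page))


def looks_blocked(status_code, body):
    """Heuristic only: the status codes and the 'captcha' marker are assumptions."""
    return status_code in (429, 503) or "captcha" in body.lower()


def polite_delay(base_seconds=10.0, jitter_seconds=5.0):
    """A randomized pause, since evenly spaced, very fast requests look unnatural."""
    return base_seconds + random.uniform(0.0, jitter_seconds)
```

    With these defaults, a query reporting 1,000,000 results needs only ten page requests (offsets 0, 100, ..., 900), and a script would stop and back off as soon as `looks_blocked` returns true for a response.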
     
    nikomaster, Feb 14, 2012 IP