Web Scraping: Everybody Does It, Yet Nobody Talks About It

Discussion in 'General Business' started by Delonte, Jan 30, 2015.

  1. #1
    Hi everyone,

    I want to share my insights about the data mining business. From the start, it has been a very strange area for me. I've been in the web scraping business, in and out, for the last two years or so. Two years in, it's still difficult for me to understand how big and important this market is. It actually took me a while to figure out who my main competitors are, and I still believe I haven't identified them all (yeah, that's kind of crazy...).

    This is what I don't get. For many businesses, the core idea depends on how successfully they harvest data, or in other words, how effectively they can 'borrow' information from multiple sources. Yet there is almost no information on the internet about companies that provide such services, and the ones that are a bit more open try to sell their business as a legit service for low-end data harvesting. In most cases that's true, but high-scale data scraping, where thousands of websites are constantly monitored and scanned, is nowhere near 'legal' (at least in terms of social responsibility).

    Let me explain: data can be the main ingredient of your product or service, and sometimes you can get it without any problems. You scrape a website that doesn't really care whether it's being scraped (e.g. government agencies, national statistics departments). The complete opposite is when your targets don't like being scraped. They will try to block you (by your IP address, most of the time), and that's where 'advanced techniques' come in. That's where it all starts to look a little less transparent.
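    The IP-based blocking mentioned above can be sketched from the target site's side. This is a minimal, hypothetical Python sketch, assuming a simple sliding-window rule: a client IP that sends more than `max_requests` requests within `window_seconds` gets flagged and refused. The class name `IPRateLimiter` and the thresholds are illustrative assumptions, not how any particular site actually does it.

```python
from collections import defaultdict, deque


class IPRateLimiter:
    """Hypothetical sketch: flag client IPs that exceed max_requests within window_seconds."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of that IP's recent requests
        self.blocked = set()             # IPs flagged as likely scrapers

    def allow(self, ip: str, now: float) -> bool:
        """Record a request from `ip` at time `now`; return False if it should be refused."""
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self.blocked.add(ip)  # in this sketch, a flagged IP stays blocked
            return False
        return True


limiter = IPRateLimiter(max_requests=3, window_seconds=10)
print([limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("198.51.100.2", 3))  # True: a different IP is unaffected
```

    A real site may use different thresholds or additional signals; the sketch only shows the basic idea of keying a block on the client IP address.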

    I can understand why these companies want to stay out of the public eye. They don't need to shout about what they're doing, as their customers find them on their own. What I don't get is that no one else talks about it either. No one really shares opinions about the data scraping market. Technically it falls under the 'big data' definition, but let's be realistic: it's a completely different thing.

    Strange, to say the least. What do you guys think about that? Do you think this market is going to become more transparent, especially with 'big data' expanding?

    Cheers!
     
    Delonte, Jan 30, 2015
  2. nathan neeley

    nathan neeley Greenhorn

    #2
    Great insight, and great picture. D-Fish, the one who saved LA!
     
    nathan neeley, Feb 11, 2015
  3. nerdtimez

    nerdtimez Peon

    #3
    The question is one of incentive: why would anyone tell you they're getting a competitive advantage from web scraping? The minute they do, they lose the edge.
     
    nerdtimez, Feb 16, 2015
  4. Joseph Mancia

    Joseph Mancia Member

    #4
    You're painting with a broad brush when you say "Everybody Does It"! I certainly don't scrape, and I know thousands more who don't. It's illegal and against the TOS of most service providers.
     
    Joseph Mancia, Feb 16, 2015
  5. sysdev0

    sysdev0 Member

    #5
    I would say the fact of the matter is, it's going to happen. You can either take advantage of it or not, but that doesn't mean it'll go away.
     
    sysdev0, Feb 23, 2015
  6. mikebvm

    mikebvm Member

    #6
    I bet a lot fewer people scrape than you think. Is there any chance that you're reading about techniques from articles that are several years old?

    From the perspective of a web host, I can definitely tell you that we see almost no abuse tickets about that anymore, but we used to see them constantly 5+ years ago. I know correlation doesn't imply causation, but still.
     
    mikebvm, Feb 23, 2015
  7. TheDataPlanet.com

    TheDataPlanet.com Well-Known Member

    #7
    I wouldn't use the word 'scrape'. It's more like spidering or mining. It's not about stealing anymore; it's about interoperability.

    You know what the biggest scraper on the planet is? Guess!
     
    TheDataPlanet.com, Feb 23, 2015
  8. jcw

    jcw Greenhorn

    Messages:
    14
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    13
    #8
    I think you want to believe "everybody does it" to make yourself feel better about doing something unethical and often illegal. Not everyone does it, and obviously most people who do participate in such activities don't want to talk openly about their illicit activities, especially in writing on the internet. Even in cases where the specific activity isn't illegal (and let's be honest, in most cases it IS illegal), it's still a shady, lowlife way of doing business. So the question pretty much answers itself.
     
    jcw, Feb 24, 2015
  9. Delonte

    Delonte Greenhorn

    #9
    Thanks for your responses! I guess you're all right, at least in a way. Once you start talking about web scraping, you risk losing that advantage, especially if you're a big player. On the other hand, it's pretty obvious who does anonymous web scraping and who doesn't. Actually, the biggest players (someone like Bl00mberg, I suppose) no longer do that, as many smaller companies provide the data for free and on demand (via APIs), but the market's #2 and everyone below it are surely scraping/mining/digging as hard as they can.

    I'm just generalizing by saying 'everyone'. I had in mind businesses that are directly involved in competitive intelligence (even though that's just one example where data mining can be beneficial or even a must). They ARE scraping the internet. And I wouldn't say data scraping is illegal per se; it's still a grey area. Some ToS clearly prohibit it, some are vague about it, and some don't even mention it.

    Most surprising to me is that sometimes websites want to be scraped, even though their generic ToS prohibit it. If you're a small budget airline, you'd love your fares to be scraped and compared with the big airlines' fares, as you know you'd come out cheaper. If you're Turkish Airlines, though, you won't be that happy: your service is more expensive because it's premium, so a simple price comparison won't tell the full story and will mislead customers.

    Anyway, I consider myself new to this area. I have a lot to learn and understand. Thank you for your opinions, whatever they are.

    TheDataPlanet, it's Google, isn't it?

    Nathan, D-Fish was an amazing player! Calm and composed. As a coach, though, he's terrible...
     
    Last edited: Feb 24, 2015
    Delonte, Feb 24, 2015
  10. Delonte

    Delonte Greenhorn

    #10
    Any more opinions on this topic?
     
    Delonte, Mar 2, 2015
  11. PDD

    PDD Greenhorn

    #11
    So much misinformation in this thread. Scraping is not illegal, and it's not against many websites' ToS (barring Google and other search engines). How do you think Google gets its information? Scraping and data mining. Also, scraping isn't the same as data mining. Data mining is using algorithms to extract patterns, trends, and other meaningful information from a data set (e.g. machine learning). Scraping is the process of actually getting the data set.
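    To make the scraping vs. data mining distinction concrete, here is a minimal Python sketch using only the standard library (`html.parser`, `statistics`). The hypothetical `scrape()` turns raw HTML into a data set of records; the hypothetical `mine()` then extracts a pattern (average price per route) from that data set. The function names, table layout and sample prices are all made up for illustration, and a real scraper would fetch the page over HTTP rather than use an inline string.

```python
from html.parser import HTMLParser
from statistics import mean


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape(html):
    """Scraping: get the data set, here a list of (route, price) records."""
    parser = TableRowParser()
    parser.feed(html)
    return [(route, float(price)) for route, price in parser.rows]


def mine(records):
    """Data mining: derive a pattern from the data set (average price per route)."""
    by_route = {}
    for route, price in records:
        by_route.setdefault(route, []).append(price)
    return {route: mean(prices) for route, prices in by_route.items()}


# Made-up sample page standing in for a fetched price listing.
PAGE = """
<table>
  <tr><th>Route</th><th>Price</th></tr>
  <tr><td>AAA-BBB</td><td>120.0</td></tr>
  <tr><td>AAA-BBB</td><td>100.0</td></tr>
  <tr><td>AAA-CCC</td><td>90.0</td></tr>
</table>
"""

records = scrape(PAGE)
print(records)        # [('AAA-BBB', 120.0), ('AAA-BBB', 100.0), ('AAA-CCC', 90.0)]
print(mine(records))  # {'AAA-BBB': 110.0, 'AAA-CCC': 90.0}
```

    Keeping the two steps as separate functions mirrors the point of the post: the same scraped data set could be fed to a very different mining step without touching the scraping code.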
     
    PDD, Mar 2, 2015
  12. Delonte

    Delonte Greenhorn

    #12
    PDD, thanks for your input. Perhaps the terminology I've used wasn't completely right, so thank you for clarifying what's what. Either way, you're not totally right that scraping is legal and that ToS don't prohibit it. Most of the time that's true, especially when it comes to what Google is doing. Even though they are scraping, it's a very simple activity with almost no real-time data monitoring or deep drilling. The data harvesting techniques I'm talking about are far more serious.
     
    Delonte, Mar 4, 2015