Hi there, I have a new website I'm about to start on. Just wondered, to make the job much easier, is it possible to take the Amazon book directory, and in particular just a single category, so I don't have to reconstruct the entire list? Thanks, please get back to me.
Yes, it can be done with many different programming languages. In fact, any language that runs on your computer and can make network requests can be used to scrape pages from the web. However, there may be limits on the number of downloads from their site per hour, imposed to keep bandwidth use in check. There also might be something in their terms of service that would let them come after you legally for data theft.
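For what it's worth, "scraping a page" boils down to downloading it and keeping the raw HTML. A minimal Python sketch, where the URL is just a placeholder and not a real Amazon category page:

```python
# Minimal page-fetch sketch: download one page and save the raw HTML.
# The URL is a placeholder, not a real Amazon category page.
import urllib.request

url = "https://www.example.com/books/some-category"  # placeholder URL
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8", errors="replace")

with open("page.html", "w", encoding="utf-8") as f:
    f.write(html)
```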
Any time you scrape someone else's pages you must consider "data theft." And there is no way to know whether that is a problem unless you read their terms of service. Generally, if there is a copyright notice on the page you want to scrape, you run the risk of legal action. The risk is very low unless you are overloading their servers with requests. Scraping for commercial purposes is a lot riskier than for personal purposes because of the "fair use" rule in copyright law. Requesting 12,000 pages is WAY too many. The Googlebot crawler typically limits itself to a maximum of 417 page requests per DAY per SITE (at least it does on my site of over 4,000 pages), and on most days it only makes about 70 page requests there. I suggest you keep to a similar pace when scraping pages.
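If you do go the scraping route, something like the sketch below keeps you under a Googlebot-style daily budget. The URL list, the 70-request cap, and the fixed delay are illustrative values only, not anything Amazon publishes:

```python
# Politeness sketch: cap the number of page requests per day and pause between them.
# The URL list, daily cap, and delay are illustrative values only.
import time
import urllib.request

urls = [
    "https://www.example.com/category/page1",  # placeholder URLs
    "https://www.example.com/category/page2",
]
MAX_REQUESTS_PER_DAY = 70                         # roughly Googlebot's pace on a small site
DELAY_SECONDS = 24 * 3600 / MAX_REQUESTS_PER_DAY  # spreads requests over the whole day

for i, url in enumerate(urls[:MAX_REQUESTS_PER_DAY]):
    with urllib.request.urlopen(url) as response:
        html = response.read()
    with open(f"page_{i}.html", "wb") as f:
        f.write(html)
    time.sleep(DELAY_SECONDS)                     # ~20 minutes between requests at 70/day
```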
Funny you want to steal data that you can get for free through the Amazon Web Services API. I am building a web site right now that extracts Amazon data based on certain parameters, and there is no need to scrape pages (given that they change often) when you can get better results through AWS. But good luck anyway.
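For comparison, here is a rough sketch of what an API request looks like instead of scraping rendered pages. The endpoint and parameter names below only mimic the old REST-style product interface and should be treated as placeholders; the live service requires a registered access key and signed requests, which this sketch deliberately leaves out:

```python
# Illustrative sketch of querying a product API instead of scraping HTML pages.
# The endpoint and parameter names mimic the old REST-style Amazon interface but
# are placeholders here; the live service needs a registered access key and
# signed requests, which this sketch does not implement.
import urllib.parse

params = {
    "Service": "AWSECommerceService",   # illustrative parameter names
    "Operation": "ItemSearch",
    "SearchIndex": "Books",
    "BrowseNode": "1000",               # hypothetical category (browse node) id
    "AWSAccessKeyId": "YOUR-KEY-HERE",  # placeholder credential
}
url = "https://webservices.amazon.com/onca/xml?" + urllib.parse.urlencode(params)

# Only prints the request URL to show its shape; it is not a working, signed call.
print(url)
```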
Not really. If you're loading an inventory for the first time it might be OK, as long as your server can handle having so much content loaded in the first place, and then all the search engine bots from all the search engines coming in to index it. Google would be the least of my worries here.
It's not a question of whether your server can handle the page load. It's a question of whether the bandwidth load on AMAZON'S servers would be a problem.
Are you trying to say Amazon's database servers won't handle yet another request for 12,000 records? Or am I laughing too early and have I misunderstood what you're asking?
You misunderstood, or did not think it through. Yes, I am sure that Amazon COULD handle a 12,000-record request coming from ONE person at a time. But what if 10,000 people requested 12,000 records at the same time? Could it happen? Yes. Could they then handle it? I don't know. So it is still a bandwidth issue for Amazon. If I were Amazon, I would be pissed if someone were making 12,000 requests, and I would put a reasonable limit of some kind on how many requests per hour anyone could make. WHY would I be pissed? Because the requester would be spending MY MONEY without my permission to enhance HIS LIFE. In other words, STEALING from ME for HIS benefit.
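For what it's worth, a per-hour limit like that is simple to sketch. The 100-requests-per-hour threshold below is an arbitrary illustration, not anything Amazon documents:

```python
# Minimal per-client hourly rate-limit sketch (in-memory, single process).
# The 100-requests-per-hour threshold is an arbitrary illustration.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
MAX_REQUESTS_PER_HOUR = 100

_recent = defaultdict(deque)  # client id -> timestamps of recent requests

def allow_request(client_id: str) -> bool:
    """Return True if this client is still under its hourly budget."""
    now = time.time()
    timestamps = _recent[client_id]
    # Drop timestamps that have aged out of the one-hour window.
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) >= MAX_REQUESTS_PER_HOUR:
        return False
    timestamps.append(now)
    return True
```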