I need to be able to extract data from existing websites such as Amazon, Play, etc. I need data such as titles, images, prices, listing categories and URLs for their products, and I want to store this information in a database. Obviously these retailers update and add to their product lists all the time, so I need my database to always be up to date. Also, since I will be gathering a lot of information, I need the whole process to be fast. Can anyone recommend automated software for what I am trying to achieve?
I would suggest using RSS feeds from those sites and integrating them into your website with an RSS parser. You can also sign up with them to get premium feeds, and they may also provide the APIs you need to integrate with your site.
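For illustration, here is a minimal sketch of the RSS route using only the Python standard library: it parses an RSS 2.0 feed with xml.etree.ElementTree and stores each item in SQLite. The sample feed, the table layout and the function names parse_rss_items and upsert_products are hypothetical. Real retailer feeds vary, and prices or images, if a feed carries them at all, usually sit in extra elements you would have to add.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for a retailer's product feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Store: New Arrivals</title>
    <item>
      <title>Blue Widget</title>
      <link>https://shop.example.com/p/blue-widget</link>
      <category>Widgets</category>
    </item>
    <item>
      <title>Red Gadget</title>
      <link>https://shop.example.com/p/red-gadget</link>
      <category>Gadgets</category>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of dicts with title, url and category for each <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "url": item.findtext("link", default="").strip(),
            "category": item.findtext("category", default="").strip(),
        })
    return items


def upsert_products(conn, items):
    """Insert new products or overwrite existing ones, keyed by URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        " url TEXT PRIMARY KEY, title TEXT, category TEXT)"
    )
    # INSERT OR REPLACE replaces a row whose primary key (url) already exists.
    conn.executemany(
        "INSERT OR REPLACE INTO products (url, title, category)"
        " VALUES (:url, :title, :category)",
        items,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    upsert_products(conn, parse_rss_items(SAMPLE_FEED))
    for row in conn.execute("SELECT url, title, category FROM products ORDER BY url"):
        print(row)
```

Running this on a schedule (e.g. from cron) keeps the table current without piling up duplicates, because a product seen again is overwritten by URL. It does not delete products that have dropped out of the feed; you would need extra bookkeeping for that.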
If you are familiar with Python, you can try out Scrapy (http://scrapy.org/). It is a free and open-source tool for scraping websites, and its features are well suited to scraping content from e-commerce sites. I used it a couple of years ago and would highly recommend it.
It is highly recommended to use the API for sites that have one. For example, Amazon has a free Product Advertising API (PA API) that can be used for searching and extracting product data. Next, check whether the site has RSS feeds and what those feeds deliver. The last resort is scraping. Scraping has two disadvantages:
- most sites don't like being scraped and explicitly prohibit it in their Terms of Service (TOS);
- scrapers use HTML as their source and are sensitive to changes in the site layout.
Unfortunately, there are a lot of sites where scraping is the only way to extract data.
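To make the second disadvantage concrete, here is a minimal scraping sketch using Python's built-in html.parser (not Scrapy). The markup and the class names "product", "title" and "price" are hypothetical; every real site uses its own, and they tend to change whenever the site is redesigned.

```python
from html.parser import HTMLParser

# Hypothetical product-listing markup; real sites will differ.
SAMPLE_HTML = """
<div class="product">
  <a class="title" href="/p/blue-widget">Blue Widget</a>
  <span class="price">$19.99</span>
</div>
<div class="product">
  <a class="title" href="/p/red-gadget">Red Gadget</a>
  <span class="price">$5.00</span>
</div>
"""


class ProductParser(HTMLParser):
    """Collect title, URL and price from elements with assumed class names."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "product" in classes:
            # A new product container starts; fields are filled in below.
            self.products.append({"title": None, "url": None, "price": None})
        elif self.products and "title" in classes:
            self.products[-1]["url"] = attrs.get("href")
            self._field = "title"
        elif self.products and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.products[-1][self._field] = data.strip()


def scrape_products(html):
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


if __name__ == "__main__":
    for product in scrape_products(SAMPLE_HTML):
        print(product)
```

If the site renames class="price" to something else, this parser silently returns None for every price. That is exactly the layout sensitivity described above, and why an API is preferable when one exists.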
I would recommend getting the data in one of the following ways:
a. API
b. RSS
c. If the site runs on a shopping cart, get the data via external services (just Google it; you will find loads of sites)
d. Programmatically, in PHP via cURL or in Python
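For option d, a rough Python equivalent of fetching pages with cURL is urllib.request, and since the question asks for speed, a thread pool can fetch several pages at once. This is a minimal sketch: the User-Agent string, the worker count and the function names fetch and fetch_all are assumptions, and you should keep concurrency modest and respect each site's rate limits and terms.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholder User-Agent; identify your crawler honestly.
USER_AGENT = "example-product-crawler/0.1"


def fetch(url, timeout=10):
    """Download one page as text, roughly like `curl -A <agent> <url>`."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch many URLs concurrently; returns {url: page text or Exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

The fetcher parameter exists so the concurrency logic can be tested without network access, or swapped for a function that calls an API instead of downloading HTML.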
I agree with Mikhail on the Amazon solution: use the official Product API. It is by far the most reliable and safest solution in the long term. Unfortunately, their API is a little confusing for beginners, and no complete, useful examples are available. As for the parser/scraper, if you are familiar with JavaScript, I can suggest Noodlejs.com, which works great and also has good examples and documentation. Hope this helps, cheers. S.