If you need to extract content from a website, I may be able to help. I have a Java-based spider running on several EC2 instances, with many threads crawling the web for content. Yesterday I processed 75,000 pages on a single EC2 instance (1 GB of RAM, 2 cores), and the app's architecture lets me add as many instances as I want. I am currently processing large volumes of data from big websites like Twitter, LinkedIn and others. PM me if you have questions.
What format will the data be in, and can it be customized to my needs? For example, I need an export of all the products on an e-cart website, with the product picture names and prices, in Excel format. Can you handle custom requests like that? More details would be nice.
The output format is irrelevant because the data can be transformed into whatever you want:
- XML
- JSON
- MySQL
- CSV
- Excel
I can also process the info if you want, like extracting frequencies, trends, and other cool things. That's not the hard part here.
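To illustrate why the output format is the easy part, here is a minimal Java sketch that turns the same extracted records into CSV or JSON. This is not the spider's actual code: the class name `ExportSketch`, its methods, and the `name`/`price` columns are all made up for the example, and records are assumed to be simple string-to-string maps.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not taken from the spider itself.
class ExportSketch {

    // Quote a CSV field when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes (the usual RFC 4180 convention).
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // One header line, then one line per record, columns in the given order.
    // Missing values become empty fields.
    static String toCsv(List<Map<String, String>> rows, List<String> columns) {
        StringBuilder sb = new StringBuilder();
        sb.append(columns.stream()
                .map(ExportSketch::csvField)
                .collect(Collectors.joining(",")))
          .append("\n");
        for (Map<String, String> row : rows) {
            sb.append(columns.stream()
                    .map(c -> csvField(row.getOrDefault(c, "")))
                    .collect(Collectors.joining(",")))
              .append("\n");
        }
        return sb.toString();
    }

    // Escape a string as a JSON string literal.
    static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : value.toCharArray()) {
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.append("\"").toString();
    }

    // A JSON array of objects, one object per record, all values as strings.
    static String toJson(List<Map<String, String>> rows, List<String> columns) {
        return rows.stream()
                .map(row -> columns.stream()
                        .map(c -> jsonString(c) + ":" + jsonString(row.getOrDefault(c, "")))
                        .collect(Collectors.joining(",", "{", "}")))
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

Calling `ExportSketch.toCsv(rows, List.of("name", "price"))` on a list of product records would give a file that Excel can open directly. A native Excel workbook or a MySQL load would need an extra library or a database connection, which this sketch leaves out.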