HELP extracting data from existing websites

Discussion in 'Databases' started by mrG93, Feb 11, 2014.

  1. #1
    I need to extract data from existing websites such as Amazon, Play, etc.: titles, images, prices, listing categories and URLs for their products, and store this information in a database. Obviously these retailers update and add to their product lists all the time, so I will need my database to always be up to date. Also, as I will be gathering lots of information, I will need the whole process to run pretty quickly.
    Can anyone recommend automated software for what I am trying to achieve?
     
    mrG93, Feb 11, 2014 IP
  2. libanadam

    libanadam Well-Known Member

    #2
    You can either program the scraping yourself, or you can use a tool like import.io
     
    libanadam, Feb 12, 2014 IP
  3. phptechie

    phptechie Well-Known Member

    #3
    I would suggest using the RSS feeds from those sites and integrating them into your website with an RSS parser.

    You can also sign up with them to get premium feeds, and they may also provide the APIs you need to integrate on your site.
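
    A minimal sketch of the RSS route in Python, using only the standard library. The feed below is a made-up sample, not a real feed from Amazon or Play; real feeds may use different elements, so the field names here (title, link, category) are assumptions to check against the actual feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from the retailer.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example store</title>
    <item>
      <title>Blue Widget</title>
      <link>https://example.com/p/1</link>
      <category>Widgets</category>
    </item>
    <item>
      <title>Red Gadget</title>
      <link>https://example.com/p/2</link>
      <category>Gadgets</category>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    products = []
    for item in root.iter("item"):
        products.append({
            # findtext returns the default when the element is missing
            "title": item.findtext("title", default="").strip(),
            "url": item.findtext("link", default="").strip(),
            "category": item.findtext("category", default="").strip(),
        })
    return products


if __name__ == "__main__":
    for product in parse_feed(SAMPLE_FEED):
        print(product)
```

    Plain RSS 2.0 items usually do not carry prices or images in standard elements, so those fields would probably have to come from feed-specific extensions or from an API.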
     
    phptechie, Feb 25, 2014 IP
  4. WebLab

    WebLab Active Member

    #4
    Seems like a nice free tool. Thanks for sharing.
     
    WebLab, Feb 25, 2014 IP
  5. mandat

    mandat Greenhorn

    #5
    Try Web Content Extractor.
    As far as I know it's not free, but it has a demo.
     
    mandat, Feb 25, 2014 IP
  6. Sreeram31

    Sreeram31 Greenhorn

    #6
    If you are familiar with Python, you can try out http://scrapy.org/

    It is a free, open source tool for scraping websites. Its features are oriented towards scraping content from e-commerce sites. I used it a couple of years back and would highly recommend it.
     
    Sreeram31, Feb 25, 2014 IP
  7. Mikhail19

    Mikhail19 Member

    #7
    It is highly recommended to use the API for those sites that have one. For example, Amazon has the free Product Advertising API (PA API), which can be used for searching and extracting product data. Next, check whether the site has RSS feeds and what those feeds deliver. The last resort is scraping. Scraping has 2 disadvantages:
    - most sites don't like being scraped, and many explicitly prohibit it in their TOS;
    - scrapers use HTML as their source and are sensitive to changes in site layout.
    Unfortunately there are a lot of sites where scraping is the only way to extract data.
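
    To illustrate the second disadvantage, here is a rough sketch of a scraper built on Python's standard html.parser. The markup and the class names ("title", "price") are invented for the example; a real product page will look different, and you should check the site's TOS before scraping it.

```python
from html.parser import HTMLParser

# Hypothetical product markup; real pages will use other tags and classes.
SAMPLE_HTML = """
<div class="product">
  <h2 class="title">Blue Widget</h2>
  <span class="price">9.99</span>
  <img src="/img/widget.jpg">
</div>
"""


class ProductParser(HTMLParser):
    """Collects title, price and image URL based on assumed class names."""

    def __init__(self):
        super().__init__()
        self.product = {}
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class", "")
        if cls in ("title", "price"):
            self._field = cls
        elif tag == "img" and "src" in attrs:
            self.product["image"] = attrs["src"]

    def handle_data(self, data):
        # Ignore whitespace between tags; keep text inside a tracked field
        if self._field and data.strip():
            self.product[self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def extract_product(html):
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.product


if __name__ == "__main__":
    print(extract_product(SAMPLE_HTML))
    # Simulate a layout change: the title class is renamed and the field is lost
    print(extract_product(SAMPLE_HTML.replace('class="title"', 'class="name"')))
```

    The second call shows the fragility: once the site renames a class, the scraper silently stops returning the title, which is why an API or feed is preferable when one exists.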
     
    Mikhail19, Mar 29, 2014 IP
  8. adams26

    adams26 Well-Known Member

    #8
    I would recommend getting the data in one of the following ways:
    a. API
    b. RSS
    c. If it is a shopping cart, get the data via external services (just Google it, you will find loads of sites)
    d. Programmatically in PHP via cURL, or in Python
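
    Whichever of these you pick, the rows still have to land in a database and be refreshed when the retailer changes a listing. A minimal sketch with Python's built-in sqlite3 module follows; the table layout and the choice of the product URL as the key are my assumptions, not something any of these sites prescribe.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (assumed) products table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               url TEXT PRIMARY KEY,
               title TEXT,
               price REAL,
               category TEXT,
               image TEXT
           )"""
    )
    return conn


def save_product(conn, product):
    """Insert a product, or overwrite the existing row with the same URL."""
    conn.execute(
        """INSERT OR REPLACE INTO products (url, title, price, category, image)
           VALUES (:url, :title, :price, :category, :image)""",
        product,
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_db()
    item = {
        "url": "https://example.com/p/1",
        "title": "Blue Widget",
        "price": 9.99,
        "category": "Widgets",
        "image": "/img/widget.jpg",
    }
    save_product(conn, item)
    # A later run sees a new price for the same URL and replaces the row
    save_product(conn, dict(item, price=8.49))
    print(conn.execute("SELECT title, price FROM products").fetchall())
```

    Running the import on a schedule and saving every product this way keeps one row per URL, so updated prices replace old ones instead of piling up as duplicates.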
     
    adams26, Apr 9, 2014 IP
  9. shmekerosu

    shmekerosu Active Member

    #9
    I agree with Mikhail on the Amazon solution: use the official Product API. It is by far the most reliable and safest solution long term. Unfortunately their API is a little confusing for beginners, and no complete, useful examples are available.

    As for the parser/scraper, if you are familiar with JavaScript I can suggest Noodlejs.com, which works great and also has some great examples and documentation.

    Hope this helps, cheers.
    S.
     
    shmekerosu, May 5, 2014 IP