Looking for HTML site scraper (for my own site) to scrape structured data

Discussion in 'Microdata' started by keithjameslock, Dec 22, 2009.

  1. #1
    I have a site with thousands of pages. I would like a script that scrapes 6 pieces of content from the product pages (approx. 50k products). The product pages are all 2 folders deep, in the form http://www.domain.com/cat/subcat/product1.aspx. All other pages should be ignored. The data I want is structured the same way on all pages.
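    A minimal Python sketch of the URL filter, assuming product pages are exactly `/<cat>/<subcat>/<page>.aspx` on the `www.domain.com` placeholder host from the post. The names `PRODUCT_PATH` and `is_product_url` are hypothetical:

```python
import re
from urllib.parse import urlparse

# Hypothetical rule: exactly two folders, then a page ending in .aspx.
PRODUCT_PATH = re.compile(r"^/[^/]+/[^/]+/[^/]+\.aspx$", re.IGNORECASE)


def is_product_url(url: str, host: str = "www.domain.com") -> bool:
    """Return True if url is on `host` and its path is exactly two folders deep."""
    parts = urlparse(url)
    # Query strings and fragments are ignored; only the path shape matters.
    return parts.netloc.lower() == host and bool(PRODUCT_PATH.match(parts.path))
```

    Anything that fails the check (category pages, deeper paths, other hosts) is simply skipped.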

    First, I want the full URL.
    Then I want the product name, which is between the <h1> tags. There are no other h1 tags on the page.
    Also, I want the description. The description is between the paragraph tags immediately below the HTML "<h2>Product Description</h2>".
    etc. (I'll explain later)
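    A sketch of extracting the two fields described so far with Python's standard-library `html.parser`, assuming reasonably well-formed markup and treating the description as the first `<p>` after the `<h2>Product Description</h2>` heading. `ProductParser` and `extract_product` are hypothetical names, and the remaining fields ("etc.") are left out:

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects the <h1> text and the first <p> after <h2>Product Description</h2>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.name = ""
        self.description = ""
        self._in_h1 = False
        self._in_h2 = False
        self._h2_text = ""
        self._want_p = False  # set once the description heading has closed
        self._in_p = False
        self._done_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "h2":
            self._in_h2 = True
            self._h2_text = ""
        elif tag == "p" and self._want_p and not self._done_p:
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "h2":
            self._in_h2 = False
            if self._h2_text.strip() == "Product Description":
                self._want_p = True
        elif tag == "p" and self._in_p:
            self._in_p = False
            self._done_p = True  # only the first paragraph is kept

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still collected.
        if self._in_h1:
            self.name += data
        if self._in_h2:
            self._h2_text += data
        if self._in_p:
            self.description += data


def extract_product(html: str) -> dict:
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return {"name": parser.name.strip(), "description": parser.description.strip()}
```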

    Then I want all of that data exported to an XML file, basically creating a new "item" for each product. Each "item" will also need a unique ID, which can simply start at 1. I'll give you the exact format I need for the XML file, and the URL of the site with the products, later.
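    One way to write the items with sequential IDs, using Python's `xml.etree.ElementTree`. The element names (`items`, `item`, `url`, `name`, `description`), the `id` attribute and the `write_items` function are placeholders until the real format is known:

```python
import xml.etree.ElementTree as ET


def write_items(products, target):
    """Write one <item> per product, numbering IDs from 1 (hypothetical format).

    `target` may be a file path or a binary file object.
    """
    root = ET.Element("items")
    for item_id, product in enumerate(products, start=1):
        item = ET.SubElement(root, "item", id=str(item_id))
        for field in ("url", "name", "description"):
            ET.SubElement(item, field).text = product.get(field, "")
    # ElementTree escapes &, < and > in text automatically.
    ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)
```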

    I would like to be able to easily change the structure of the XML data in case I need to add or edit elements.
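    One way to keep the XML structure easy to change is to drive it from a small mapping instead of hard-coding element names. This Python sketch assumes each scraped record is a dict keyed by field name; `FIELD_MAP`, `record_to_item` and the element names are all hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: edit this list to add, rename, reorder or drop
# XML elements without touching the scraping code.
# Each entry is (XML element name, key in the scraped record).
FIELD_MAP = [
    ("link", "url"),
    ("title", "name"),
    ("summary", "description"),
]


def record_to_item(record, item_id, field_map=FIELD_MAP):
    """Build one <item> element from a scraped record using field_map."""
    item = ET.Element("item")
    ET.SubElement(item, "id").text = str(item_id)
    for element_name, key in field_map:
        # Missing keys become empty elements rather than raising KeyError.
        ET.SubElement(item, element_name).text = record.get(key, "")
    return item
```

    Adding a new element then means adding one line to `FIELD_MAP` (and, if it is a new scraped field, one extraction rule).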

    PM me with your price and turnaround time. Also, please take timeouts into consideration: it's important that all the data is retrieved, and I don't want to crash the server. Let me know how you would handle those two potential issues.
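    A sketch of how timeouts and server load could be handled in Python with `urllib.request`: each request has a timeout, failures are retried with exponential backoff, pages are fetched one at a time with a pause between them, and URLs that still fail are returned so they can be re-run. The timeout, retry and delay values are assumptions to tune, and `fetch`/`crawl` are hypothetical names:

```python
import time
import urllib.request


def fetch(url, timeout=15.0, retries=3, backoff=2.0,
          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying failures with exponential backoff.

    Returns the body as text (assumed UTF-8), or None if every attempt failed.
    """
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
            if attempt < retries - 1:
                sleep(backoff * (2 ** attempt))
    return None


def crawl(urls, delay=1.0, sleep=time.sleep, **fetch_kwargs):
    """Fetch urls sequentially, pausing `delay` seconds between requests.

    Returns (pages, failed) so failed URLs can be retried in a later run.
    """
    pages, failed = {}, []
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # throttle so the server is never hit in parallel
        body = fetch(url, sleep=sleep, **fetch_kwargs)
        if body is None:
            failed.append(url)
        else:
            pages[url] = body
    return pages, failed
```

    `opener` and `sleep` are parameters only so the logic can be exercised without network access; in normal use the defaults apply.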

    Thanks,
    Keith

    P.S. Any programming language is fine.
     
    Last edited: Dec 22, 2009
    keithjameslock, Dec 22, 2009
  2. NetworkTown.Net

    NetworkTown.Net Well-Known Member

    #2
    I'll be able to do this, but I have one question: will all the websites you're going to input have the same code structure? The script will look for the specific structure it was coded for in each URL entered. If you answer this, I'll send you a quote via PM.

    Thanks
     
    NetworkTown.Net, Dec 22, 2009
  3. keithjameslock

    keithjameslock Peon

    #3
    Please re-read the first post; I have completely rewritten it. The script will be for just one site, and all the product pages are structured the same way.
     
    keithjameslock, Dec 22, 2009
  4. frank007

    frank007 Well-Known Member

    #4
    Please check PM :)
     
    frank007, Dec 22, 2009
  5. innovatewebs

    innovatewebs Well-Known Member

    #5
    Hi,
    I don't understand this line: "I would like it to be easy for me to be able to manipulate the structure of the XML data in case I need to add/edit elements."

    I hope you know the starting and ending product IDs.
     
    innovatewebs, Dec 23, 2009