I have an affiliate program in which a company offers an RSS feed of their products. I want to do some stuff with a searchable web site, so this is what I'm thinking: 1. Get the RSS feed and convert (parse?) it into a MySQL database. 2. Do the search with PHP and stuff. I'm ok with PHP and MySQL, it's the RSS (and step 1) that is new to me. Does the above make sense? I haven't actually seen the RSS file yet, I don't even know what it looks like! Is it just text? Thanks.
Makes sense. I think the feed comes as XML, so you'd have to parse the XML and load it into the DB. PHP has a built-in XML parser.
You can query an XML file with XQuery and XPath. You can think of XQuery as being for XML what SQL is for a database. Depending on what you want to do, you may not need to fill a database from your XML just to extract data. http://www.w3.org/XML/Query/ http://www.zend.com/php5/articles/php5-xmlphp.php
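To make the "query the XML directly" idea concrete, here is a minimal sketch. It is written in Python only to keep it short and self-contained; PHP's XML extensions offer comparable path queries. The sample feed, its field names, and the find_items helper are made up for illustration, and a real product feed will look different. Note that ElementTree supports only a subset of XPath, not XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real product feed will have its own fields.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example products</title>
    <item>
      <title>Blue Widget</title>
      <link>http://example.com/p/1</link>
      <description>A small blue widget</description>
    </item>
    <item>
      <title>Red Gadget</title>
      <link>http://example.com/p/2</link>
      <description>A large red gadget</description>
    </item>
  </channel>
</rss>"""


def find_items(rss_text, keyword):
    """Return (title, link) pairs for items whose title contains keyword."""
    root = ET.fromstring(rss_text)
    matches = []
    # "./channel/item" is a path expression from ElementTree's XPath subset.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Case-insensitive substring match on the title only.
        if keyword.lower() in title.lower():
            matches.append((title, link))
    return matches
```

Calling find_items(SAMPLE_RSS, "widget") searches the feed in memory with no database involved. That is fine for a small feed, but probably too slow to run on every page view for a large catalogue.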
Thanks for the tips. I've run into a bit of a snag though: the RSS feeds offered by this particular website are themed, and cover only a small part of the advertiser's stock! They do me no good. (They're like top 50 sellers and things like that.) I want to have a search box on my own site, as well as be able to manipulate/style the product data to my liking. I've emailed the company about it, hopefully I'll get some reply. But that brings me to another problem I've wondered about: is it possible to write some PHP or JavaScript that reads info from the pages of some site and stores it somehow? I know it's possible, but is it way difficult? If it's ok with the website owner, could I somehow download information off web pages into a database? Thanks for the tips guys!
Anything is possible. What you're talking about is scraping. I would suggest looking at the PHP functions preg_match and preg_replace, and searching on Google or O'Reilly for regular expression matching. As an aside, for parsing RSS feeds in PHP, I came across a class called MagPie. Makes it sooo much simpler. Good luck!
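For a feel of what regex-based scraping looks like, here is a rough sketch. It uses Python's re module, where finditer plays roughly the role that preg_match_all plays in PHP. The HTML fragment, the class names, and the scrape_products helper are invented for the example; a real page needs its own pattern.

```python
import re

# Hypothetical page fragment; real markup will differ from site to site.
SAMPLE_HTML = """
<div class="product"><h2>Blue Widget</h2><span class="price">$9.99</span></div>
<div class="product"><h2>Red Gadget</h2><span class="price">$24.50</span></div>
"""

# Named groups capture the product name and the numeric part of the price.
PRODUCT_RE = re.compile(
    r'<div class="product"><h2>(?P<name>.*?)</h2>'
    r'<span class="price">\$(?P<price>[\d.]+)</span></div>',
    re.DOTALL,
)


def scrape_products(html):
    """Return a list of (name, price) tuples found in the HTML text."""
    return [
        (match.group("name"), float(match.group("price")))
        for match in PRODUCT_RE.finditer(html)
    ]
```

The pattern is tied to the exact markup, so any change to the page layout means rewriting it.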
As luck would have it, I'm just writing a site like this at the moment, and stumbled across your post when Googling. I'd highly recommend using the free Magpie library for downloading and parsing RSS. Find it on Google ("magpie rss"). No point reinventing the wheel. You set up a script which pulls down the RSS feed and sticks it into MySQL, then create a cron job which executes the script at a set interval (e.g. hourly). Then your search engine on the front end can do whatever it likes with the data. Web scraping (extracting data from HTML) is a whole different kettle of fish. It can be done using the cURL library and a lot of regular expression wizardry, but if the retailer changes the layout of their site you'll have to do the work all over again. It's quite unreliable, but it can be done.
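The "pull down the feed and stick it in the database" script might look roughly like this. It is only a sketch under several assumptions: it is written in Python rather than PHP, it uses the standard-library sqlite3 module as a stand-in for MySQL so it runs without a server, and the products table, its columns, and the feed URL in the usage comment are made up. It does not use Magpie.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url):
    """Download the raw feed bytes from the given URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def store_items(conn, rss_bytes):
    """Insert or update one row per <item>; return the number of items seen."""
    # Assumed schema: the link doubles as the unique key for a product.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "link TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    root = ET.fromstring(rss_bytes)
    count = 0
    for item in root.findall("./channel/item"):
        row = (
            item.findtext("link", ""),
            item.findtext("title", ""),
            item.findtext("description", ""),
        )
        # INSERT OR REPLACE keeps repeated runs from duplicating rows.
        conn.execute(
            "INSERT OR REPLACE INTO products (link, title, description) "
            "VALUES (?, ?, ?)",
            row,
        )
        count += 1
    conn.commit()
    return count


# Example use (hypothetical URL and file name):
#   conn = sqlite3.connect("products.db")
#   store_items(conn, fetch_feed("http://example.com/products.rss"))
#
# Hourly crontab entry (hypothetical path):
#   0 * * * * python3 /path/to/import_feed.py
```

Keying rows on the link means the hourly run refreshes existing products instead of piling up copies. With MySQL the equivalent would be something like INSERT ... ON DUPLICATE KEY UPDATE.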
You may be wasting a lot of energy going in the wrong direction. If they offer a product RSS feed, then I bet they also offer a regular product datafeed as CSV or XML. Wouldn't this be much easier? KEY: if you are going to convert the data, you need to be sure the affiliate link still tracks and the cookie is passed. Affiliate datafeeds already have your affiliate link embedded in each link. I've seen very few RSS feeds that have the affiliate links embedded.
In the end, I just downloaded the tab-delimited file (163MB!) and wrote a simple PHP page to read it and store the info in the MySQL database. It was easy and fast and there were no problems! I did download MagPie but it seemed complicated, and my task was fairly simple anyway: read data from a text file, store data in a database. I guess that's what MagPie does, but I was put off by the large number of files and junk in the MagPie folder. I found the screen-scraping to be fairly easy too, though you have to watch not to scrape too many pages in a short time; I got my IP blocked from the site for a couple of hours. I scrape info and store it in the database. If the vendor changes the page layout in the future, I can just adjust the script, but the old info will still be good. Thanks for the tips everyone, especially oziman for the preg_match_all stuff!
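For anyone finding this thread later, the "read a tab-delimited file, store it in a database" step can be sketched as follows. This is not the poster's PHP script. It is a Python illustration that uses sqlite3 in place of MySQL, and the column names (sku, name, price) are assumptions, since the real datafeed layout isn't shown. Rows are streamed and inserted in batches so a file of that size never has to fit in memory.

```python
import csv
import io
import sqlite3

INSERT_SQL = "INSERT OR REPLACE INTO products (sku, name, price) VALUES (?, ?, ?)"


def load_tsv(conn, lines, batch_size=1000):
    """Load tab-delimited rows from an open file into the products table."""
    # Assumed schema; adjust to the header row of the real datafeed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "sku TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    # QUOTE_NONE treats quote characters in product names as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    batch = []
    total = 0
    for row in reader:
        batch.append((row["sku"], row["name"], float(row["price"])))
        if len(batch) >= batch_size:
            conn.executemany(INSERT_SQL, batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush the final partial batch.
        conn.executemany(INSERT_SQL, batch)
        total += len(batch)
    conn.commit()
    return total


# Example use (hypothetical file name):
#   conn = sqlite3.connect("products.db")
#   with open("datafeed.txt", newline="", encoding="utf-8") as f:
#       load_tsv(conn, f)
```

Any file-like object works as input, so the loader can be tried on a few lines wrapped in io.StringIO before pointing it at the full download.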