Hey there, I'm wondering if anybody knows of a plugin that can spider the content of outgoing links in a post. In my case I'm importing an RSS feed, and each RSS item contains a link. What I would like to do is spider that link for its content and tag the post on my site based on the content found. Does anyone know if that's possible at all?
I couldn't find any plugins in my brief search, but that doesn't mean one doesn't exist. Some deeper digging through the plugin repository or Google may turn something up. It is possible, though. A script could be written to parse the RSS content for links, and there are a few PHP functions that could then be used to visit each link and get the page's content. I think you can use PHP's cURL functions to retrieve the page and then parse it for the relevant content. There is also this function, "How Do I Read the Contents of a Remote Web Page Using PHP?", which can be used together with this one if your host has the load() function disabled: "load() Function for PHP - Fetch URL Content", which seems to have a similar goal. I'm sure there are other ways to do it too; I'm surprised no one else has posted anything. The next part would be working out which tags to generate from the content and getting those tags added to the post, which involves areas of WordPress I am not as familiar with. So the whole idea is somewhat complex, but possible.
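To make the steps above a bit more concrete, here is a rough sketch of the pipeline: pull the links out of the feed, fetch each page, and pick candidate tags from its text. It is written in Python only to keep it short and self-contained; on a WordPress site the same steps would be done in PHP. The function names (links_from_rss, fetch_html, suggest_tags), the tiny stopword list, and the idea of using the most frequent words as tags are all my own assumptions for illustration, not something an existing plugin does. Attaching the tags to the post is left out.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter
from html.parser import HTMLParser

# Illustrative stopword list (an assumption); a real one would be much longer.
STOPWORDS = {"this", "that", "with", "from", "have", "more", "your", "about"}


def links_from_rss(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def fetch_html(url, timeout=10):
    """Download a page and decode it to text (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def suggest_tags(html, limit=5):
    """Return up to `limit` candidate tags: the most frequent words of four
    or more letters in the page text, stopwords excluded."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]{4,}", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]
```

In use, you would loop over links_from_rss(feed_text), call fetch_html on each link, and pass the result to suggest_tags. Plain word frequency is a crude heuristic, so expect to tune the stopword list or swap in something smarter before trusting the tags it produces.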
Thanks for your reply. I'll check out the links you posted. I know I can filter out links from RSS feeds with Yahoo Pipes, and maybe something like Autoblogged is able to do the rest...