Hi guys,

I was wondering if anyone has any experience with the following two things:

1) Automatically entering data into a search box or form without any manual input (maybe a spider or something of that type?)
2) Parsing the contents of an HTML page and breaking the results out into an XML-based document

I have been looking around for a little while but seem to have hit a dead end. I have no idea where to start (which languages need to be used, etc.), and any help anyone could provide would be really appreciated.

Cheers in advance guys ;)

SE
This would be fairly straightforward with PHP. Something like this:

Read the remote file in, feeding it your keyword:

    // Reading a URL with fopen() requires allow_url_fopen to be enabled in php.ini
    $file = "http://www.domain.com/search.php?q=" . urlencode($keyword[$x]);
    $contents = "";
    $handle = fopen($file, "r");
    if ($handle !== false) {
        while (!feof($handle)) {
            $contents .= fread($handle, 8192);
        }
        fclose($handle);
    }

Then, while in the loop, process $contents, or do it as a bulk process afterwards. You'll need a script that writes out the XML based on the regular expression matches you get from the original file. For example (this snippet relies on the vBulletin helpers vbdate() and strip_bbcode()):

    $news['dateline'] = date("l, M. jS, Y", $news['dateline']) . " " . vbdate($vboptions['timeformat'], $news['dateline']); // date format
    $news['pagetext'] = substr(strip_bbcode($news['pagetext']), 0, 200) . "..."; // post format
    $news['pagetext'] = preg_replace("/-/", " ", $news['pagetext']);
    $news['pagetext'] = preg_replace("/“|”/", " ", $news['pagetext']); // strip curly quotes
    $newsitems .= "<item>\n<title>$news[title]</title>\n<link>http://www.domain.com/news/news.php?i=$news[threadid]</link>\n<description>$news[pagetext]</description>\n<pubDate>$news[dateline]</pubDate>\n</item>";

Then write out the XML file, like this:

    $filepointer = fopen("./yourrssfeed.xml", "w");
    fputs($filepointer, "<?xml version='1.0' encoding='iso-8859-1' ?>\n<rss version='2.0'>\n<channel>\n<title>domain.com: Top Stories</title>\n<link>http://www.domain.com/</link>\n<description>" . $description . "</description>\n<language>en-us</language>\n<copyright>Copyright 2004, domain, Inc. All Rights Reserved.</copyright>\n<pubDate>" . date("l, M. jS, Y", time()) . "</pubDate>\n<lastBuildDate>" . $lastbuilddate . "</lastBuildDate>\n<category>domain.com: Top Stories</category>");
    fputs($filepointer, $newsitems . "\n</channel>\n</rss>");
    fclose($filepointer);
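To make the "regular expression matches to XML" step a bit more concrete, here is a minimal sketch of one way it could look. The function name linksToXml(), the element names (results, item, link, title) and the sample HTML are all made up for illustration; swap in whatever pattern fits the page you are actually fetching. It builds the output with PHP's DOMDocument rather than string concatenation, purely so that characters like & and < get escaped automatically.

```php
<?php
// Hypothetical helper: pull every link (href + text) out of an HTML string
// with a regular expression and return the matches as an XML document.
function linksToXml(string $html): string
{
    // Capture the href value and the inner content of each <a> tag.
    preg_match_all(
        '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement('results'));

    foreach ($matches as $m) {
        $item = $root->appendChild($doc->createElement('item'));

        // Text nodes are escaped on output, so the XML stays well-formed.
        $link = $item->appendChild($doc->createElement('link'));
        $link->appendChild($doc->createTextNode(html_entity_decode($m[1])));

        // Drop any nested markup (e.g. <b>) from the link text.
        $title = $item->appendChild($doc->createElement('title'));
        $title->appendChild($doc->createTextNode(trim(strip_tags($m[2]))));
    }

    return $doc->saveXML();
}

// Sample input standing in for the $contents fetched from the remote page.
$sample = '<ul><li><a href="/a?x=1&amp;y=2">First <b>result</b></a></li>'
        . '<li><a class="r" href="/b">Second</a></li></ul>';

echo linksToXml($sample);
```

In practice you would pass $contents from the fetch loop into the function and write the returned string to a file with file_put_contents(). A regular expression is fine for simple, predictable pages; for messier HTML, DOMDocument::loadHTML() is probably the more robust route.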