SEbasic
Jun 25th 2004, 7:20 am
Hi guys, I was wondering if anyone out there would be able to help me...
Has anyone had any experience with the following two things?
1) Automatically entering data into a search box or form without actually having any input yourself (Maybe a spider or something of that type???)
2) Parsing the contents of an HTML page and breaking the results out into an XML-based document...
I have been looking around for a little while but seem to have hit a dead end.
I have no idea where to start (What languages need to be used etc...), and any help anyone could provide would be really appreciated.
Cheers in advance guys ; )
SE
TheHoff
Jun 25th 2004, 7:34 am
Would be fairly straightforward with PHP. Something like:
- Read remote file in, feeding it your keyword
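$file = "http://www.domain.com/search.php?q=" . urlencode($keyword[$x]); // escape spaces etc. in the keyword
$handle = fopen($file, "r"); // needs allow_url_fopen enabled in php.ini
if (!$handle) {
    die("Could not open $file");
}
$contents = ""; // initialise before appending to it
while (!feof($handle)) {
    $contents .= fread($handle, 8192); // read the page in 8 KB chunks
}
fclose($handle);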
Then process $contents, either while you are still in the loop or as a bulk process afterwards. Run your regular expressions over it to pull out the bits you want, and have the script write those matches out as XML. For example, each <item> can be built up like this:
$news['pubdate'] = date("r", $news['dateline']); // RSS 2.0 expects RFC 822 dates in pubDate
$news['pagetext'] = substr(strip_bbcode($news['pagetext']), 0, 200) . "..."; // strip_bbcode() comes from vBulletin; keep the first 200 characters
$news['pagetext'] = preg_replace("/-/", " ", $news['pagetext']); // hyphens to spaces
$news['pagetext'] = preg_replace("/“|”/", " ", $news['pagetext']); // curly quotes to spaces
$news['pagetext'] = htmlspecialchars($news['pagetext']); // escape &, < and > so the XML stays well-formed
$news['title'] = htmlspecialchars($news['title']);
$newsitems .= "<item>\n<title>$news[title]</title>\n<link>http://www.domain.com/news/news.php?i=$news[threadid]</link>\n<description>$news[pagetext]</description>\n<pubDate>$news[pubdate]</pubDate>\n</item>\n";
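For the matching step itself, here is a rough sketch using preg_match_all(). The markup it looks for (<a class="result" href="...">title</a>) and the function names extract_results() and results_to_items() are made up for illustration, so you would have to adjust the regular expression to whatever the real results page looks like.

```php
<?php
// Hypothetical helper: pull link/title pairs out of a results page.
// Assumes each result looks like <a class="result" href="URL">TITLE</a>;
// the real page will differ, so adjust the pattern to match it.
function extract_results($contents)
{
    $pattern = '/<a\s+class="result"\s+href="([^"]+)"[^>]*>(.*?)<\/a>/is';
    $results = array();
    if (preg_match_all($pattern, $contents, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $results[] = array(
                'link'  => html_entity_decode($m[1]),  // undo &amp; etc. in the href
                'title' => trim(strip_tags($m[2])),    // drop any nested markup
            );
        }
    }
    return $results;
}

// Turn the matches into <item> elements, escaping the text for XML.
function results_to_items($results)
{
    $items = "";
    foreach ($results as $r) {
        $items .= "<item>\n<title>" . htmlspecialchars($r['title']) . "</title>\n"
                . "<link>" . htmlspecialchars($r['link']) . "</link>\n</item>\n";
    }
    return $items;
}
```

You would call it as $newsitems .= results_to_items(extract_results($contents)); after each page is read. Regular expressions are brittle against changes in the page layout, so expect to revisit the pattern now and then.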
Then write out the XML file, like...
// $description and $lastbuilddate need to be set earlier in your script
$filepointer = fopen("./yourrssfeed.xml", "w");
fputs($filepointer, "<?xml version='1.0' encoding='iso-8859-1' ?>\n<rss version='2.0'>\n<channel>\n<title>domain.com: Top Stories</title>\n<link>http://www.domain.com/</link>\n<description>" . $description . "</description>\n<language>en-us</language>\n<copyright>Copyright 2004, domain, Inc. All Rights Reserved.</copyright>\n<pubDate>" . date("r") . "</pubDate>\n<lastBuildDate>" . $lastbuilddate . "</lastBuildDate>\n<category>domain.com: Top Stories</category>\n"); // date("r") gives the RFC 822 format RSS expects
fputs($filepointer, $newsitems . "\n</channel>\n</rss>");
fclose($filepointer);
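One more thing: the fopen() call above only covers searches that take the keyword in the URL (a GET request). If the form submits with POST instead, something like the following should work. It is only a sketch and has not been run against any real site: build_post_options() and post_form() are illustrative names, the URL and the field name q are placeholders, and it relies on allow_url_fopen being enabled.

```php
<?php
// Build stream context options that send $fields as a form POST.
// Illustrative helper, not a PHP built-in.
function build_post_options($fields)
{
    return array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields), // e.g. q=red+shoes&page=2
        ),
    );
}

// Submit the form and return the response body as a string (false on failure).
function post_form($url, $fields)
{
    $context = stream_context_create(build_post_options($fields));
    return file_get_contents($url, false, $context);
}

// Example with a placeholder URL and field name:
// $contents = post_form("http://www.domain.com/search.php", array('q' => $keyword[$x]));
```

To find the right field names, look at the name attributes of the inputs in the form's HTML and at the form's action URL.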
SEbasic
Jun 25th 2004, 7:45 am
That is great!
Thanks for your help...
Welcome to the forums BTW ; )
TheHoff
Jun 25th 2004, 8:04 am
Thanks, and you're welcome! If you need help fleshing it out, let me know.
SEbasic
Jun 25th 2004, 9:44 am
I probably will (real new to PHP)...
Let you know...
Cheers for the offer