Automatic form entry and HTML parsing

Discussion in 'Programming' started by SEbasic, Jun 25, 2004.

  1. #1
    Hi guys, I was wondering if anyone out there would be able to help me...

    Has anyone had any experience with the following two things?

    1) Automatically entering data into a search box or form without typing anything in yourself (maybe with a spider or something of that type?)

    2) Parsing the contents of an HTML page and breaking the results into an XML-based document.

    I have been looking around for a little while but seem to have hit a dead end.

    I have no idea where to start (which languages to use, etc.), and any help anyone could provide would be really appreciated.

    Cheers in advance guys ; )

    SE
     
    SEbasic, Jun 25, 2004
  2. TheHoff

    #2
    Would be fairly straightforward with PHP. Something like:

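    - Read the remote file in, feeding it your keyword:

    PHP:
    // urlencode() keeps spaces and symbols in the keyword from breaking the query string.
    $file = "http://www.domain.com/search.php?q=" . urlencode($keyword[$x]);

    // Opening a URL with fopen() requires allow_url_fopen to be enabled in php.ini.
    $handle = fopen($file, "r");
    if ($handle === false) {
        die("Could not open $file");
    }

    // Read the response in 8 KB chunks until the end of the file.
    $contents = "";
    while (!feof($handle)) {
        $contents .= fread($handle, 8192);
    }
    fclose($handle);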
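    Then either process $contents while you are still in the loop, or do it as a bulk process once the whole file has been read (the bulk route is safer, since a match can straddle two chunks). You'll need a script that writes out the XML based on the regular expression matches you get from the original file.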
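
    To make the regular-expression step a bit more concrete, here is a minimal sketch of pulling link URLs and titles out of a fetched page with preg_match_all(). The sample HTML, the extract_links() helper and the pattern are all assumptions for illustration; the real pattern depends entirely on the markup of the page being parsed, and regular expressions only cope well with simple, predictable HTML.

```php
<?php
// Hypothetical sample of what $contents might hold after the fetch above.
$contents = '<ul>
<li><a href="/news/1.html">First &amp; foremost</a></li>
<li><a href="/news/2.html">Second story</a></li>
</ul>';

// Hypothetical helper (not from the thread): return every link in $html
// as an array with 'url' and 'title' keys.
function extract_links($html)
{
    $results = array();
    // Lazy match on the link text; "i" ignores tag case, "s" lets "." span newlines.
    $pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $results[] = array(
                'url'   => html_entity_decode($m[1]),
                // Drop any nested tags, then decode entities such as &amp;
                'title' => trim(html_entity_decode(strip_tags($m[2]))),
            );
        }
    }
    return $results;
}

$links = extract_links($contents);
foreach ($links as $link) {
    echo $link['title'] . " -> " . $link['url'] . "\n";
}
```

    Each entry in $links can then be turned into one <item> of the feed.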

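    PHP:
    // RSS 2.0 expects RFC 822 dates in <pubDate>; date("r") builds one from the Unix timestamp.
    $news['dateline'] = date("r", $news['dateline']);

    // strip_bbcode() is a vBulletin helper; keep the first 200 characters as a teaser.
    $news['pagetext'] = substr(strip_bbcode($news['pagetext']), 0, 200) . "...";

    // Replace hyphens and curly quotes with spaces.
    $news['pagetext'] = preg_replace("/-/", " ", $news['pagetext']);
    $news['pagetext'] = preg_replace("/“|”/", " ", $news['pagetext']);

    // Escape &, < and > so the text cannot break the XML.
    $news['title'] = htmlspecialchars($news['title']);
    $news['pagetext'] = htmlspecialchars($news['pagetext']);

    $newsitems .= "<item>\n<title>$news[title]</title>\n<link>http://www.domain.com/news/news.php?i=$news[threadid]</link>\n<description>$news[pagetext]</description>\n<pubDate>$news[dateline]</pubDate>\n</item>\n";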
    Then write out the XML file, like...

    PHP:
    $filepointer = fopen("./yourrssfeed.xml", "w");
    if ($filepointer === false) {
        die("Could not open ./yourrssfeed.xml for writing");
    }

    // Channel header. $description and $lastbuilddate are assumed to be set earlier in the script.
    fputs($filepointer, "<?xml version='1.0' encoding='iso-8859-1' ?>\n<rss version='2.0'>\n<channel>\n<title>domain.com: Top Stories</title>\n<link>http://www.domain.com/</link>\n<description>" . $description . "</description>\n<language>en-us</language>\n<copyright>Copyright 2004, domain, Inc. All Rights Reserved.</copyright>\n<pubDate>" . date("r") . "</pubDate>\n<lastBuildDate>" . $lastbuilddate . "</lastBuildDate>\n<category>domain.com: Top Stories</category>\n");

    // Append the items built earlier, then close the channel and the feed.
    fputs($filepointer, $newsitems . "\n</channel>\n</rss>");

    fclose($filepointer);
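
    As an alternative to gluing the feed together from strings, here is a rough sketch that builds the same kind of RSS 2.0 file with PHP's DOM extension, which escapes characters such as & and < automatically. It assumes the DOM extension is available; the build_rss() helper, the item array layout and the sample data are made up for illustration and are not part of the script above.

```php
<?php
// Hypothetical helper (not from the thread): build an RSS 2.0 feed as a string.
// $items is a list of arrays with 'title', 'link', 'description' and 'timestamp' keys.
function build_rss($title, $link, $description, array $items)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);
    $channel = $rss->appendChild($doc->createElement('channel'));

    // Append <name>text</name> to $parent; text nodes are escaped when the document is saved.
    $add = function ($parent, $name, $text) use ($doc) {
        $el = $parent->appendChild($doc->createElement($name));
        $el->appendChild($doc->createTextNode($text));
    };

    $add($channel, 'title', $title);
    $add($channel, 'link', $link);
    $add($channel, 'description', $description);

    foreach ($items as $item) {
        $node = $channel->appendChild($doc->createElement('item'));
        $add($node, 'title', $item['title']);
        $add($node, 'link', $item['link']);
        $add($node, 'description', $item['description']);
        // gmdate("r") yields an RFC 822 style date, which is what RSS 2.0 expects.
        $add($node, 'pubDate', gmdate('r', $item['timestamp']));
    }

    return $doc->saveXML();
}

// Sample data, invented for the example.
$xml = build_rss(
    'domain.com: Top Stories',
    'http://www.domain.com/',
    'Example feed',
    array(
        array(
            'title'       => 'Cats & dogs',
            'link'        => 'http://www.domain.com/news/news.php?i=1',
            'description' => 'A <b>short</b> teaser...',
            'timestamp'   => 0,
        ),
    )
);

echo $xml;
```

    To save the result instead of printing it, something like file_put_contents("./yourrssfeed.xml", $xml) should do.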
     
    TheHoff, Jun 25, 2004
  3. SEbasic

    #3
    That is great!

    Thanks for your help...

    Welcome to the forums BTW ; )
     
    SEbasic, Jun 25, 2004
  4. TheHoff

    #4
    Thanks, and you're welcome! If you need help fleshing it out, let me know.
     
    TheHoff, Jun 25, 2004
  5. SEbasic

    #5
    I probably will (I'm really new to PHP)...

    I'll let you know...

    Cheers for the offer
     
    SEbasic, Jun 25, 2004