Help needed please

Discussion in 'PHP' started by ojsimon, Jan 7, 2008.

  1. #1
    Hi,
    I found the code below in a forum. Apparently it lets you grab content from any website, provided that you enter the starting and ending HTML points. I understand that this may be difficult and maybe pointless, but I am interested: would it be possible to modify it so that the administrator enters a website and then selects the section they want, almost like cropping a website? If so, where should I start? Which parts do I change? Is there any existing code that may help me out?

    Thanks
    <?php
    
    // Mini-Fetch - Remote Content Retrieval System
    
    //In this case, it fetches a tournament page from www.nespaintball.com.
    $theLocation="http://www.nespaintball.com/pages/tournaments.php?tid=3";
    //Below, at $startingpoint and $endingpoint, you'll enter the start and finish points in the remote HTML.
    
    $startingpoint = "<table style=\"width:100%;padding:4px;border:1px solid #000;background:#7e661c;color:#fff;\">"; // replace inside the quotes with your unique start point in the source of the HTML page. It HAS to be unique.
    $endingpoint = "</body>"; // replace with the unique finish point in the source of the HTML page 
    //Don't forget to escape any " marks with a \ mark.
    // Example: If the starting HTML is: <img src="images/something.jpg">
    // You would tell Mini-Fetch: $startingpoint = "<img src=\"images/something.jpg\">";
    
    //That's probably all you need to edit, unless you want to match and replace certain text or HTML.
    
    // - "Don't touch this part..."
    preg_match("/^(https?:\/\/)?([^\/]*)(.*)/i", "$theLocation", $matches);
    $theDomain = ($matches[1] !== "" ? $matches[1] : "http://") . $matches[2]; // keep https:// when the URL uses it
    $page = $matches[3];
    
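    $fd = fopen($theDomain.$page, "r"); // can change to "rb", on NT/2000 servers, if problems.
    if ($fd === false) {
    die("Could not open " . $theDomain . $page); // stop instead of reading from an invalid handle
    }
    $value = "";
    while(!feof($fd)){
    $value .= fread($fd, 4096); 
    }
    fclose($fd);
    $start= strpos($value, $startingpoint); 
    if ($start === false) {
    die("Starting point not found in the fetched page.");
    }
    $finish= strpos($value, $endingpoint, $start); // look for the end only after the start
    if ($finish === false) {
    die("Ending point not found after the starting point.");
    }
    $length= $finish-$start;
    $value=substr($value, $start, $length);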
    // end "don't touch this part"
    
    
    // preg_replace with the /i modifier, below, does a case-insensitive find, match, and replace on variations of text that you define.
    //The following commands strip or replace HTML tags. 
    //To NOT strip a certain HTML tag, add // before the line in question.
    // the "", before the $value at the end of the line means replace the tag with an empty string, which effectively deletes the tag.
    
    // $value = preg_replace( "/<img src=[^>]*>/i", "", $value ); // Remove all image tags. This is disabled until you remove the // in front of this line.
    $value = preg_replace( "/<IMG alt=[^>]*>/i", "", $value ); // Remove all image alt="whatever" tags
    $value = preg_replace( "/<class[^>]*>/i", "", $value ); // Remove all variations of <class> tags.
    //$value = preg_replace( "/<table[^>]*>/i", "", $value ); // Remove ALL variations of <table> tags.
    //$value = preg_replace( "/<tr[^>]*>/i", "", $value ); // Remove all variations of <tr> tags.
    //$value = preg_replace( "/<td[^>]*>/i", "", $value ); // Remove all variations of <td> tags.
    $value = preg_replace( "/Signed up teams[^>]*>/i", "", $value );
    
    
    
    // Below - what's the difference, you ask, between the regex replacements above and str_replace?
    // str_replace is faster, by a long shot... The catch is that it can only be used
    // to replace EXACT value matches, as you see below, and doesn't work well in huge files without using arrays.
    $value = str_replace( "</font>", "", $value ); // Remove closing </font> tags.
    //$value = str_replace( "</table>", "", $value ); // Remove closing </table> tags.
    //$value = str_replace( "</tr>", "", $value ); // Remove closing </tr> tags.
    //$value = str_replace( "</td>", "", $value ); // Remove closing </td> tags.
    //$value = str_replace( "<center>", "", $value ); // Remove <center> tag...
    //$value = str_replace( "</center>", "", $value ); // ...alignment calls.
    $value = str_replace( "<b>", "", $value ); // Remove <b> tags.
    $value = str_replace( "</b>", "", $value ); // Remove closing </b> tags...
    //$value = str_replace( "<table style=\"width:100%;padding:4px;border:1px solid #000;background:#7e661c;color:#fff;\">", "<table align=\"center\" border=\"0\" cellpadding=\"4\" cellspacing=\"1\" class=\"alt1\" width=\"100%\">", $value );
    $value = str_replace( "<td>No</td>", "", $value );
    $value = str_replace( "<td style=\"font:12px Arial,sans-serif;color:#fff;\"><b>PAID</b></td>", "", $value );
    $value = str_replace( "<td>No</td>", "", $value );
    $value = str_replace( "<a href=", "<a", $value );
    $value = str_replace( "<table style=\"width:100%;padding:4px;border:1px solid #000;background:#7e661c;color:#fff;\">", "<table>", $value );
    $value = str_replace( "</body>", "", $value );
    $value = str_replace( "<td style=\"font:12px Arial,sans-serif;color:#fff;\">", "", $value );
    $value = str_replace( "PAID", "", $value );
    $value = str_replace( "<td colspan=\"3\" style=\"font:12px Arial,sans-serif;color:#fff;\"></td>", "", $value );
    $value = str_replace( "</td>DIV</td>", "", $value );
    $value = str_replace( "TEAM NAME</td>", "", $value ); 
    
    
    
    // More tags. Just take out the // in front and edit as you like.
    //$value = preg_replace( "/Competitors name/i", "", $value ); // Remove certain text...
    //$value = preg_replace( "/<javascript[^>]*>/i", "", $value ); //remove javascripts
    //$value = preg_replace( "/<script[^>]*>/i", "", $value ); //remove scripts
    
    // open fetched links in a new window; this one pattern covers both href=value and href="value",
    // so each link gets exactly one target attribute
    $value = preg_replace( "/href=/i", "target=\"_blank\" href=", $value ); 
    
    $donstart = "<table class=\"tborder\" width=\"175\"><tr><td class=\"alt1\">";
    
    $donend = "</td></tr></table>";
    
    $FinalOutput = preg_replace("/(href=\"?)(\/[^\"\/]+)/", "\\1" . $theDomain . "\\2", $value);
    
    echo $donstart ;
    echo $FinalOutput ; //prints it to your page
    echo $donend ;
    
    flush (); //force output to your page faster
    
    ?>
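
    For reference, the marker-based extraction the script performs can be sketched as a small function. This is only a minimal sketch, not part of Mini-Fetch: the helper names extract_between and strip_tag_variants and the sample HTML are made up for illustration, and it assumes PHP 7.1 or later. It uses preg_replace with the /i modifier for case-insensitive tag removal, since eregi_replace is not available from PHP 7.0 onward.

```php
<?php
// Hypothetical helper (not part of Mini-Fetch): returns the text from the first
// occurrence of $start up to, but not including, the next occurrence of $end.
// Returns null when either marker is missing.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    // Search for the end marker only after the start marker.
    $to = strpos($html, $end, $from + strlen($start));
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Hypothetical helper: removes every opening tag with the given name,
// whatever its case or attributes (e.g. <b>, <B>, <b class="x">).
function strip_tag_variants(string $html, string $tagName): string
{
    $pattern = '/<' . preg_quote($tagName, '/') . '\b[^>]*>/i';
    return preg_replace($pattern, '', $html);
}

// Made-up sample page, so the sketch runs without a network request.
$sample = '<html><body><h1>Header</h1><table class="x"><tr><td><B>Team A</B></td></tr></table></body></html>';

$section = extract_between($sample, '<table class="x">', '</body>');
if ($section === null) {
    exit("Start or end marker not found\n");
}

$section = strip_tag_variants($section, 'b');   // regex: removes <B> and <b ...>
$section = str_ireplace('</b>', '', $section);  // exact text, matched in any case

echo $section, "\n";
```

    Returning null when a marker is missing keeps a failed strpos (false) from being passed on to substr, where it would be treated as offset 0.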

     
    ojsimon, Jan 7, 2008 IP
  2. ojsimon

    #2
    please help
     
    ojsimon, Feb 14, 2008 IP
  3. ojsimon

    #3
    Can no one help with this? I will try to help the person who helps me in any way I can.

    thanks
     
    ojsimon, Feb 15, 2008 IP