PHP scan file, display matches?

JosS Guest

#1

I don't know if this is possible and what sort of things I need to look into, and was hoping one of you could help me out.

Basically I have a URL, for instance http://www.mysite.com/yeah.php, and on this page is a list of URLs and page text. I need to set up something that will go through the source of the page, grab all the URLs containing a term such as 'pageno=', and then spit them out in a list.

What sort of things would I need to look at on php.net to accomplish this?

Cheers in advance.

JosS, Jun 3, 2007 IP

krt Well-Known Member

#2

File handling functions or cURL. This usually does the job:
$html = file_get_contents('http://site.com/page');
PHP:
As for getting the URLs from the HTML that the above function returns, look up regex pattern matching.
Something like:
preg_match_all('~http://site\.com/\?pageno=\d+~', $html, $m);
PHP:
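
A minimal sketch of what the two calls above produce, using a hard-coded sample string in place of the fetched page so it runs without network access. The sample HTML and the site.com links in it are assumptions for illustration only. Note that file_get_contents() accepts a URL only when allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical sample standing in for file_get_contents('http://site.com/page').
$html = '<a href="http://site.com/?pageno=1">One</a> '
      . '<a href="http://site.com/about">About</a> '
      . '<a href="http://site.com/?pageno=22">Two</a>';

// Single quotes keep the backslashes literal; \. matches a real dot.
$count = preg_match_all('~http://site\.com/\?pageno=\d+~', $html, $m);

// $m[0] is the list of full matches. $m itself is an array of such lists,
// which is why echoing it directly prints only the word "Array".
print_r($m[0]);
```

preg_match_all() returns the number of full matches, so $count can be checked before using $m.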

krt, Jun 3, 2007 IP

JosS Guest

#3

If I do that and then say "echo $m;", the page just says "Array".

Should it just display every result it finds?

JosS, Jun 3, 2007 IP

JosS Guest

#4

OK, I worked that part out:

print_r($m);

I'll see how I go from here. Thanks a tonne mate! I will post when I have my project completed.

JosS, Jun 3, 2007 IP

JosS Guest

#5

Gah,

How do I get rid of all the Array ( [0] => Array ( [0] => stuff, and just echo each result on its own with a <br /> after it?
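
One way to do this, as a minimal sketch: take the list of full matches in $m[0] and join it with <br /> tags. The helper name matches_to_html and the sample string are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: join the full matches from preg_match_all() with <br />.
function matches_to_html(array $m): string
{
    // $m[0] holds the matched strings; escape each one so that an "&" in a
    // query string is emitted as valid HTML.
    $escaped = array_map('htmlspecialchars', $m[0]);
    return implode("<br />\n", $escaped);
}

// Sample text standing in for the fetched page (an assumption).
$html = 'x http://site.com/?pageno=1 y http://site.com/?pageno=2 z';
preg_match_all('~http://site\.com/\?pageno=\d+~', $html, $m);
echo matches_to_html($m), "\n";
```

implode() places the <br /> between results rather than after the last one; a foreach loop that echoes each entry followed by '<br />' would work just as well.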

JosS, Jun 3, 2007 IP

raredev Peon

#6

How about this one:

$html = file_get_contents('http://site.com/page');
// Split the page source into lines and print every line that mentions pageno=
$lines = explode("\n", $html);
foreach ($lines as $line) {
    if (strpos($line, 'pageno=') !== false) {
        echo $line.'<br/>';
    }
}

PHP:

raredev, Jun 3, 2007 IP

JosS Guest

#7

Thanks! That's like perfect, but I see it echoes all the other HTML on the same line as the URL.

Is there any way to strip out the URLs only, and display them out of the $lines?

JosS, Jun 3, 2007 IP

krakjoe Well-Known Member

#8

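<?php
// Collect the first capture group of $inurl from every line of $location.
function extract_urls_from_html( $location, $inurl )
{
	$returns = array( );
	
	// fopen() on a URL needs allow_url_fopen enabled in php.ini
	if( ( $handle = fopen( $location, 'r' ) ) )
	{
		// fgets() returns false at end of file, which ends the loop
		while( ( $line = fgets( $handle ) ) !== false )
		{
			// preg_match_all() picks up every link on the line, not just the first
			if( preg_match_all( $inurl, $line, $matches ) )
			{
				$returns = array_merge( $returns, $matches[1] );
			}
		}	
		fclose( $handle );
	}
	// false when nothing matched, otherwise the list of URLs
	return $returns ? $returns : false ;
}

// [^"]* keeps each match inside a single href attribute
$urls = extract_urls_from_html( "http://yoursite.com/yeah.php", '~href="([^"]*pageno=[^"]*)"~' );

if( is_array( $urls ) )
{
	print( implode("<br />\n", $urls ) );
}
?>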
PHP:
Something like that. Give me a link to the page if the regex doesn't work. Regex is quite specific: a fragment like "pageno=" is normally not enough on its own, so if you don't want the function to return anything else at all, you would have to give the exact structure of the link.
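
To illustrate the point about specificity, here is a small sketch comparing a loose pattern with one that spells out the exact link structure. The sample HTML, the /list.php and /search.php paths, and the variable names are all assumptions made up for the example.

```php
<?php
// Hypothetical sample page: two paging links plus one unrelated link that
// happens to contain "pageno=" in its query string.
$html = '<a href="/list.php?pageno=1">1</a>'
      . '<a href="/list.php?pageno=2">2</a>'
      . '<a href="/search.php?q=pageno=x">search</a>';

// Loose: any href containing "pageno=".
preg_match_all('~href="([^"]*pageno=[^"]*)"~', $html, $loose);

// Strict: only /list.php links whose pageno value is a number.
preg_match_all('~href="(/list\.php\?pageno=\d+)"~', $html, $strict);

echo count($loose[1]), " loose, ", count($strict[1]), " strict\n";
```

The loose pattern also picks up the search link, while the strict one returns only the two paging links.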

krakjoe, Jun 3, 2007 IP
