As the title says: how can I extract the internal links of a website from a particular page? I want only internal links, no external ones.
function fetch_links($url)
{
    if (!preg_match('/^https?:\/\/(\w+\.)?([^\/]+)/i', $url, $host))
    {
        trigger_error('Invalid URL given.');
        return false;
    }

    if (preg_match_all('/<a.+href\s*=\s*["\']([^"\']+)[^>]*>.*?<\/a>/i', @file_get_contents($url), $links))
    {
        foreach (array_unique($links[1]) AS $index => $link)
        {
            $link = trim($link);

            if (preg_match('/^(ht|f)tps?:\/\//i', $link) AND !preg_match('/^(ht|f)tps?:\/\/(\w+\.)?' . preg_quote($host[2], '/') .'/i', $link) OR $link[0] == '#')
            {
                unset($links[1][$index]);
            }
        }

        return $links[1];
    }

    return false;
}

Usage example:

echo '<pre>' . print_r(fetch_links('http://forums.digitalpoint.com/showthread.php?t=367847'), true) . '</pre>';
I replaced the URL with my site, http://www.jeffbrowninc.com. Result: it does not seem to show all the internal links. Also, how do I handle a URL that contains a PHP session ID?
Just a slight error in the pattern: the greedy .+ in front of href skipped ahead to the last link whenever several links sat on the same line, so the others were missed. Making it lazy (.*?) fixes that. This works for me:

function fetch_links($url)
{
    // $host[2] holds the host name without its first label (e.g. without "www.")
    if (!preg_match('/^https?:\/\/(\w+\.)?([^\/]+)/i', $url, $host))
    {
        trigger_error('Invalid URL given.');
        return false;
    }

    // Capture the href value of every <a> tag on the page
    if (preg_match_all('/<a.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?<\/a>/i', @file_get_contents($url), $links))
    {
        foreach ($links[1] AS $index => $link)
        {
            $link = trim($link);

            // Drop absolute links to other hosts, and in-page anchors
            if (preg_match('/^(ht|f)tps?:\/\//i', $link) AND !preg_match('/^(ht|f)tps?:\/\/(\w+\.)?' . preg_quote($host[2], '/') .'/i', $link) OR $link[0] == '#')
            {
                unset($links[1][$index]);
            }
        }

        // De-duplicate only after filtering
        return array_unique($links[1]);
    }

    return false;
}

And what do you mean by the session ID? Do you want to remove it from the string? If so, try replacing this:

$link = trim($link);

With:

$link = $links[1][$index] = trim(preg_replace('/(&|\?)(s|PHPSESSID)=[a-f0-9]{32}/', '', html_entity_decode($link)));
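
To see what that replacement does on its own, here is a minimal sketch that wraps it in a helper. The name strip_session_id and the sample links are made up for illustration and are not part of the function above. It assumes the session ID is exactly 32 lowercase hex characters and is passed as "s" or "PHPSESSID".

```php
<?php
// Hypothetical helper (not part of fetch_links above): removes a 32-character
// hex session ID passed as "s" or "PHPSESSID" from a link.
function strip_session_id(string $link): string
{
    // Decode entities first so that "&amp;" becomes "&" before matching.
    $decoded = html_entity_decode($link);

    return trim(preg_replace('/(&|\?)(s|PHPSESSID)=[a-f0-9]{32}/', '', $decoded));
}

// A made-up 32-character hex string standing in for a real session ID.
$sid = str_repeat('a1', 16);

echo strip_session_id("showthread.php?t=367847&amp;s=$sid"), "\n"; // showthread.php?t=367847
echo strip_session_id("index.php?PHPSESSID=$sid"), "\n";           // index.php
```

One thing to watch: if the session ID is the first parameter and others follow it, the "?" is removed along with it, so index.php?PHPSESSID=...&foo=bar would come out as index.php&foo=bar. You may need an extra step for that case.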