Hello, I'm new to PHP, so I need some real help here. I'm trying to create a web scraper that grabs a forum's content and shows only the posts. The source code is here:

    <html>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <?php
    $html = file_get_contents('http://www.......');
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $key = $xpath->query('//*[@class="postTextContainer"]');
    foreach ($key as $keys) {
        echo $keys->nodeValue, "<br/> \n";
    }
    ?>
    </html>

Can anyone tell me how I could grab all the posts that are in the same thread? Right now I can only grab the posts on the URL above. I think it's called multiple-page parsing?

I also want to ask how I can delete content that sits between two tags inside the content I have grabbed with the code above. More specifically, the tag is <div class="........">bla bla</div>.
You're obviously new to PHP, because that code makes no sense at all, at least to me. You're asking how, so I will tell you how, but I'm not going to write the code for you.

1. Fetch the page with the posts.
2. Use the preg_match_all() function plus a regex to find the posts.
3. Do whatever it is you want to do with them.

If you want to delete HTML tags, there is a function in PHP called strip_tags().
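For the multiple-page part of the question, the three steps above can be repeated once per page of the thread. Below is a minimal sketch, not a definitive implementation. It keeps the DOMXPath query from the original question instead of a regex, and it assumes things the thread does not state: that the thread's pages are reachable through a numbered URL, that a page past the end contains no posts, and that `$fetch`, `extractPosts()`, `collectThreadPosts()` and the `forum.example` address are hypothetical names made up for illustration.

```php
<?php
// Sketch only: $fetch and the URL template are hypothetical stand-ins,
// because the real forum URL and its paging scheme are not given in the thread.

/**
 * Return the trimmed text of every element whose class attribute is
 * exactly "postTextContainer" (the query used in the original question).
 */
function extractPosts(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $posts = [];
    foreach ($xpath->query('//*[@class="postTextContainer"]') as $node) {
        $posts[] = trim($node->nodeValue);
    }
    return $posts;
}

/**
 * Walk pages 1..$maxPages and stop at the first page that cannot be
 * fetched or that contains no posts.
 * $fetch is any callable that takes a URL and returns HTML or false.
 * $urlTemplate is passed to sprintf(), so it needs one %d for the page number.
 */
function collectThreadPosts(callable $fetch, string $urlTemplate, int $maxPages): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch(sprintf($urlTemplate, $page));
        if ($html === false) {
            break;
        }
        $posts = extractPosts($html);
        if ($posts === []) {
            break;
        }
        $all = array_merge($all, $posts);
    }
    return $all;
}
```

Against a real site, `$fetch` could simply be `'file_get_contents'`, for example `collectThreadPosts('file_get_contents', 'http://forum.example/thread?page=%d', 50)`. Some forums show the last page again when the page number is too high, which is why the loop is also capped by `$maxPages`.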
Thanks for your answer. Actually, I tried it this way, but I couldn't find a way to grab the content between <!-- message --> and <!-- / message -->. I couldn't find the right regex pattern. Do you know which regex might fit? And does strip_tags() delete only the HTML tags, or the HTML tags and the text between them?
"string strip_tags ( string $str [, string $allowable_tags ] )" This function tries to return a string with all NUL bytes, HTML and PHP tags stripped from a given str. The above example will output:
    $subject = <<<AAA
    notmatching
    <!-- message --> and this is matching<!-- / message -->
    notmatching
    AAA;

    if (preg_match('%<!-- message -->(.+)<!-- / message -->%si', $subject, $regs)) {
        $result = "This is your captured text: {$regs[1]}";
    } else {
        $result = "does not match";
    }
    echo "$result\n";

Regards, flexdex
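One thing to watch when a page holds more than one post: a greedy `(.+)` runs from the first opening marker to the last closing marker, so several posts come back as one match. A possible variation, sketched here with invented sample markup, uses preg_match_all() with a lazy `(.+?)` so that every post is captured separately.

```php
<?php
// Sketch only: the sample markup is invented to mimic the
// <!-- message --> markers mentioned in the thread.

$page = <<<HTML
<div><!-- message -->first post<!-- / message --></div>
<div><!-- message -->second
post<!-- / message --></div>
HTML;

// (.+?) is lazy, so each match stops at the nearest closing marker.
// The s flag lets "." span line breaks inside a post.
preg_match_all('%<!-- message -->(.+?)<!-- / message -->%s', $page, $matches);

// $matches[1] holds the captured group of every match.
$posts = array_map('trim', $matches[1]);
print_r($posts);
```

Each entry of `$posts` can then be passed through strip_tags() if only the plain text is wanted.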