Hi, I have been trying to come up with an expression that will grab the content of an HTML page, but only from the opening body tag to its closing counterpart. I would really appreciate feedback on this, as it is about my fifth attempt. I have been testing them all evening on different pages to check the reliability. This one works on all the pages I have tested, but because some of my earlier attempts worked for some pages and not others, I am a bit unsure. So here it is:

    preg_match_all("/<\s*(body)((\s*(.)*[^a-zA-z0-9]\s*)*)(\/)(body)\s*>/i", $file, $matches);

I am using this with a site editor module I am building. I would like to be able to gather the current site layout with PHP before sending it through to the JavaScript, as I have other database actions that could be handled better this way. I do have a JavaScript workaround that is not as pretty if this becomes too risky. Advice please!
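For comparison, here is a minimal sketch of a simpler pattern for the same job. It is not the poster's expression: the sample `$html` string and the names `$pattern`, `$m` and `$bodyContent` are made up for illustration. It assumes the page has a single body element, that no attribute value on the opening tag contains a `>`, and that the text `</body>` does not appear inside a script or comment.

```php
<?php
// Hypothetical sample input; $html stands in for the $file variable in the post.
$html = "<html><head><title>t</title></head>\n"
      . "<BODY class=\"home\">\n<p>Hello</p>\n</BODY ></html>";

// <body\b[^>]*>  opening tag, with optional attributes
// (.*?)          lazily captured contents; the s flag lets . match newlines
// </body\s*>     closing tag, allowing whitespace before the >
// The i flag makes the tag names case-insensitive.
$pattern = '~<body\b[^>]*>(.*?)</body\s*>~is';

// preg_match returns 1 on a match, 0 on no match, false on error.
$bodyContent = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;

echo $bodyContent;
```

The lazy `(.*?)` with a single level of repetition avoids the nested quantifiers of the original attempt, which is where heavy backtracking tends to come from. It is still only as reliable as the assumptions above.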
You should avoid regex for this as much as possible. You don't want your server killed, especially if it is a high-traffic one. There are much simpler and faster ways to achieve what you need. Try loading the file into SimpleXML and getting the body:

    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    // Uncomment to save a local copy of the page first:
    // file_put_contents('data.x', file_get_contents("http://www.php.net/mysql_connect"));
    $x = simplexml_load_file("data.x"); // returns false if the file is not well-formed XML
    if ($x !== false) {
        echo $x->body->asXml();
        // or, if the tags are upper-case: echo $x->BODY->asXml();
    }
Thanks, I am going to look into this. I hadn't actually heard of this construct(?), it sounds pretty awesome! The site editor that I am working on (about the third draft, trying to make it really lean) is not for public access. It is a back-end module for clients (who don't code) to set up their site, or at least that is the initial idea. My first two working drafts work off a predefined template and are a bit chunky, but now I am experimenting with anonymous templates to see if it can actually become a plugin, drawing on what I did originally. Thanks!
Yeah, regexes like that one, with nested quantifiers, can be pretty hard on the server. There are quite a few DOM libraries for PHP; I advise using one of them.
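As one possible illustration of the DOM route, here is a rough sketch using PHP's bundled DOM extension (`DOMDocument`). The post does not name a specific library, so this choice is an assumption, and the sample markup and variable names are hypothetical. `loadHTML` uses libxml's HTML parser, which is generally tolerant of markup that is not well-formed XML.

```php
<?php
// Hypothetical sample page with a non-standard tag and an unclosed <p>.
$html = '<html><head><title>Demo</title></head>'
      . '<body><nobr>Menu</nobr><p>Hello world</body></html>';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$bodyHtml = '';
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    // Serialize each child node so the result excludes the <body> tags themselves.
    foreach ($body->childNodes as $child) {
        $bodyHtml .= $doc->saveHTML($child);
    }
}

echo $bodyHtml;
```

Serializing the children rather than the body node itself returns only the inner markup, which is usually what an editor wants to work with. Passing `$body` to `saveHTML` instead would keep the wrapping tags.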
I had a quick fiddle with SimpleXML and it is very cool, but my problem at the moment is loading HTML pages that have strange tags. First I had to google how HTML is loaded, but then I tried to load a page that had a 'nobr' tag and it just didn't want to parse. It seems like a very neat tool, but until I can find a way to filter 'untidy' or unwanted HTML I don't really want to use it, as I am trying to set up the editor for any anonymous page that I might not have coded myself. Edit: I did find an interesting package called HTML Tidy...
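One way around the parsing problem described above, sketched under some assumptions: let `DOMDocument::loadHTML` repair the untidy markup first, then hand the resulting tree to SimpleXML with `simplexml_import_dom`. This is a different route from HTML Tidy, and the sample markup and variable names below are hypothetical.

```php
<?php
// Hypothetical untidy input: a <nobr> tag and a <p> that is never closed.
$html = '<html><body><nobr>No wrap</nobr><p>Unclosed paragraph</body></html>';

// Keep the HTML parser's complaints out of the output.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Hand the repaired tree to SimpleXML; the root element is <html>.
$xml = simplexml_import_dom($doc);

$bodyXml = '';
if ($xml instanceof SimpleXMLElement && isset($xml->body)) {
    // asXML() on a child element returns that element's markup as a string.
    $bodyXml = $xml->body->asXML();
}

echo $bodyXml;
```

The rest of the code can then keep using the SimpleXML property syntax (`$xml->body`) even when the source page is not valid XML.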
Thanks a lot guys, you definitely pointed me in the right direction. I found PHP Simple HTML DOM Parser on the web and have been testing it out. It is pretty impressive and just what I need.