Hi there,

This is my code:

<?php
$id = $_POST["id"];
$url = 'http://www.trademe.co.nz/Browse/Listing.aspx?id=' . $id;
$page = file_get_contents($url);
preg_match('/<h1 id="ListingTitle_title">(.*?)<\/h1>/s', $page, $pre_title);
$title = trim(html_entity_decode($pre_title[1]));
echo $title;
?>

As you can see, it goes to http://www.trademe.co.nz/Browse/Listing.aspx?id=xxxxxx and brings back the content between <h1 id="ListingTitle_title"> and </h1>. This is fine; however, on this item, for example (http://www.trademe.co.nz/Browse/Listing.aspx?id=230310050), there is an image that says NEW in between the tags, so my script brings that back as well. What I am trying to achieve is to make the script ignore that logo, i.e. this tag:

<img src="/images/NewSearchCards/LVIcons/brandNewItem.gif" id="ListingTitle_brandNewIcon" border="0" alt="Brand new item" title="Brand new item" width="26" height="15" />

Any help would be appreciated. Thanks
If I understand the problem correctly, $pre_title[1] will contain the raw HTML of everything between the <h1> tags. If that is the case, you can strip the <img> tag from $pre_title[1] with strip_tags($pre_title[1], "<h1>"); (the allowed-tag argument is only there for extra care). So rather than ignoring the image, the solution deletes the unwanted part of the string. And since <img> is a single tag with no text content, nothing of it is left behind.
Sorry, but I'm new to PHP. What would I need to change in these lines?

preg_match('/<h1 id="ListingTitle_title">(.*?)<\/h1>/s', $page, $pre_title);
$title = trim(html_entity_decode($pre_title[1]));

I'm looking for the exact code to use.
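The strip_tags() suggestion above could be applied roughly as in the sketch below. This is only a minimal illustration, not tested against the live site: the function name extract_listing_title and the sample $page string are made up for the example, and it assumes PHP 7.1 or later because of the nullable return type. The optional "<h1>" argument is left out here, since the captured text only holds what sits inside the heading.

```php
<?php
// Hypothetical helper (not from the original posts): returns the listing
// title as plain text, or null when the heading is not found.
function extract_listing_title(string $page): ?string
{
    // Same pattern as in the original script.
    if (!preg_match('/<h1 id="ListingTitle_title">(.*?)<\/h1>/s', $page, $pre_title)) {
        return null;
    }

    // strip_tags() removes the <img ...> icon (and any other markup)
    // from the captured text, leaving only the title itself.
    $text = strip_tags($pre_title[1]);

    // Decode entities such as &amp; once the tags are gone, then trim.
    return trim(html_entity_decode($text));
}

// Made-up sample markup standing in for the page fetched with file_get_contents().
$page = '<h1 id="ListingTitle_title"> Example &amp; title '
      . '<img src="/images/NewSearchCards/LVIcons/brandNewItem.gif" '
      . 'id="ListingTitle_brandNewIcon" border="0" alt="Brand new item" '
      . 'title="Brand new item" width="26" height="15" /></h1>';

echo extract_listing_title($page); // prints: Example & title
```

If you would rather keep your script as it is, the smallest change is probably to the $title line only: $title = trim(html_entity_decode(strip_tags($pre_title[1])));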