Regex (regular expression) help

Discussion in 'PHP' started by rishirajsingh, Oct 18, 2007.

  1. #1
    I am using cURL to fetch a Google search results page:

    
    // Search URL to fetch
    $filelocation = "http://www.google.com/search?q=cellphone&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $filelocation);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    curl_close($ch);
    
    Code (markup):
    Now I want to extract all the sponsored results from $html into PHP variables, like this:
    $title[0] = "Cell phone"; // ad title
    $adurl[0] = "http://www.unfoundation.org/vodafone/index.asp";
    // the ad URL is appended after href=/url?sa or href=/pagead/iclk?sa
    $addescription[0] = "Improving telecommunications to help in times of disaster.";
    $displayurl[0] = "www.UNFoundation.org/vodafone";

    I am not able to parse the ad data from the HTML because I can't write the regular expression for it.
    I need help writing a regex to parse the ad data from the HTML code.

    I can pay for it.

    P.S. Google's sponsored results appear above or to the right of the natural results.
     
    rishirajsingh, Oct 18, 2007 IP
  2. rishirajsingh

    #2
    
    <a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
    $3.95 <b>Web Hosting</b></a></font><br>VPS, Huge Disk Space and Bandwidth!<br>
    Fall Special ends soon...<br><span class=a>www.westhost.com</span>
    
    <a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
    2GB <b>Web Hosting</b> $1/Rs.40</a><br>
    <font size=-1><span class=a>www.3ix.in</span>
    
    Code (markup):
    My document contains only the two types of markup shown above,
    and I want to extract the following data from each.

    Example:
    exact url: http://www.westhost.com/package-compare.html
    Title: $3.95 Web Hosting
    Description : VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
    Domain: www.westhost.com

    I can sketch the logic, but I can't write the exact regular expression:
    <a id=(an|pa)[0-9] href=/[^&q|&adurl] (&q|&adurl)=$exacturl%[^ ]> $title </a> <span>$Domain </span>$description </font>


    I need a regular expression to parse this data from my HTML code.
    With a regular expression, I can use preg_match_all to get the data.

    P.S. For reference, see http://www.google.com/search?hl=en&q=webhosting&btnG=Google+Search
    That is where I got the HTML code. The exact URL ends at the % sign.
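
    The "ends at the % sign" rule can be tried on its own before tackling the whole ad block. This is only a minimal sketch, assuming every href looks like the two samples above (target URL after &q= or &adurl=, followed by a percent-encoded suffix). The function name extractAdUrl is hypothetical and not part of any library.

    ```php
    <?php
    // Hypothetical helper: pull the target URL out of an ad href.
    // Assumes the URL follows "&q=" or "&adurl=" and stops at the first "%".
    function extractAdUrl($href)
    {
        // [^%\s>]+ captures everything up to the first "%", whitespace, or ">".
        if (preg_match('~&(?:q|adurl)=([^%\s>]+)~', $href, $m)) {
            return $m[1];
        }
        return null; // no ad URL found in this href
    }

    $first  = extractAdUrl('/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene');
    $second = extractAdUrl('/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso');
    ```

    With the two sample hrefs, $first holds the westhost.com URL and $second holds http://www.3ix.com/.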

    Thanks for any kind of help.
     
    rishirajsingh, Oct 19, 2007 IP
  3. Barti1987

    #3
    Put a capture group such as (.*) wherever you want to capture text, and escape every regex special character (metacharacter) in the literal parts of the pattern.
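
    One way this advice could be applied to the two snippets from post #2 is sketched below. It is not a definitive solution: it assumes the markup stays exactly as posted, it uses lazy (.*?) groups rather than greedy (.*) so each group stops at the first closing tag, and the helper name cleanText is hypothetical.

    ```php
    <?php
    // Sample markup copied from post #2 (both ad variants).
    $html = implode("\n", [
        '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>',
        '$3.95 <b>Web Hosting</b></a></font><br>VPS, Huge Disk Space and Bandwidth!<br>',
        'Fall Special ends soon...<br><span class=a>www.westhost.com</span>',
        '',
        '<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">',
        '2GB <b>Web Hosting</b> $1/Rs.40</a><br>',
        '<font size=-1><span class=a>www.3ix.in</span>',
    ]);

    // The literal "?" is escaped as "\?"; each (...) is a capture group.
    // The "s" flag lets "." match newlines, since an ad spans several lines.
    $pattern = '~<a id=(?:an|pa)\d+ href=/(?:url|pagead/iclk)\?sa=[^>\s]*?'
             . '&(?:q|adurl)=([^%\s>]+)[^>]*>'   // 1: exact URL, up to "%"
             . '(.*?)</a>'                        // 2: title
             . '(.*?)<span class=a>'              // 3: description
             . '(.*?)</span>~s';                  // 4: display domain

    // Hypothetical helper: drop tags and collapse runs of whitespace.
    function cleanText($fragment)
    {
        return trim(preg_replace('/\s+/', ' ', strip_tags($fragment)));
    }

    $ads = [];
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $ads[] = [
            'url'         => $m[1],
            'title'       => cleanText($m[2]),
            'description' => cleanText($m[3]),
            'domain'      => cleanText($m[4]),
        ];
    }
    ```

    On the sample markup this fills $ads with two entries. The second ad has no description text, so its description comes back as an empty string.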

    Peace,
     
    Barti1987, Oct 19, 2007 IP