Parser

Discussion in 'PHP' started by ssimon171078, Feb 27, 2015.

  1. #1
    I made a parser for eBay, but I have a problem: every line of text that I parse ends up in the output twice.
    My code:
    <?php
    //parser of website ebay domain names
    ini_set('memory_limit','1024M');
    ini_set('max_execution_time',0);
    $website="http://www.ebay.com/sch/Domain-Names-/3767/i.html";
    $filename="ebay_domain_names3.txt";
    $fd=fopen($filename,"a+");
    function parse($Page){
        global $website;
        global $fd;
        if ($Page!=0) {
            $content=file_get_contents($website."?_pgn=".$Page."&_skc=200&rt=nc");
            echo ($website."?_pgn=".$Page."&_skc=200&rt=nc");
        } else {
            $content=file_get_contents($website);
        }
        $dom=new DOMDocument();
        $dom->loadhtml($content);
        $links=$dom->getElementsByTagName("a");
        foreach ($links as $link) {
            $links_ebay=$link->getAttribute("href");
            if (strpos($links_ebay,"itm")) {
                fwrite($fd,$links_ebay);
                fwrite($fd,"\n");
            }
        }
    }
    for ($Page=0;$Page<22000;$Page++){
        parse($Page);
        sleep(10);
    }

    fclose($fd);
    ?>
    PHP:
    My text file:
    http://www.ebay.com/itm/OSRON-COM-For-Sale-PREMIUM-DOMAIN-NAME-Aged-BRANDABLE-3-4-5-Letter-/271784332998?pt=LH_DefaultDomain_0&hash=item3f479bcec6
    http://www.ebay.com/itm/OSRON-COM-For-Sale-PREMIUM-DOMAIN-NAME-Aged-BRANDABLE-3-4-5-Letter-/271784332998?pt=LH_DefaultDomain_0&hash=item3f479bcec6
    http://www.ebay.com/itm/InsulinInhaled-com-FDA-Approved-Breakthrough-Diabetes-No-Injection-Treatment-/181672911427?pt=LH_DefaultDomain_0&hash=item2a4c8ca243
    http://www.ebay.com/itm/InsulinInhaled-com-FDA-Approved-Breakthrough-Diabetes-No-Injection-Treatment-/181672911427?pt=LH_DefaultDomain_0&hash=item2a4c8ca243
     
    ssimon171078, Feb 27, 2015 IP
  2. PDD

    PDD Greenhorn

    Messages:
    67
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    23
    #2
    It's because there are two <a> tags associated with each domain listing: one for the link and one for the image.
     
    PDD, Feb 27, 2015 IP
  3. PoPSiCLe

    PoPSiCLe Illustrious Member

    Messages:
    4,623
    Likes Received:
    725
    Best Answers:
    152
    Trophy Points:
    470
    #3
    Just change the script to compare the current link with the previous one. If they're the same, don't write it.
    Something like this:
    
    foreach ($links as $link)
    {
        $prev_link = '';
        $links_ebay=$link->getAttribute("href");
        if (strpos($links_ebay,"itm") && $links_ebay != $prev_link){
            fwrite($fd,$links_ebay);
            fwrite($fd,"\n");
            $prev_link = $links_ebay;
        }
    }
    
    PHP:
     
    PoPSiCLe, Feb 27, 2015 IP
  4. PDD

    #4
    That code would reset $prev_link on every iteration :p. I think a cleaner solution is to use his DOM parser to check the class attribute of each <a> tag and skip the one that belongs to the image.
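    A minimal sketch of that idea. The real class names on the eBay page are not shown in this thread, so instead of testing a class attribute it assumes that the duplicate is the <a> that wraps an <img> element. extractItemLinks() and the sample markup are hypothetical.

```php
<?php
// Sketch: collect item links, skipping anchors that wrap the listing image.
// Assumption: the duplicate href comes from the <a> that contains an <img>.
function extractItemLinks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // strpos() returns false when "itm" is absent, so compare strictly.
        if (strpos($href, 'itm') === false) {
            continue;
        }
        // Skip the anchor that wraps the image.
        if ($link->getElementsByTagName('img')->length > 0) {
            continue;
        }
        $result[] = $href;
    }
    return $result;
}

// Hypothetical sample markup, not copied from eBay.
$sample = '<html><body><div>'
        . '<a href="http://www.ebay.com/itm/Example-/1"><img src="x.jpg" alt=""></a>'
        . '<a href="http://www.ebay.com/itm/Example-/1">Example</a>'
        . '<a href="http://www.ebay.com/help">Help</a>'
        . '</div></body></html>';
print_r(extractItemLinks($sample));
```

    If the page marks the image link with a specific class instead, the img test could be swapped for a check on $link->getAttribute('class').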
     
    PDD, Feb 27, 2015 IP
  5. PoPSiCLe

    #5
    Oops - the $prev_link = ''; should of course be OUTSIDE the foreach loop. Shit happens when you type on a mobile keyboard :D
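    With that correction, the comparison could look roughly like the sketch below. It works on a plain array of href strings so it can be run on its own, and filterConsecutiveDuplicates() is a hypothetical helper name. It only catches a duplicate that directly follows the link it repeats.

```php
<?php
// Sketch: drop a link when it equals the link that was kept just before it.
function filterConsecutiveDuplicates(array $hrefs): array
{
    $prev_link = '';   // initialised once, before the loop
    $kept = [];
    foreach ($hrefs as $links_ebay) {
        // Strict check: strpos() returns false when "itm" is not found.
        if (strpos($links_ebay, 'itm') !== false && $links_ebay != $prev_link) {
            $kept[] = $links_ebay;
            $prev_link = $links_ebay;
        }
    }
    return $kept;
}

// Hypothetical input: each item link appears twice in a row.
$hrefs = [
    'http://www.ebay.com/itm/A-/1',
    'http://www.ebay.com/itm/A-/1',
    'http://www.ebay.com/itm/B-/2',
    'http://www.ebay.com/itm/B-/2',
    'http://www.ebay.com/help',
];
print_r(filterConsecutiveDuplicates($hrefs));
```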
     
    PoPSiCLe, Feb 27, 2015 IP
  6. ssimon171078

    ssimon171078 Well-Known Member

    Messages:
    276
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    103
    #6
    I have a small question. I have this HTML code:
    <li class="lvprice prc">
                <span  class="bold">
                        <b>ILS</b> 597.46</span>
                    </li>
    HTML:
    How can I get 597.46 from it with PHP? I am thinking of using DOMXPath.
     
    ssimon171078, Feb 28, 2015 IP
  7. PDD

    #7
    In pseudocode:
    doc('span[class=bold]')->innerText
    Code (php):
    or
    $span = doc('span[class=bold]');
    $span->children('b')->remove();
    $price = $span->innerText;
    Code (php):
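    The pseudocode above can be written with the DOMXPath class the question mentions. This is a sketch under the assumption that the price is the text directly inside the span, next to the <b> currency label. extractPrice() is a hypothetical helper name.

```php
<?php
// Sketch: read the price text from the span while ignoring the <b>ILS</b> child.
function extractPrice(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings for fragments and invalid markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // text() selects only the span's own text nodes, so the <b> content is left out.
    $nodes = $xpath->query('//li[contains(@class, "lvprice")]/span[@class="bold"]/text()');

    $text = '';
    foreach ($nodes as $node) {
        $text .= $node->nodeValue;
    }
    $text = trim($text);
    return $text === '' ? null : $text;
}

$html = '<li class="lvprice prc">
            <span  class="bold">
                    <b>ILS</b> 597.46</span>
                </li>';
echo extractPrice($html), "\n";
```

    The function returns the price as a string. Cast it with (float) if a number is needed.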
     
    PDD, Feb 28, 2015 IP
  8. EricBruggema

    EricBruggema Well-Known Member

    Messages:
    1,740
    Likes Received:
    28
    Best Answers:
    13
    Trophy Points:
    175
    #8
    Why not use a temporary array to store the URLs you have found? Before writing to the file, check whether the URL already exists in the array: if it doesn't, add it and write it; if it does, ignore it and go to the next one... it ain't that hard...
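    A minimal sketch of the array approach. writeUniqueLinks() is a hypothetical helper name, and the example writes to php://memory instead of the real output file so that it can be run on its own. Because the $seen array is passed by reference, it also filters duplicates across pages.

```php
<?php
// Sketch: remember every URL already written and skip it the next time it appears.
function writeUniqueLinks($fd, array $hrefs, array &$seen): int
{
    $written = 0;
    foreach ($hrefs as $href) {
        if (isset($seen[$href])) {
            continue;              // already written earlier, ignore it
        }
        $seen[$href] = true;       // use the URL as the array key for a fast lookup
        fwrite($fd, $href . "\n");
        $written++;
    }
    return $written;
}

$seen = [];
$fd = fopen('php://memory', 'w+');   // stand-in for the real output file

// Hypothetical links from two pages, with repeats within and across pages.
writeUniqueLinks($fd, ['http://www.ebay.com/itm/A-/1', 'http://www.ebay.com/itm/A-/1'], $seen);
writeUniqueLinks($fd, ['http://www.ebay.com/itm/A-/1', 'http://www.ebay.com/itm/B-/2'], $seen);

rewind($fd);
echo stream_get_contents($fd);
fclose($fd);
```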
     
    EricBruggema, Mar 3, 2015 IP