Paying $15 to whoever can fix this simple Perl scraping script

Discussion in 'Programming' started by smetten, Aug 9, 2011.

  1. #1
I have a Perl script that needs some minor changes to scrape .us domains from
http://newlydomains.com/domain-2011-08-03-us-1.html

Currently it doesn't work; something in it is broken.
$15 for whoever can fix it.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    # Change the name of the output file
    my $filename = "domainlist.txt";
    # Change the URL(s) to fetch
    my @urls = ( "http://newlydomains.com/domain-2011-08-03-us-1.html" );
    # Change this to 0 if you want files to be overwritten
    my $overwrite = 1;
    # CGI header (harmless when run from the command line)
    print "Content-type: text/html\n\n";

    if($overwrite == 1 && -e $filename)
      {
      $filename = $filename.'.'.time;
      }

    open(my $outfile, '>', $filename) or die("Cannot open file $filename\n");

    foreach my $url (@urls)
      {
      # get() returns undef on failure, so check before matching
      my $page = get($url);
      unless(defined $page)
        {
        print "Could not fetch $url\n";
        next;
        }
      # Match $page directly: the original (join '\n', $page) was a no-op on a
      # single scalar, and the single-quoted '\n' was a literal backslash-n anyway.
      # The pattern assumes each domain sits in a cell like <td class="top">example.us</td>.
      my @results = $page =~ m|"top">(.+?)\.us</td>|gi;
      if(@results)
        {
        foreach my $result (@results)
          {
          print $outfile "$result.us\n";
          }
        }
      else
        {
        print "No \".us\" domains found at $url\n";
        }
      }
    close $outfile;
    print "Results saved to <a href=\"$filename\">$filename</a>\n";

     
    smetten, Aug 9, 2011 IP
  2. austin-G

    austin-G Peon

    #2
    Hi there,

    What happens when it is run?
     
    austin-G, Aug 9, 2011 IP
  3. smetten

    smetten Active Member

    #3
    It should create a text file called domainlist.txt which contains all the .us domains on that page,
    but it just creates an empty domainlist.txt file.

    This part isn't correct: my(@results) = (join '\n', $page) =~ m|"top">(.+?)\.us</td>|gi;
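    For what it's worth, the join is a no-op: joining a single scalar returns it unchanged, and the single-quoted '\n' is a literal backslash-n, so matching $page directly is equivalent. The match itself can be tested on a made-up snippet of markup (the <td class="top"> structure here is an assumption; check the real page source):

    ```shell
    # Hypothetical sample markup; the real page may be structured differently.
    # Prints "foo.us" and "bar.us", one per line.
    perl -e '
    my $page = q{<td class="top">foo.us</td><td class="top">bar.us</td>};
    my @results = $page =~ m|"top">(.+?)\.us</td>|gi;
    print "$_.us\n" for @results;
    '
    ```

    If this prints the two domains but the full script still writes an empty file, the pattern simply doesn't match the live page's HTML.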

    Greetz

    Smetten
     
    smetten, Aug 9, 2011 IP
  4. kishore415

    kishore415 Well-Known Member

  5. smetten

    smetten Active Member

    #5
No need for it anymore; someone already did it.

    Thx anyway
     
    smetten, Aug 9, 2011 IP