Perl help

Discussion in 'Programming' started by Fisix, Jun 22, 2010.

  1. #1
    Hello

    I'm looking for advice on a script I'm trying to write.
    My current script reads a .txt list of URLs and checks each page for certain anchor text using WWW::Mechanize. I need the script to also check the sub directories/pages of each URL for the anchor text.
    I have no idea how to achieve this. I have looked for answers, but everything I have found is beyond my current Perl knowledge.
    Another big problem is how long the script takes to run, as it checks at least 1000 URLs.

    I would highly appreciate any help or advice with this problem. I can post my script here if needed.

    Thank you in advance
     
    Fisix, Jun 22, 2010
  2. Kaimi

    Kaimi Peon

    Messages:
    60
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Do you want a script that follows the links on each page? If not, what do you mean by sub directories/pages?

    To speed it up, try using threads.
     
    Kaimi, Jun 22, 2010
  3. Kaimi

    #3
    Can you provide a small example of the input data and of the output you want the script to generate?
     
    Kaimi, Jun 22, 2010
  4. Fisix

    Fisix Peon

    Messages:
    16
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Can I PM you the script as it is so far?
     
    Fisix, Jun 22, 2010
  5. Fisix

    #5
    It reads a list of URLs from a .txt file and checks each URL for the anchor. Whether or not the page matches the anchor regex, it prints the URL to a .txt file, saying if it matched the anchor text or not.
     
    Fisix, Jun 22, 2010
  6. Kaimi

    #6
    So you're doing something like this?
    
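    use strict;
    use warnings;
    use LWP::UserAgent;
    use threads;
    use threads::shared;
    use Fcntl qw(:flock);
    
    my $threads = 1;
    my $url_list = 'urls.txt';
    my $checked_list = 'checked.txt';
    
    my $anchor_text = '>Browse</a>';
    
    open(my $in, '<', $url_list) || die "$url_list: $!";
    chomp(my @lines = <$in>);
    close $in;
    
    # @urls must be shared, otherwise every thread works through its own full copy
    my @urls :shared;
    @urls = @lines;
    
    $| = 1;
    my @trl = ();
    
    $trl[$_] = threads->create(\&main) for 0..$threads - 1;
    $_->join for @trl;
    
    sub main
    {
    	my $ua = LWP::UserAgent->new;	# one user agent per thread
    
    	while(1)
    	{
    		my $url;
    		{
    			lock(@urls);	# take the next url without racing the other threads
    			$url = shift @urls;
    		}
    		last unless defined $url;
    
    		my $page = $ua->get($url)->content;
    
    		if($page =~ /$anchor_text/o)
    		{
    			print "[+] $url\n";
    			open(my $out, '>>', $checked_list) || die "$checked_list: $!";
    			flock($out, LOCK_EX);
    			print $out $url, "\n";
    			close $out;	# closing flushes the line and releases the lock
    		}
    		else
    		{
    			print "[-] $url\n";
    		}
    	}
    }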
    
    Or have I misunderstood?
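
    If following links is what was meant by sub directories/pages in the first post, here is a rough sketch of one way to do it with WWW::Mechanize. The helper name site_has_anchor, the $max_links cap and its default of 20 are made up for illustration, and it only goes one level deep on the same host. Note that text_regex matches the visible link text, so the pattern would be something like qr/Browse/ rather than '>Browse</a>'.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of the script above): returns 1 if $url, or any
# page on the same host that $url links to, has a link whose text matches
# $anchor_re. Goes one level deep and fetches at most $max_links linked pages.
sub site_has_anchor
{
	my ($mech, $url, $anchor_re, $max_links) = @_;
	$max_links = 20 unless defined $max_links;	# assumed default, tune as needed

	$mech->get($url);
	return 0 unless $mech->success && $mech->is_html;
	return 1 if $mech->find_link(text_regex => $anchor_re);

	my $host = $mech->uri->host;
	my %seen = ($mech->uri->canonical => 1);
	my $fetched = 0;

	# links() is evaluated once here, before the get() calls below replace the page
	for my $link ($mech->links)
	{
		my $abs = $link->url_abs;
		my $scheme = $abs->scheme || '';
		next unless $scheme =~ /^https?$/ && $abs->host eq $host;

		$abs->fragment(undef);	# page.html#top is the same page as page.html
		next if $seen{$abs->canonical}++;
		last if ++$fetched > $max_links;

		$mech->get($abs);
		next unless $mech->success && $mech->is_html;
		return 1 if $mech->find_link(text_regex => $anchor_re);
	}

	return 0;
}

# Usage inside the url loop, with autocheck off so a failed get() does not die:
#   my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 30);
#   print site_has_anchor($mech, $url, qr/Browse/i) ? "[+] $url\n" : "[-] $url\n";
```

    Keep in mind this multiplies the number of requests per site, so with 1000 URLs the cap on linked pages matters a lot for run time.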
     
    Kaimi, Jun 22, 2010
  7. Fisix

    #7
    Yes. And if I changed my $anchor_text = '>Browse</a>'; to my $anchor_text = <>; would that allow me to specify the anchor text before the script runs?
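
    For what it's worth, a line read with <> or <STDIN> still has its trailing newline, so it would need a chomp before being used in the match. Also, <> reads from files named on the command line if there are any, so <STDIN> is the safer choice for a prompt. A minimal sketch, where read_anchor_text is a made-up helper name and the in-memory filehandle just stands in for typed input:

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from a filehandle and strip the newline.
sub read_anchor_text
{
	my ($fh) = @_;
	my $line = <$fh>;
	die "No anchor text given\n" unless defined $line;
	chomp $line;	# without this the "\n" becomes part of the pattern
	return $line;
}

# Demo with an in-memory filehandle; in the real script pass \*STDIN instead.
open(my $fh, '<', \">Browse</a>\n") || die $!;
my $anchor_text = read_anchor_text($fh);
print "Looking for: [$anchor_text]\n";
```

    In the threaded script the prompt would have to happen before threads->create, so every thread starts with the value already set.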
     
    Fisix, Jun 22, 2010