Hello, I'm looking for advice on a script I'm trying to write. My current script reads a .txt list of URLs and checks each one for certain anchor text using WWW::Mechanize. I need the script to also look in the sub directories/pages of each URL for the anchor text. I have no idea how to achieve this. I have looked for answers, but everything I have found is beyond my current Perl knowledge. Another big problem is the time the script takes to run, as it is checking at least 1000 URLs. I would highly appreciate any help or advice, and I can post my script here if needed. Thank you in advance.
Do you want a script that follows the links on each page, or what do you mean by sub directories/pages? For the speed problem, try using threads.
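If "sub directories/pages" does mean the pages linked from each URL, the idea could be sketched roughly as below. This is only a sketch under assumptions: it goes one level deep, stays on the same host, and matches the visible link text with find_link rather than the raw HTML. The name check_site and the command-line usage are made up for illustration.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Check $start and every same-host page it links to (one level deep)
# for a link whose text matches $anchor_re. Returns the matching URLs.
# check_site is a hypothetical helper name, not part of WWW::Mechanize.
sub check_site {
    my ($mech, $start, $anchor_re) = @_;
    my @hits;

    $mech->get($start);
    return @hits unless $mech->success;

    my $host = $mech->uri->host;
    push @hits, $start if $mech->find_link(text_regex => $anchor_re);

    # Collect the sub pages before navigating away from the start page.
    my %seen = ($mech->uri->as_string => 1);
    my @subpages;
    for my $link ($mech->links) {
        my $abs = $link->url_abs;
        next unless $abs->scheme && $abs->scheme =~ /^https?$/;
        next unless $abs->host eq $host;
        $abs->fragment(undef);    # page.html#top is the same page
        push @subpages, "$abs" unless $seen{"$abs"}++;
    }

    for my $url (@subpages) {
        $mech->get($url);
        next unless $mech->success && $mech->is_html;
        push @hits, $url if $mech->find_link(text_regex => $anchor_re);
    }
    return @hits;
}

# Assumed usage: perl check.pl http://example.com/ Browse
if (@ARGV == 2) {
    my ($start, $text) = @ARGV;
    # autocheck => 0 so one failed request does not kill the whole run
    my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 15);
    print "[+] $_\n" for check_site($mech, $start, qr/\Q$text\E/i);
}
```

Keeping it to one level and one host is deliberate: with 1000 start URLs, following every link without a limit would multiply the number of requests very quickly.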
It reads a list of URLs from a .txt file and checks each URL for the anchor. If the regex matches the anchor, it prints the URL to a .txt file, saying whether it matched the anchor text or not.
So you're doing something like this?

use strict;
use warnings;
use LWP::UserAgent;
use threads;
use threads::shared;
use Fcntl qw(:flock);

my $threads      = 1;                # number of worker threads
my $url_list     = 'urls.txt';
my $checked_list = 'checked.txt';
my $anchor_text  = '>Browse</a>';

# Share the queue so each URL is taken by exactly one thread.
my @urls :shared;
open(my $in, '<', $url_list) or die "$url_list: $!";
chomp(@urls = <$in>);
close $in;

$| = 1;
my @trl = map { threads->create(\&main) } 1 .. $threads;
$_->join for @trl;

sub main {
    my $ua = LWP::UserAgent->new;    # one user agent per thread
    while (1) {
        my $url;
        { lock(@urls); $url = shift @urls; }
        last unless defined $url;
        my $page = $ua->get($url)->decoded_content // '';
        # \Q...\E matches the text literally; drop it to use a regex
        if ($page =~ /\Q$anchor_text\E/) {
            print "[+] $url\n";
            open(my $out, '>>', $checked_list) or die "$checked_list: $!";
            flock($out, LOCK_EX);
            print $out "$url\n";
            close $out;              # closing the handle releases the lock
        } else {
            print "[-] $url\n";
        }
    }
}

Or have I misunderstood?
Yes. And if I changed my $anchor_text = '>Browse</a>'; to my $anchor_text = <>; would that let me specify the anchor text before the script runs?
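One thing to watch with that change: a line read from input still has its trailing newline, so the pattern would almost never match unless it is removed first. Also, <> treats any command-line arguments as file names to read from, so <STDIN> is the safer choice when the text should be typed in. A minimal sketch, where anchor_pattern is a made-up helper name and the prompt wording is just an example:

```perl
use strict;
use warnings;

# Turn one raw input line into a pattern that matches the text literally.
# anchor_pattern is a hypothetical helper, named here for illustration.
sub anchor_pattern {
    my ($line) = @_;
    die "No anchor text given\n" unless defined $line;
    $line =~ s/^\s+|\s+$//g;    # drops the trailing newline as well
    die "No anchor text given\n" unless length $line;
    return qr/\Q$line\E/;       # \Q...\E escapes regex metacharacters
}

# Prompt only when run from a terminal, so the sub can be reused elsewhere.
if (-t STDIN) {
    $| = 1;                     # show the prompt before waiting for input
    print "Anchor text: ";
    my $anchor_re = anchor_pattern(scalar <STDIN>);
    print "Pattern: $anchor_re\n";
}
```

The resulting pattern can then be used as $page =~ $anchor_re in place of the hard-coded string. If the input is meant to be a regex rather than literal text, return qr/$line/ instead.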