perl command line pr checker

Discussion in 'Programming' started by Shoemoney, Aug 15, 2005.

  1. #1
    so i wrote this a while ago i find it very useful when I do batch checks on domains for pr (more then 5000 at a time) maybe someone will find it useful. just ./pr.pl (domain)

    
    #!/usr/bin/perl
    use LWP::Simple;
    use URI::Escape;
    use integer;
    
    $rank = getrank("$ARGV[0]");
    print "Pagerank for $ARGV[0]: $rank\n";
    sub zeroFill {
       my $z = hex(80000000);
       $a = $_[0];
       $b = $_[1];
    
       if ($z & $a) {
          $a = ($a >> 1);
          $a &= (~$z);
          $a |= 0x40000000;
          $a = ($a>>($b-1));
       } else {
          $a = ($a>>$b);
       }            
       return $a;
    }
    
    sub mix {
       my $a = $_[0];
       my $b = $_[1];
       my $c = $_[2];
       $a -= $b; $a -= $c; $a ^= (zeroFill($c,13));
       $b -= $c; $b -= $a; $b ^= ($a<<8);
       $c -= $a; $c -= $b; $c ^= (zeroFill($b,13));
       $a -= $b; $a -= $c; $a ^= (zeroFill($c,12));
       $b -= $c; $b -= $a; $b ^= ($a<<16);
       $c -= $a; $c -= $b; $c ^= (zeroFill($b,5));
       $a -= $b; $a -= $c; $a ^= (zeroFill($c,3));   
       $b -= $c; $b -= $a; $b ^= ($a<<10);
       $c -= $a; $c -= $b; $c ^= (zeroFill($b,15));
    
       @_ = ($a,$b,$c); 
       return @_;
    }
    sub GoogleCH {
       my @url = @_;
       my $length = 0;
       my $init = 0xE6359A60;
       
       if($length == 0) {
          $length = scalar @_;
       }
    
       my $a = my $b = 0x9E3779B9;
       my $c = $init;
       my $k = 0;
       my $len = $length;   
       while($len >= 12) {
          $a += ($url[$k+0] +($url[$k+1]<<8) +($url[$k+2]<<16) +($url[$k+3]<<24));
          $b += ($url[$k+4] +($url[$k+5]<<8) +($url[$k+6]<<16) +($url[$k+7]<<24));
          $c += ($url[$k+8] +($url[$k+9]<<8) +($url[$k+10]<<16)+($url[$k+11]<<24));     
          @_ = mix($a,$b,$c);
          $a = $_[0]; $b = $_[1]; $c = $_[2];
          $k += 12;
          $len -= 12;
       }
       
       $c += $length;
    
       while($len <= 11 && $len > 0) {
          if($len == 11) { $c+=($url[$k+10]<<24) }
          if($len == 10) { $c+=($url[$k+9]<<16) }
          if($len == 9) { $c+=($url[$k+8]<<8) }
          if($len == 8) { $b+=($url[$k+7]<<24) }
          if($len == 7) { $b+=($url[$k+6]<<16) }
          if($len == 6) { $b+=($url[$k+5]<<8) }
          if($len == 5) { $b+=($url[$k+4]) }
          if($len == 4) { $a+=($url[$k+3]<<24) }
          if($len == 3) { $a+=($url[$k+2]<<16) }
          if($len == 2) { $a+=($url[$k+1]<<8) }
          if($len == 1) { $a+=($url[$k+0]) }
          $len--;
       }
      
       @_ = mix($a,$b,$c);
       return $_[2];
    }
    
    
    sub strord {
       my @result;
       my $string = $_[0];
    
       my @char = split //, $string;
    
       for(my $i=0; $i<length($string); $i++) {
          $result[$i] = ord($char[$i]);
       }
       return @result;
    }
    
    sub getrank {
       my $url = $_[0];
       $url = 'info:'.$url;
    
       my $ch = GoogleCH(strord($url));
    
       my $file = get("http://www.google.com/search?client=navclient-auto&ch=6$ch&features=Rank&q=$url");
       my @rankarray = split(/:|\n/, $file);
       return $rankarray[4];  
    }
    
    sub getPr {
       my $getUrl = $address_widget->get_text;
       my $settext = $pgrankapp->get_widget('rankvalue')->set_text(getrank($getUrl));
       
    }
    
    Code (markup):

     
    Shoemoney, Aug 15, 2005 IP
    Willy likes this.
  2. daed

    daed Peon

    Messages:
    93
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I can't get the URL in the file to function, and accessing it says my client is forbidden. I wonder if they caught on and closed it down?
     
    daed, Aug 16, 2005 IP
  3. forkqueue

    forkqueue Guest

    Messages:
    401
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Works very nicely for me, thanks Shoemoney!

    Nice clean Perl code too - like I always tell people, when Perl is well written it's perfectly readable. Bad code is unreadable whatever the language.
     
    forkqueue, Aug 16, 2005 IP
  4. daed

    daed Peon

    Messages:
    93
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #4
    How are you getting it to work? I get forbidden from just about any IP I try, with a link to the following paragraph:

    No Automated Querying

    You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that "sending automated queries" includes, among other things:

    * using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
    * "meta-searching" Google; and
    * performing "offline" searches on Google.
     
    daed, Aug 16, 2005 IP
  5. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #5
    daed, it works for me. you == sux ;)

    ..._
    ../\)
    ./ / ~.
    ( Y) [root@devbox(~)]: ./pr.pl digitalpoint.com
    Pagerank for digitalpoint.com: 7
     
    Shoemoney, Aug 16, 2005 IP
  6. Willy

    Willy Peon

    Messages:
    281
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Whoa, lots of bitshifts going on there ;) ...gonna have to take a closer look at a better time, but looks interesting, thanks! Is there any documentation/sample code you based it on?
     
    Willy, Aug 16, 2005 IP
  7. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #7
    ummm nope use it or dont ;)

    I suck at programing. I am sure its unsecure and unstable. I wrote it in a few mins. I just thought I would share what i use.
     
    Shoemoney, Aug 16, 2005 IP
  8. Willy

    Willy Peon

    Messages:
    281
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    0
    #8
    You misunderstood me :) It just wasn't immediately obvious to me what the GoogleCH sub does, so I was wondering whether that's some special key algorithm that might be published in some Google paper somewhere... So if anything, that's only a reflection on my ability to (mis)understand Perl :D

    Inquiring minds want to know ;)
     
    Willy, Aug 16, 2005 IP
  9. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #9
    no no im saying I really suck.. seriously... I got ideas from the perl cookbook then searched the net to compile that code above. I use it constantly but its possible its unstable or insecure. I just wanted to state that for the record.
     
    Shoemoney, Aug 16, 2005 IP
  10. daed

    daed Peon

    Messages:
    93
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Yeah.. my copy of perl sucks. =[ I can get it to work on other ones. What version of perl are you running?
     
    daed, Aug 17, 2005 IP
  11. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #11
    whatever is debian stable ;)

    root@devbox:/home/shamu/public_html# perl -v

    This is perl, v5.8.4 built for i386-linux-thread-multi
     
    Shoemoney, Aug 17, 2005 IP
  12. script909

    script909 Peon

    Messages:
    33
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #12
    does it recuires google-licence?
     
    script909, Aug 18, 2005 IP
  13. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #13
    no it does not
     
    Shoemoney, Aug 18, 2005 IP
  14. script909

    script909 Peon

    Messages:
    33
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #14
    pretty thing
    respect
     
    script909, Aug 18, 2005 IP
  15. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #15
    thanks hope you find it useful
     
    Shoemoney, Aug 18, 2005 IP
  16. flash_f

    flash_f Peon

    Messages:
    21
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #16
    wow, itss nice and cool
     
    flash_f, Dec 9, 2005 IP
  17. jbw

    jbw Peon

    Messages:
    343
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    0
    #17
    Old thead it seems, but for the person asking, that bitshiffting and such looks like the Burtle Hash.
     
    jbw, Dec 10, 2005 IP
  18. Will.Spencer

    Will.Spencer NetBuilder

    Messages:
    14,789
    Likes Received:
    1,040
    Best Answers:
    0
    Trophy Points:
    375
    #18
    Yep, Google appears to have copied Bob Jenkins' code.
     
    Will.Spencer, Jan 10, 2006 IP
  19. JRBHosting

    JRBHosting Peon

    Messages:
    121
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #19
    You know, I hate bringing up dead topics, but this one has a legitimate purpose :)

    Just for everybody's information, this code as of this post is NOT working. I presume this is due to Google's checksum, etc. changes.

    If anybody could change this code so that it works...that would be great. If not, its not as if I lost anything :p

    Jason
     
    JRBHosting, Mar 24, 2007 IP
  20. sea otter

    sea otter Peon

    Messages:
    250
    Likes Received:
    23
    Best Answers:
    0
    Trophy Points:
    0
    #20
    The CPAN PageRank module works fine; just tested it (again) two minutes ago.

    The module is WWW::Google::pageRank

    Just install it from within CPAN.

    Here's a quick sample script based on it to check PR (save it to a file called pr.pl, and make sure you chmod a+x it before running):

    
    #!/usr/bin/perl
    
    use locale;
    use WWW::Google::PageRank;
    
    
    my $pr = WWW::Google::PageRank->new();
    
    my $rank = $pr->get(shift);
    
    print "PageRank: " . $rank . "\n";
    
    Code (markup):
    For example:

    
    ./pr.pl http://digitalpoint.com
    
    Code (markup):
    You should of course modify the above to verify/sanitize the input, add missing http:// etc.

    As an aside, there are known problems with the freely available PHP pagerank scripts, related to some esoteric problems with the bitshift operator and certain versions of gcc. I now always use a perl PR checker based on the above CPAN module, which I call from within PHP.

    This setup has been working for years, never a problem.
     
    sea otter, Mar 24, 2007 IP
    Will.Spencer likes this.