I am after something that searches Google but, instead of showing the links and descriptions, just shows the links, and with more results per page - like 1000. For example, Google normally shows:

example.com - The best place on the net for examples
example2.com - a very interesting site on examples

What the software would do is this: search on Google and you just get a load of URLs without the text:

example.com
example2.com
example3.com

and it can show something like 1000 results per page. Maybe this can be done another way other than software, e.g. a script. Either way, I would think this is simple to implement... thanks
I've used code to do this for years, using PHP. Here is the code I've been using, and it has worked great:

$SearchTerm = rawurlencode(trim($_GET["search"]));

// $cURL is a small wrapper class of mine around PHP's curl functions (not shown here)
$cURL = new cURL();
$strHTML = $cURL->get('google.com/cse?cx=013269018370076798483:gg7jrrhpsy4&cof=FORID:1&q=' . $SearchTerm . '&sa=Search');

//$chunks = spliti("</span></td></tr></table>", $strHTML); // old approach, no longer used

// pull the result links out of the returned HTML
preg_match_all('%<div class=g><h2 class=r><a href="(.+?)" class=. onmousedown=".+?">(.+?)</a>%', $strHTML, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    // $siteCheck and stripURL() come from the rest of my script (not shown) -
    // they just skip URLs I've already collected
    $pos = strpos($siteCheck, stripURL($val[1]));
    if ($pos === false) {
        echo $val[1];
    }
}

I've modified the code from what I use, so hopefully it still works. This script is used in an AJAX framework, hence the lack of styling... Another problem is that Google likes to change up its results output - e.g. a couple of months ago they added quotes around the class name instead of just using the one character. That just means you will have to update the regex from time to time.
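If you don't have that cURL wrapper class, here is a rough standalone version of the same idea using PHP's built-in curl_* functions. The CSE URL and the regex are copied straight from the snippet above, and I've left out the $siteCheck/stripURL() filtering since that code isn't shown - treat this as an untested sketch, and expect to tweak the pattern whenever Google changes its markup.

<?php
// Fetch a Google CSE results page and print just the URLs, one per line.
$searchTerm = rawurlencode(trim($_GET['search']));
$url = 'http://www.google.com/cse?cx=013269018370076798483:gg7jrrhpsy4&cof=FORID:1&q=' . $searchTerm . '&sa=Search';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the page as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');  // some pages serve different HTML without a user agent
$strHTML = curl_exec($ch);
curl_close($ch);

// Same pattern as in the post above - it will break whenever Google tweaks the result markup.
preg_match_all('%<div class=g><h2 class=r><a href="(.+?)" class=. onmousedown=".+?">(.+?)</a>%', $strHTML, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo $val[1] . "\n";   // $val[1] is the href, $val[2] would be the link text
}
?>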
I can code this one in PHP. One of my previous customers asked for this. Using my script you can harvest links from Google at a rate of 50 URLs per second. It can also harvest links from multiple pages of Google results. PM me if interested !!
I would suggest using a Perl script to grab all the data and then parse it with regular expressions. That kind of regex work is easy in Perl.
I've done this very quickly with Perl, so if you are able to code in that (and have some knowledge of regular expressions) I would recommend it. You could use a hash to store all of the results (you can even use the URL itself as the key so that you avoid duplicates). ** Just did a quick search, and pre-written scripts that do this are readily available on the web. Search for "Perl script for harvesting URLs" or "perl webcrawler" or similar to find one.
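Since the existing snippet in this thread is PHP, here is the same "hash keyed by URL" trick written with a PHP associative array rather than a Perl hash - assigning the same key twice just overwrites it, so duplicates disappear on their own. The $matches data below is made-up example input in the shape that preg_match_all(..., PREG_SET_ORDER) produces.

<?php
// Example input: each entry is array(full match, URL, link text)
$matches = array(
    array('', 'http://example.com',  'The best place on the net for examples'),
    array('', 'http://example2.com', 'a very interesting site on examples'),
    array('', 'http://example.com',  'duplicate result - will be collapsed'),
);

$unique = array();
foreach ($matches as $val) {
    $unique[$val[1]] = $val[2];   // URL => link text; re-adding a URL just overwrites it
}

// One URL per line, no repeats
foreach (array_keys($unique) as $url) {
    echo $url . "\n";
}
?>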
ok thanks guys, but I've found out the max number of results from Google is only 1000. I want unlimited, so I'm currently trying to find a search engine that will let me search unlimited pages/results, and then I'll see if someone can make a parser for that engine. I'll let you all know - I don't know how I'm going to find such an engine. The reason behind all this is that I want a huge list of blogs, and I can only get a list of 1000 from Google. If Google wasn't mean and let me search all of its pages, I would get 1 to 2 million blogs - bit of a difference - so I'm trying everything I can.