Hi all, I'm here to ask for help in making a basic PHP web crawler. I've tried several that are available online, but none meets my needs. I am attempting to write my own, but my PHP skills are EXTREMELY limited. What I need is a basic search box on my site where a user can type in the name of a domain and get crawling results only for that domain, with no hyperlinks etc. I would like the results to be shown on the same page, in a code box below, so the user can copy and paste effectively. I would be very thankful for any help or pointers anyone is willing to give. Thanks in advance, IR
Web crawlers are somewhat complicated, so it may be difficult to write one from scratch if you think that your PHP skills are extremely limited. Perhaps you should improve/modify an existing open source crawler, so it will suit your needs? When you get the crawler working, searching for results in one domain should be easy. Just have an SQL table with domains and domain IDs, put the domain ID in another table for pages crawled, and then you can search for results in one domain only with something like "SELECT * FROM pages WHERE domain_id = 1096".
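To make the two-table idea concrete, here is a minimal sketch. It assumes PDO with the SQLite driver is available and uses an in-memory database; the table names, column names, sample rows, and the pages_for_domain helper are all made up for illustration, not taken from any particular crawler.

```php
<?php
// Assumption: PDO with the SQLite driver is installed. An in-memory database
// stands in for whatever database the crawler really writes to.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: one row per domain, one row per crawled page.
$db->exec('CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT UNIQUE)');
$db->exec('CREATE TABLE pages (id INTEGER PRIMARY KEY, domain_id INTEGER, url TEXT)');

// Made-up sample data so the query below has something to return.
$db->exec("INSERT INTO domains (id, name) VALUES (1096, 'example.com'), (1097, 'example.org')");
$db->exec("INSERT INTO pages (domain_id, url) VALUES
    (1096, 'http://example.com/'),
    (1096, 'http://example.com/about'),
    (1097, 'http://example.org/')");

// Return the crawled URLs for a domain name typed by the user.
// A prepared statement keeps the user input out of the SQL text.
function pages_for_domain(PDO $db, string $domain): array
{
    $stmt = $db->prepare(
        'SELECT p.url FROM pages p
         JOIN domains d ON d.id = p.domain_id
         WHERE d.name = ?
         ORDER BY p.url'
    );
    $stmt->execute([$domain]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(pages_for_domain($db, 'example.com'));
```

Looking the domain up by name with a join means the search box does not need to know the numeric domain ID. A query on domain_id alone works just as well once the ID is known.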
A good starting point is to find a free script and then build on it. Some sites provide lists of domains registered by date; you can grab those and crawl them.
Thanks for the tips, guys. So far, after trawling this forum among others, I think I have found what I'm looking for, and it seems to do the job, sort of. The only problem is that the output is bad: I would like each link on its own line, but this script bunches them all up. Any ideas?
<?php
$saving = $_REQUEST['saving'];
if ($saving == 1) {
    $data = $_POST['data'];
    $file = "urls.txt";
    $fp = fopen($file, "w") or die("Couldn't open $file for writing!");
    fwrite($fp, $data) or die("Couldn't write values to file!");
    fclose($fp);
    echo "Saved to $file successfully!";
}
?>
<form name="form1" method="post" action="form1.php?saving=1">
<textarea name="data" cols="100" rows="10">
<?php
$file = "urls.txt";
if (!empty($file)) {
    $file = file_get_contents("$file");
    echo $file;
}
?>
<?php
if (!empty($file)) {
    $file = file_get_contents("$file");
    echo $file;
}
asort($int_pages);
foreach ($int_pages as $i => $x)
    $int_pages[$i] = "" . htmlentities($x) . "" . "";
echo implode('', $int_pages);
?>
</textarea>
<br>
<input type="submit" value="Save">
</form>
<?php
Code (markup):
Thanks again,
Inneed
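The bunched-up output most likely comes from joining the links with an empty string in implode('', $int_pages). Here is a minimal sketch of one fix, assuming $int_pages is a plain array of URL strings produced by the crawler. The format_links helper name and the sample URLs are made up for illustration.

```php
<?php
// Hypothetical helper: formats a list of URLs for display inside a <textarea>.
function format_links(array $links): string
{
    // Drop duplicate URLs, then sort alphabetically (sort() also reindexes).
    $links = array_unique($links);
    sort($links);

    // Escape each URL for HTML output, then join with newlines so that each
    // link lands on its own line inside the textarea.
    $escaped = array_map('htmlspecialchars', $links);
    return implode("\n", $escaped);
}

// Made-up URLs standing in for the crawler's $int_pages array.
$int_pages = [
    'http://example.com/b',
    'http://example.com/a',
    'http://example.com/b',
];
echo format_links($int_pages), "\n";
```

Inside a textarea a plain "\n" is enough to start a new line. If the links were echoed into ordinary HTML instead, they would need a br tag or a pre block to show up on separate lines.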
Hi inneed, I don't mean to discourage you, but so far all of your posts regarding PHP web crawlers have suggested to me that your level of programming ability is probably insufficient to complete even the simplest, least capable web crawler / search engine.