PHP Web crawler

Discussion in 'PHP' started by inneed, Nov 28, 2010.

  1. #1
    Hi all,

    I'm here to ask for help making a basic PHP web crawler.
    I've tried several that are available online, but none of them meets my needs. I am attempting to write my own, but my PHP skills are EXTREMELY limited.

    What I need is a basic search box on my site, where a user can type in the name of a domain and get crawling results only for that domain, no hyperlinks etc.
    I would like the results to be shown on the same page, but in a code box below, so the user can copy and paste them easily.

    I would be very thankful for any help or pointers anyone is willing to give.

    Thanks in advance

    IR
     
    inneed, Nov 28, 2010 IP
  2. underground-stockholm

    #2
    Web crawlers are somewhat complicated, so it may be difficult to write one from scratch if you think that your PHP skills are extremely limited. Perhaps you should improve/modify an existing open source crawler, so it will suit your needs?

    When you get the crawler working, searching for results in one domain should be easy. Just have an SQL table with domains and domain IDs, put the domain ID in another table for pages crawled, and then you can search for results in one domain only with something like "SELECT * FROM pages WHERE domain_id = 1096".
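
A minimal sketch of the two-table layout described above, assuming PHP 7 or later with the pdo_sqlite extension and an in-memory database. Only `pages` and `domain_id` come from the post; the `domains` columns, the `url` column, the sample rows and the `pagesForDomain()` helper are hypothetical names chosen for illustration.

```php
<?php
// In-memory SQLite keeps the sketch self-contained; a real site would
// point PDO at its own database instead.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One row per domain, one row per crawled page (column names are assumptions).
$db->exec('CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT UNIQUE)');
$db->exec('CREATE TABLE pages (
    id INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    url TEXT NOT NULL
)');

// Sample rows standing in for crawler output.
$db->exec("INSERT INTO domains (id, name) VALUES (1096, 'example.com')");
$db->exec("INSERT INTO domains (id, name) VALUES (1097, 'example.org')");
$insert = $db->prepare('INSERT INTO pages (domain_id, url) VALUES (?, ?)');
$insert->execute([1096, 'http://example.com/about']);
$insert->execute([1096, 'http://example.com/']);
$insert->execute([1097, 'http://example.org/']);

// Look up the crawled URLs for one domain by name (hypothetical helper).
function pagesForDomain(PDO $db, string $domain): array
{
    $stmt = $db->prepare(
        'SELECT pages.url FROM pages
         JOIN domains ON domains.id = pages.domain_id
         WHERE domains.name = ?
         ORDER BY pages.url'
    );
    $stmt->execute([$domain]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$urls = pagesForDomain($db, 'example.com');
echo implode("\n", $urls), "\n";
```

Using a prepared statement with a placeholder matters here because the domain name comes straight from a search box.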
     
    underground-stockholm, Nov 28, 2010 IP
  3. senth

    #3
    A good way to start is to find a free script and then build on it.

    Some sites provide lists of domains registered by date; you can grab those and crawl them.
     
    senth, Nov 28, 2010 IP
  4. inneed

    #4
    Thanks for the tips, guys. So far, after trawling this forum among others, I think I have found what I'm looking for :) It seems to do the job, sort of. The only thing is that the output is bad: I would like each link on its own line, but this one makes it all bunched up.

    any ideas?


    
    <?php
      // Save the submitted textarea contents back to urls.txt.
      $file = "urls.txt";
      $saving = isset($_REQUEST['saving']) ? $_REQUEST['saving'] : 0;
      if ($saving == 1) {
        $data = isset($_POST['data']) ? $_POST['data'] : '';

        $fp = fopen($file, "w") or die("Couldn't open $file for writing!");
        // fwrite() returns 0 for an empty string, so compare against false.
        if (fwrite($fp, $data) === false) {
          die("Couldn't write values to file!");
        }
        fclose($fp);
        echo "Saved to $file successfully!";
      }
    ?>

    <form name="form1" method="post" action="form1.php?saving=1">
      <textarea name="data" cols="100" rows="10">
    <?php
      // Show the previously saved URLs, escaped for display in the textarea.
      if (file_exists($file)) {
        echo htmlentities(file_get_contents($file));
      }

      // $int_pages is presumably filled by the crawler script (not shown here).
      if (!empty($int_pages)) {
        asort($int_pages);
        foreach ($int_pages as $i => $x) {
          $int_pages[$i] = htmlentities($x);
        }
        echo implode('', $int_pages);
      }
    ?>
      </textarea>
      <br>
      <input type="submit" value="Save">
    </form>
    Code (markup):
    Thanks Again

    Inneed
     
    inneed, Nov 29, 2010 IP
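
On the "all bunched up" output in post #4: `implode('', $int_pages)` joins the links with an empty separator, so they run together. A minimal sketch of one way around it, assuming PHP 7 or later, is to join with a newline, which a `<textarea>` preserves. The `linksToText()` helper and the sample URLs are hypothetical; only `$int_pages` comes from the posted code.

```php
<?php
/**
 * Join URLs one per line, escaped for use inside a <textarea>.
 * Duplicates are dropped and the list is sorted for stable output.
 */
function linksToText(array $links): string
{
    $links = array_values(array_unique($links));
    sort($links, SORT_STRING);
    // "\n" as the separator puts each link on its own line.
    return implode("\n", array_map('htmlspecialchars', $links));
}

// Sample data standing in for the crawler's result list.
$int_pages = [
    'http://b.example/?x=1&y=2',
    'http://a.example/',
    'http://a.example/',
];

echo '<textarea cols="100" rows="10">' . linksToText($int_pages) . '</textarea>';
```

If the links are printed outside a textarea, plain newlines collapse in HTML, so the output would need a `<pre>` element or `<br>` separators instead.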
  5. deepakg

    #5
    Hi inneed, I don't mean to discourage you, but so far all of your posts regarding PHP web crawlers have suggested to me that your level of programming ability is probably insufficient to complete even the least capable and most simplistic web crawler / search engine.
     
    deepakg, Dec 2, 2010 IP
  6. w47w47

    #6
    What should it crawl from that domain?
     
    w47w47, Dec 5, 2010 IP
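
Going by the first post, the goal appears to be a list of links that stay on the domain the user typed in. A rough sketch of that single step, assuming PHP 7 or later with the DOM extension: the `internalLinks()` helper and the sample HTML are hypothetical, only root-relative paths such as `/contact` are resolved, and other relative links are simply skipped. This covers one page only; following the links it returns is left out.

```php
<?php
/**
 * Return the absolute http(s) links in $html whose host is $domain
 * (with or without a leading "www."), sorted and without duplicates.
 */
function internalLinks(string $html, string $domain): array
{
    if (trim($html) === '') {
        return [];
    }
    $domain = strtolower($domain);

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        // Resolve root-relative paths against the domain (http assumed).
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = 'http://' . $domain . $href;
        }
        $scheme = parse_url($href, PHP_URL_SCHEME);
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host) || !in_array($scheme, ['http', 'https'], true)) {
            continue; // mailto:, #fragments, other relative links, etc.
        }
        $host = strtolower($host);
        if ($host === $domain || $host === 'www.' . $domain) {
            $found[$href] = true; // array keys remove duplicates
        }
    }

    $links = array_keys($found);
    sort($links, SORT_STRING);
    return $links;
}

// Sample page standing in for a fetched document.
$html = '<html><body>'
      . '<a href="http://example.com/about">About</a>'
      . '<a href="/contact">Contact</a>'
      . '<a href="http://other.example.org/">Other</a>'
      . '<a href="mailto:me@example.com">Mail</a>'
      . '<a href="http://www.example.com/">Home</a>'
      . '</body></html>';

echo implode("\n", internalLinks($html, 'example.com')), "\n";
```

On a live page, `$html` could come from `file_get_contents('http://' . $domain . '/')`, provided `allow_url_fopen` is enabled on the server.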