[Free Class] URL and Email Scraper

Discussion in 'Programming' started by Barti1987, Sep 22, 2008.

    URL and Email Scraper
    Class features:
    • Scrapes emails and URLs.
    • Can limit scraping to certain domain names.
    • Can limit scraping to certain file extensions.
    • Can exclude certain domain names from scraping.
    • Can limit scraping to a maximum number of pages.
    • Never scrapes the same page twice.
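To make the filtering rules above concrete, here is a minimal sketch of how a crawler could decide whether to visit a URL. This is not the code inside scrapper.class.php: the names hostMatches, shouldVisit and the $rules array are hypothetical. It also assumes two things the post does not state: that a domain rule covers its subdomains, and that a URL with no file extension is skipped once an extension list is set.

```php
<?php
// Hypothetical helpers for illustration; none of these names come from scrapper.class.php.

// True when $host equals $domain or is a subdomain of it (assumed matching rule).
function hostMatches(string $host, string $domain): bool
{
    $suffix = '.' . $domain;
    return $host === $domain || substr($host, -strlen($suffix)) === $suffix;
}

// $visited is keyed by URL, so the duplicate check is a single isset().
function shouldVisit(string $url, array $rules, array $visited): bool
{
    // Stop at the page limit and never revisit a page.
    if (count($visited) >= $rules['maxPages'] || isset($visited[$url])) {
        return false;
    }

    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    // Excluded domains always lose.
    foreach ($rules['exclude'] as $domain) {
        if (hostMatches($host, $domain)) {
            return false;
        }
    }

    // If an allow list exists, the host must match one of its entries.
    if ($rules['only'] !== []) {
        $allowed = false;
        foreach ($rules['only'] as $domain) {
            if (hostMatches($host, $domain)) {
                $allowed = true;
                break;
            }
        }
        if (!$allowed) {
            return false;
        }
    }

    // If an extension list exists, the path must end in one of them.
    if ($rules['extensions'] !== [] && !in_array($ext, $rules['extensions'], true)) {
        return false;
    }

    return true;
}

$rules = [
    'maxPages'   => 50,
    'only'       => ['forums.digitalpoint.com', 'google.com'],
    'exclude'    => ['yahoo.com', 'ask.com'],
    'extensions' => ['htm', 'html', 'php', 'asp', 'jsp'],
];

var_dump(shouldVisit('http://forums.digitalpoint.com/forumdisplay.php?f=24', $rules, []));
```

Keeping the visited pages in an array keyed by URL covers both the page limit and the duplicate check with the same structure.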
    Sample Implementation

    <?php

    /*
    Sample usage of the scraper class defined in scrapper.class.php
    */

    /* Include the scraper class; require_once halts with a clear error if the file is missing */
    require_once 'scrapper.class.php';

    /* Create a new scraper object */
    $do = new scraperStart();

    /* Set maximum pages to scrape */
    $do->setOptions(50);

    /*
    Set file locations and separators
    setFile(emailFileLocation,urlFileLocation,separator)
    */
    $do->setFile('emails.txt','urls.txt',"\n");

    /*
    Only scrape pages with these file extensions
    */
    $do->doOnly('htm');
    $do->doOnly('html');
    $do->doOnly('php');
    $do->doOnly('asp');
    $do->doOnly('jsp');

    /*
    Only scrape pages on these domains
    */
    $do->onlyDomain('forums.digitalpoint.com');
    $do->onlyDomain('google.com');

    /*
    Exclude the following domains
    */
    $do->excludeDomain('yahoo.com');
    $do->excludeDomain('ask.com');

    /*
    Start scraping at this URL
    */
    $do->startScrape('http://forums.digitalpoint.com/forumdisplay.php?f=24');

    /*
    Now store the collected information in the files
    */
    $do->storeList();

    ?>
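The sample leaves the actual extraction to the class. For anyone curious what that step might look like, here is a rough sketch using preg_match_all. The function names extractEmails and extractUrls are hypothetical and are not taken from scrapper.class.php. The patterns are deliberately simple: they only pick up absolute http/https links in href attributes and common email shapes, so relative links and unusual addresses are missed.

```php
<?php
// Hypothetical helpers for illustration; the real class may work differently.

// Returns the unique, lower-cased email addresses found in a page.
function extractEmails(string $html): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $m);
    return array_values(array_unique(array_map('strtolower', $m[0])));
}

// Returns the unique absolute http/https URLs found in href attributes.
function extractUrls(string $html): array
{
    preg_match_all('~href\s*=\s*["\'](https?://[^"\'\s>]+)["\']~i', $html, $m);
    return array_values(array_unique($m[1]));
}

$html = '<a href="http://example.com/a.html">A</a> '
      . '<a href="http://example.com/b.php">B</a> '
      . '<a href="http://example.com/a.html">A again</a> '
      . 'Contact joe@example.com or JOE@example.com';

// Join each list with the same kind of separator that setFile() takes in the sample.
echo implode("\n", extractEmails($html)), "\n";
echo implode("\n", extractUrls($html)), "\n";
```

Joining with implode mirrors the separator argument passed to setFile in the sample, so each address or URL would end up on its own line.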


    Download

    Peace,
     