email parsing?

Discussion in 'PHP' started by vetrivel, Jul 27, 2009.

  1. #1
    Hi,
    I would like to parse incoming emails and store them in a database using PHP.
    How do I do it?
    What difficulties should I expect?
    Please guide me.

    Any help is appreciated.
     
    vetrivel, Jul 27, 2009
  2. Chikey.ru

    Chikey.ru Peon

    #2
    Get the content of the page, run a regular expression over it, and insert the results into the database.
     
    Chikey.ru, Jul 27, 2009
  3. vetrivel

    vetrivel Peon

    #3
    Hi,
    I need to parse the emails themselves, not page content.
    I need to copy all my emails into the database,
    i.e. when I get a new email, a script should run and store that email in the database.
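    A common way to get "new email arrives, script runs" is to have the mail server pipe each incoming message to a PHP script on STDIN; how to set up that pipe depends on the mail server and is not shown here. Below is a minimal sketch under that assumption. The helper names parseRawEmail() and storeEmail() and the emails(sender, subject, body) table are hypothetical. It only splits the headers from the body and unfolds continuation lines; it does not decode MIME parts, attachments or encoded-word headers, which is where most of the real work lies.

```php
<?php
// Hypothetical helper: split a raw message into headers and body.
// Assumes the whole message is available as one string, e.g. read from STDIN
// when the mail server pipes each new message to this script.
// MIME parts, attachments and encoded-word headers are NOT decoded.
function parseRawEmail(string $raw): array
{
    // Normalise CRLF line endings to "\n".
    $raw = str_replace("\r\n", "\n", $raw);

    // Headers end at the first empty line; everything after it is the body.
    $pos = strpos($raw, "\n\n");
    $headerBlock = ($pos === false) ? $raw : substr($raw, 0, $pos);
    $body = ($pos === false) ? '' : substr($raw, $pos + 2);

    // Unfold continuation lines (header lines starting with a space or tab).
    $headerBlock = preg_replace('/\n[ \t]+/', ' ', $headerBlock);

    $headers = [];
    foreach (explode("\n", $headerBlock) as $line) {
        // "Name: value"; lines without a valid field name (e.g. an mbox "From " line) are skipped.
        if (preg_match('/^([!-9;-~]+):[ \t]*(.*)$/', $line, $m)) {
            // Repeated headers such as Received keep only their last value here.
            $headers[strtolower($m[1])] = rtrim($m[2]);
        }
    }

    return ['headers' => $headers, 'body' => $body];
}

// Hypothetical table: emails(sender, subject, body).
// A prepared statement keeps quotes inside the message from breaking the SQL.
function storeEmail(PDO $pdo, array $msg): void
{
    $stmt = $pdo->prepare('INSERT INTO emails (sender, subject, body) VALUES (?, ?, ?)');
    $stmt->execute([
        $msg['headers']['from'] ?? '',
        $msg['headers']['subject'] ?? '',
        $msg['body'],
    ]);
}

// Example entry point when the message is piped in (connection details are placeholders):
// $msg = parseRawEmail(stream_get_contents(STDIN));
// storeEmail(new PDO('mysql:host=localhost;dbname=mail', 'user', 'pass'), $msg);
```

    Note that the body comes back with "\n" line endings, and header names are lower-cased so lookups such as $msg['headers']['subject'] do not depend on the sender's capitalisation.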




     
    vetrivel, Jul 27, 2009
  4. Chikey.ru

    Chikey.ru Peon

    #4
    $20 and I'll help you
     
    Chikey.ru, Jul 27, 2009
  5. vetrivel

    vetrivel Peon

    #5
    Is this really that difficult to do?
    because i know php ,js,mysql and
    I just need to know how to get started, not the entire task.


     
    vetrivel, Jul 27, 2009
  6. Chikey.ru

    Chikey.ru Peon

    #6
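    // assumes $pdo is an existing PDO connection; mysql_query() was removed in PHP 7
    if (preg_match('#^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+$#', $email)) {
        $stmt = $pdo->prepare('INSERT INTO emails (email) VALUES (?)');
        $stmt->execute([$email]);
    }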


    because i know php ,js,mysql and
    --
    really?))
     
    Chikey.ru, Jul 27, 2009
  7. dimitar christoff

    dimitar christoff Active Member

    #7
    dimitar christoff, Jul 27, 2009
  8. Chikey.ru

    Chikey.ru Peon

    #8
    dimitar christoff, I hope you understood that I was joking about the $20 :)
    this is a VERY simple task, and the topic starter didn't need to create this thread =)

    ^[^@]+@[-a-z0-9.]+$
    In the first part (before the @), not every character other than @ can be used. For example, привет@chikey.ru will be accepted by the script, but such an address cannot really exist.
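    The difference can be checked directly. A minimal sketch: $loose is the pattern from this post, and $strict is a hypothetical variant that limits the local part to the ASCII characters RFC 5322 allows in an unquoted local part (it does not check dot placement or quoted local parts). Under those classic rules the Cyrillic address fails; RFC 6531 (SMTPUTF8) later added support for non-ASCII local parts, so whether such an address works in practice depends on the mail servers involved.

```php
<?php
// The loose pattern from this post: anything except '@' before the '@'.
$loose = '/^[^@]+@[-a-z0-9.]+$/';

// Hypothetical stricter variant: local part limited to RFC 5322 "atext"
// characters plus '.', without checking where the dots are placed.
$strict = '/^[A-Za-z0-9.!#$%&\'*+\/=?^_`{|}~-]+@[-a-z0-9.]+$/i';

foreach (['привет@chikey.ru', 'foo@chikey.ru'] as $address) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%s loose=%d strict=%d\n", $address, preg_match($loose, $address), preg_match($strict, $address));
}
```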
     
    Chikey.ru, Jul 27, 2009
  9. dimitar christoff

    dimitar christoff Active Member

    #9
    yeah i guess, though cleaning email-list data is no laughing matter - a lot of marketing companies pay good money for decent data washing.

    here is an extended PHP check i wrote - not only does it run through the list applying the regex, it then checks whether the domain exists and whether it has MX records. i use it to verify mailing lists and subscriptions - it just never occurred to me to check for Cyrillic letters. you are right, it accepts it as a valid email - i don't see why it can't be valid though...

    http://fragged.org/dev/emailPHPtest.php (source supplied)

    "Abc\@def"@google.com is valid
    "Fred Bloggs"@google.com is valid
    "Joe\Blow"@google.com is valid
    "Abc@def"@digitalpoint.com is valid
    customer/department=shipping@shopping.com is valid
    $A12345@google.com is valid
    !def!xyz%abc@yahoo.com is valid
    _somename@gmail.com is valid
    asdasd @sdasdasd failed: Sorry, this is not a valid email
    foo@barbarbar.com failed: barbarbar.com has no valid MX
    foo@barbarbarasdasd.com failed: barbarbarasdasd.com has no DNS records at all
    привет@chikey.ru is valid

    function checkEmail($email) {
        // checks if an email is composed correctly, if its domain exists and if it has valid MX records.
    
        // the following regex approximates RFC email address syntax; it is a permissive check, not a complete RFC validator.
        // check:
        //  - http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
        //  - http://fragged.org/dev/email_test.php - using this in javascript
        $validEmailRegex = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
    
        // default is valid.
        $check = true;
    
        if (!preg_match($validEmailRegex, $email)) {
            // if not composed correctly, has spaces etc this will trigger first.
            $check = "Sorry, this is not a valid email";
        }
        else {
            // composed ok, now get the domain:
            $parts = preg_split('/\@/', $email);
    
            if (count($parts) > 0) {
                // get domain
                $dom = array_pop($parts);
    
                // check if it has any dns records, it's a fast check
                if (checkdnsrr($dom, "ANY")) {
    
                    // if so, perform the much slower MX check (especially so if the domain looks valid but has no data)
                    // need at least 1 MX priority host; getmxrr() returns false when it finds none
                    if (!getmxrr($dom, $mxhosts) || count($mxhosts) < 1) {
                        $check = "$dom has no valid MX";
                    }
    
                    // you can now do a socket connection to port 25 of the MX hosts to see if the servers are up
                    // but this is excessive.
                }
                else {
                    $check = "$dom has no DNS records at all";
                }
    
            }
            else {
                // should not really come here if regex above works.
                $check = "Invalid / unknown domain.";
            }
        }
    
        return $check;
    } // end checkEmail
    
    
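    Only the first stage of checkEmail() is pure pattern matching; the "has no valid MX" and "has no DNS records at all" results come from the DNS lookups. A minimal sketch that runs just the regex from checkEmail() on the sample addresses listed in this post, so it works offline (the $results array is only for illustration):

```php
<?php
// Syntax stage only: the regex from checkEmail(), without the DNS/MX lookups.
$validEmailRegex = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';

// The sample addresses from the post.
$samples = [
    '"Abc\@def"@google.com',
    '"Fred Bloggs"@google.com',
    '"Joe\Blow"@google.com',
    '"Abc@def"@digitalpoint.com',
    'customer/department=shipping@shopping.com',
    '$A12345@google.com',
    '!def!xyz%abc@yahoo.com',
    '_somename@gmail.com',
    'asdasd @sdasdasd',
    'foo@barbarbar.com',
    'foo@barbarbarasdasd.com',
    'привет@chikey.ru',
];

$results = [];
foreach ($samples as $address) {
    $results[$address] = (preg_match($validEmailRegex, $address) === 1);
    echo $address, ': ', $results[$address] ? 'syntax ok' : 'rejected by regex', "\n";
}
```

    Only "asdasd @sdasdasd" is rejected at this stage; the two barbarbar domains pass the regex and fail later in the DNS checks. Also note that checkEmail() returns either true or an error string, and a non-empty string is truthy in PHP, so callers should test checkEmail($email) === true rather than if (checkEmail($email)).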
     
    dimitar christoff, Jul 27, 2009
  10. Chikey.ru

    Chikey.ru Peon

    #10
    $parts = preg_split('/\@/', $email);

    A tip to optimize the code: in cases like this, use explode instead:

    $parts = explode('@', $email);

    In general, good code =)
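    To confirm the swap is safe, both calls can be compared on an address that contains more than one '@'. A rough sketch; the iteration count is arbitrary, and timings depend on the machine and PHP version, so no particular numbers are implied.

```php
<?php
// Address with two '@' characters (valid because the first one is quoted).
$email = '"Abc@def"@digitalpoint.com';

// With a literal delimiter both calls return the same pieces.
$viaExplode = explode('@', $email);
$viaPreg = preg_split('/\@/', $email);
var_dump($viaExplode === $viaPreg);

// Rough timing; results vary between runs, machines and PHP versions.
$n = 50000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    explode('@', $email);
}
$explodeTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    preg_split('/\@/', $email);
}
$pregTime = microtime(true) - $start;

printf("explode: %.4fs  preg_split: %.4fs\n", $explodeTime, $pregTime);
```

    explode() avoids compiling and running a regular expression, which is why it is the usual choice when the delimiter is a fixed string.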
     
    Chikey.ru, Jul 27, 2009
  11. dimitar christoff

    dimitar christoff Active Member

    #11
    hrm fair point, i guess the performance gain would be worth it if your array has like 50,000 emails to go through :)

    function checkEmail($email) {
        // checks if an email is composed correctly, if its domain exists and if it has valid MX records.
    
        // the following regex approximates RFC email address syntax; it is a permissive check, not a complete RFC validator.
        // check:
        //  - http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
        //  - http://fragged.org/dev/email_test.php - using this in javascript
        $validEmailRegex = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
    
        // default is valid.
        $check = true;
    
        if (!preg_match($validEmailRegex, $email)) {
            // if not composed correctly, has spaces etc this will trigger first.
            $check = "Sorry, this is not a valid email";
        }
        else {
            // composed ok, now get the domain:
            $parts = explode('@', $email);
    
            if (count($parts) > 0) {
                // get domain
                $dom = array_pop($parts);
    
                // check if it has any dns records, it's a fast check
                if (checkdnsrr($dom, "ANY")) {
    
                    // if so, perform the much slower MX check (especially so if the domain looks valid but has no data)
                    // need at least 1 MX priority host; getmxrr() returns false when it finds none
                    if (!getmxrr($dom, $mxhosts) || count($mxhosts) < 1) {
                        $check = "$dom has no valid MX";
                    }
    
                    // you can now do a socket connection to port 25 of the MX hosts to see if the servers are up
                    // but this is excessive.
                }
                else {
                    $check = "$dom has no DNS records at all";
                }
    
            }
            else {
                // should not really come here if regex above works.
                $check = "Invalid / unknown domain.";
            }
        }
    
        return $check;
    } // end checkEmail
    
    it was annoying enough to find that you can have more than one '@' in an email address and still have it be valid - my original assumption was to do a simple split() and then use $parts[1] as the domain.

    no idea why i went with preg_split in the end :)
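    Using the quoted example from the test list above, a minimal sketch of why $parts[1] picks the wrong piece while array_pop() (as used in checkEmail()) gets the domain. The strrpos() variant is an alternative not used in the thread; it assumes the address contains at least one '@', which the regex check already guarantees.

```php
<?php
// A valid quoted local part may itself contain '@', so splitting on '@'
// can yield more than two pieces.
$email = '"Abc@def"@digitalpoint.com';

$parts = explode('@', $email);      // ['"Abc', 'def"', 'digitalpoint.com']
$naive = $parts[1];                 // 'def"': part of the local part, not the domain
$popped = array_pop($parts);        // 'digitalpoint.com': the last piece is the domain

// Alternative without building an array: everything after the last '@'.
// strrpos() returns false when there is no '@', so only call this on
// addresses that already passed the regex check.
$domain = substr($email, strrpos($email, '@') + 1);

echo $naive, "\n", $popped, "\n", $domain, "\n";
```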
     
    Last edited: Jul 27, 2009
    dimitar christoff, Jul 27, 2009
  12. Chikey.ru

    Chikey.ru Peon

    #12
    Sorry, I don't understand what you mean, even with Google Translate =(
    1. I made a mistake and used explode wrong?
    2. you made a mistake?
    3. PHP has a mistake? :)
     
    Chikey.ru, Jul 27, 2009
  13. dimitar christoff

    dimitar christoff Active Member

    #13
    no, i made the mistake. using explode is faster than using preg_split - just as you suggested. thanks :)
     
    dimitar christoff, Jul 27, 2009
  14. Ralle

    Ralle Active Member

    #14
    Some of this is wrong.
    Fix your regex! :p
    http://email.about.com/cs/standards/a/email_addresses.htm
     
    Ralle, Jul 27, 2009
  15. dimitar christoff

    dimitar christoff Active Member

    Messages:
    882
    Likes Received:
    62
    Best Answers:
    0
    Trophy Points:
    90
    #15
    dimitar christoff, Jul 27, 2009