Hi, I would like to parse email and store it in a database using PHP. How do I do it? What difficulties should I expect? Please guide me. Any help is appreciated.
Hi, I need to parse the email itself, not page content. I need to copy all my emails to the database, i.e. when I get a new email, a script is executed and copies that email into the database.
Is this really that difficult to do? I know PHP, JS and MySQL, and I just need to know how to get started, not the entire task.
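As a possible starting point for the question above, here is a minimal sketch that splits one raw message into headers and body. It assumes PHP 7 or later and that the full message text is already available as a string (for example, read from STDIN when the mail server pipes incoming mail to a script). The function name `parseRawEmail` is made up for this example, and MIME multipart bodies, encoded headers and attachments are not handled.

```php
<?php
// Hypothetical helper: split a raw email message into headers and body.
function parseRawEmail(string $raw): array
{
    // Normalise line endings so the split works on "\r\n" and "\n" input.
    $raw = str_replace("\r\n", "\n", $raw);

    // Headers end at the first blank line; everything after it is the body.
    $pieces = explode("\n\n", $raw, 2);
    $head   = $pieces[0];
    $body   = $pieces[1] ?? '';

    // Unfold continuation lines (a line starting with a space or tab
    // continues the previous header).
    $head = preg_replace('/\n[ \t]+/', ' ', $head);

    $headers = [];
    foreach (explode("\n", $head) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "Name: value" line, skip it
        }
        $name  = strtolower(trim(substr($line, 0, $pos)));
        $value = trim(substr($line, $pos + 1));
        $headers[$name] = $value; // a later duplicate header overwrites an earlier one
    }

    return ['headers' => $headers, 'body' => $body];
}

// Sample message, invented for the example.
$sample = "From: alice@example.com\r\n"
        . "To: bob@example.com\r\n"
        . "Subject: Hello\r\n"
        . " world\r\n"
        . "\r\n"
        . "Message text.\r\n";

$mail = parseRawEmail($sample);
echo $mail['headers']['subject'], "\n"; // Hello world
```

The resulting array could then be written to a table with one column per field you care about (sender, subject, body), ideally through a prepared statement.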
if (preg_match("#^[a-zA-Z0-9-._]+@[a-zA-Z0-9-._]+$#", $email, $a)) {
    mysql_query("insert into emails set email='$email'");
}

"because i know php ,js,mysql" -- really? ))
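Note that the mysql_* functions were removed in PHP 7.0, so the snippet above will not run on current PHP versions. A rough equivalent using PDO and a prepared statement might look like the sketch below. The function name `saveEmail`, the table layout and the in-memory SQLite database are assumptions made only to keep the example self-contained.

```php
<?php
// Hypothetical helper: validate an address and store it with a prepared statement.
// Assumes a PDO connection and a table `emails` with a text column `email`.
function saveEmail(PDO $db, string $email): bool
{
    // Same character whitelist as the pattern above, with the hyphen moved
    // to the end of each class so it cannot be read as a range.
    if (!preg_match('#^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+$#', $email)) {
        return false;
    }
    $stmt = $db->prepare('INSERT INTO emails (email) VALUES (:email)');
    return $stmt->execute([':email' => $email]);
}

// In-memory SQLite is used only for the demonstration; with MySQL you would
// pass a DSN such as 'mysql:host=localhost;dbname=test' (hypothetical values).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE emails (email TEXT)');

var_dump(saveEmail($db, 'foo@example.com')); // bool(true)
var_dump(saveEmail($db, 'not an email'));    // bool(false)
```

The prepared statement keeps the value out of the SQL text, so the insert stays safe even if the validation pattern is later loosened.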
Save your 20 bucks. This was recently discussed on Stack Overflow for like the n-th time: http://stackoverflow.com/questions/1109314/whats-the-best-email-validation-regex/1109532#1109532
dimitar christoff, I hope you understood that I was joking about the $20 :) This is a VERY simple task, and the TC did not need to create this thread =)

^[^@]+@[-a-z0-9.]+$

Not every symbol other than @ can be used in the first part. For example, привет@chikey.ru will be allowed by the script, but such an address cannot really exist.
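To make the point about the first part concrete, here is a small comparison of the two patterns quoted in this thread against the Cyrillic example. The variable names are made up for the illustration, and the first pattern has its hyphen moved to the end of each class (same character set).

```php
<?php
// ASCII-only pattern from earlier in the thread.
$asciiOnly = '#^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+$#';
// Looser pattern quoted above: anything except "@" before the "@".
$loose = '/^[^@]+@[-a-z0-9.]+$/';

$address = 'привет@chikey.ru';

var_dump(preg_match($asciiOnly, $address)); // int(0): Cyrillic bytes are not in the class
var_dump(preg_match($loose, $address));     // int(1): any non-"@" bytes are accepted
```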
Yeah, I guess. Only data cleaning of email lists is no laughing matter, and a lot of marketing companies pay good money for decent data washing. Here is an extended PHP check I wrote: not only does it run through the list applying the regex, it then checks whether the domain exists and whether it has MX records. I use it to verify mail lists and subscriptions. It just never occurred to me to check for Cyrillic letters. You are right, it accepts that one as a valid email, though I don't see why it can't be valid...

http://fragged.org/dev/emailPHPtest.php (source supplied)

"Abc\@def"@google.com is valid
"Fred Bloggs"@google.com is valid
"Joe\Blow"@google.com is valid
"Abc@def"@digitalpoint.com is valid
customer/department=shipping@shopping.com is valid
$A12345@google.com is valid
!def!xyz%abc@yahoo.com is valid
_somename@gmail.com is valid
asdasd @sdasdasd failed: Sorry, this is not a valid email
foo@barbarbar.com failed: barbarbar.com has no valid MX
foo@barbarbarasdasd.com failed: barbarbarasdasd.com has no DNS records at all
привет@chikey.ru is valid

function checkEmail($email) {
    // checks if an email is composed correctly, if its domain exists and if it has valid MX records.
    // the following regex aims for RFC compatibility, accepting the widest form of
    // email address that is still compliant / strict.
    // check:
    // - http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
    // - http://fragged.org/dev/email_test.php - using this in javascript
    $validEmailRegex = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';

    // default is valid.
    $check = true;

    if (!preg_match($validEmailRegex, $email)) {
        // if not composed correctly, has spaces etc. this will trigger first.
        $check = "Sorry, this is not a valid email";
    }
    else {
        // composed ok, now get the domain:
        $parts = preg_split('/\@/', $email);

        if (count($parts) > 0) {
            // get domain (the last part, as the address may contain more than one '@')
            $dom = array_pop($parts);

            // check if it has any dns records, it's a fast check
            if (checkdnsrr($dom, "ANY")) {
                // if so, perform the much slower MX check (especially so if the domain looks valid but has no data)
                getmxrr($dom, $mxhosts);

                // need at least 1 MX priority host
                if (count($mxhosts) < 1)
                    $check = "$dom has no valid MX";

                // you can now do a socket connection to port 25 of the MX hosts to see if the servers are up
                // but this is excessive.
            }
            else {
                $check = "$dom has no DNS records at all";
            }
        }
        else {
            // should not really come here if regex above works.
            $check = "Invalid / unknown domain.";
        }
    }

    return $check;
} // end checkEmail
$parts = preg_split('/\@/', $email);

A tip to optimize the code: in cases like this, use explode instead:

$parts = explode('@', $email);

In general, good code =)
Hrm, fair point. I guess the performance gain would be worth it if your array has like 50,000 emails to go through.

function checkEmail($email) {
    // checks if an email is composed correctly, if its domain exists and if it has valid MX records.
    // the following regex aims for RFC compatibility, accepting the widest form of
    // email address that is still compliant / strict.
    // check:
    // - http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
    // - http://fragged.org/dev/email_test.php - using this in javascript
    $validEmailRegex = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';

    // default is valid.
    $check = true;

    if (!preg_match($validEmailRegex, $email)) {
        // if not composed correctly, has spaces etc. this will trigger first.
        $check = "Sorry, this is not a valid email";
    }
    else {
        // composed ok, now get the domain:
        $parts = explode('@', $email);

        if (count($parts) > 0) {
            // get domain (the last part, as the address may contain more than one '@')
            $dom = array_pop($parts);

            // check if it has any dns records, it's a fast check
            if (checkdnsrr($dom, "ANY")) {
                // if so, perform the much slower MX check (especially so if the domain looks valid but has no data)
                getmxrr($dom, $mxhosts);

                // need at least 1 MX priority host
                if (count($mxhosts) < 1)
                    $check = "$dom has no valid MX";

                // you can now do a socket connection to port 25 of the MX hosts to see if the servers are up
                // but this is excessive.
            }
            else {
                $check = "$dom has no DNS records at all";
            }
        }
        else {
            // should not really come here if regex above works.
            $check = "Invalid / unknown domain.";
        }
    }

    return $check;
} // end checkEmail

It was annoying enough to find that you can have more than one instance of '@' in the email and still have it count as valid. My original assumption was to do a simple split() and then use $parts[1] as the domain. No idea why I went with preg_split in the end.
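Since an address can contain more than one '@', another way to get the domain is to cut at the last '@' directly. This is only a sketch assuming PHP 7.1 or later, and the function name `domainOf` is made up for the example. With an address like "Abc@def"@digitalpoint.com, explode('@', ...) returns three parts and $parts[1] would be def", which is why taking the last part matters.

```php
<?php
// Hypothetical helper: return everything after the LAST "@",
// or null if the string contains no "@" at all.
function domainOf(string $email): ?string
{
    $pos = strrpos($email, '@');
    if ($pos === false) {
        return null;
    }
    return substr($email, $pos + 1);
}

echo domainOf('"Abc@def"@digitalpoint.com'), "\n"; // digitalpoint.com
```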
Sorry, I don't understand what you want to say, even using Google Translate =( 1. I made a mistake and used explode wrong? 2. You made a mistake? 3. PHP has a mistake? )
No, I made the mistake. Using explode is faster than using preg_split, just as you suggested. Thanks.
I notice there's a difference between what you _should_ use (as advised by about.com) and RFC compliance (http://en.wikipedia.org/wiki/E-mail_address#RFC_specification). Anyway, use whatever you think will suffice.