Removing non standard characters from form input

Discussion in 'PHP' started by CreedFeed, Apr 9, 2008.

  1. #1
    I have a form in which a user is entering input. Sometimes it will contain weird non-ASCII characters. Most likely the user is copying text from a Word document. For example:

    This is some text  
    Code (markup):
    How would I remove those extra characters from the string before taking the data and sticking it in a database (as an example)? Basically, after submitting the form I want to remove those characters and then continue processing the string.
     
    CreedFeed, Apr 9, 2008 IP
  2. LittleJonSupportSite

    LittleJonSupportSite Peon

    #2
    You can validate on the ASCII code of each character.

    If the code is not between X and Y, then reject the character or str_replace it.

    For example:

    
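    function isUpperCase($char){
        // ord() gives the ASCII code of a character; chr() does the reverse
        $code = ord($char);
        // 65 to 90 are the codes for 'A' through 'Z'
        return $code >= 65 && $code <= 90;
    }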
    
    PHP:
    You could do the same kind of range check to catch any character whose ASCII code is above the standard range.

    Etc... get it?
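
    The range check can be sketched as a small filter over the whole string. This is only an illustration: the helper name keepPrintableAscii and the choice of 32 to 126 (the printable ASCII range) as the X and Y bounds are assumptions, not something stated in this thread.

```php
<?php
// Hypothetical helper: keep only bytes whose ASCII code is printable (32 to 126).
function keepPrintableAscii(string $input): string
{
    $out = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $code = ord($input[$i]); // byte value, 0 to 255
        if ($code >= 32 && $code <= 126) {
            $out .= $input[$i];  // inside the allowed range, keep it
        }
        // anything else (control bytes, multibyte sequences) is dropped
    }
    return $out;
}

// The escaped bytes stand in for a non-breaking space and a curly quote pasted from Word.
echo keepPrintableAscii("This is some text \xC2\xA0\xE2\x80\x9C"), "\n";
```

    Note that this also drops tabs and newlines, so the bounds would need adjusting for multi-line input.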
     
    LittleJonSupportSite, Apr 9, 2008 IP
  3. m0nkeymafia

    m0nkeymafia Well-Known Member

    #3
    I prefer to use regular expressions like so:

    // ereg_replace() was removed in PHP 7; the range A-z would also match [ \ ] ^ _ `
    $text = preg_replace('/[^A-Za-z ]/', '', $text);

    This will remove every character from $text that is not a letter or a space.
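
    A minimal sketch of the regex approach using preg_replace(), since ereg_replace() is not available in PHP 7 and later. The function names stripToLetters and stripNonPrintableAscii are made up for illustration, and the second pattern assumes the goal is to keep the whole printable ASCII range rather than only letters.

```php
<?php
// Hypothetical helper: keep only ASCII letters and spaces.
function stripToLetters(string $text): string
{
    return preg_replace('/[^A-Za-z ]/', '', $text);
}

// Hypothetical helper: keep every printable ASCII character (space through tilde).
function stripNonPrintableAscii(string $text): string
{
    return preg_replace('/[^\x20-\x7E]/', '', $text);
}

// The escaped bytes are a UTF-8 "e acute" and a euro sign.
$sample = "Caf\xC3\xA9 costs 5\xE2\x82\xAC!";

echo stripToLetters($sample), "\n";         // digits and punctuation are removed too
echo stripNonPrintableAscii($sample), "\n"; // digits and punctuation survive
```

    The second form is probably closer to the original question, because it leaves numbers and ordinary punctuation alone.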
     
    m0nkeymafia, Apr 9, 2008 IP
  4. baris22

    baris22 Active Member

    #4
    Will this remove all the unwanted characters? I am using

    
    
    $to_replace = array("#9604;","#9632;","#9786;","#9787;", "#9829;", "#9830;", "#9827;", "#9824;","#8902;", "#9733;", "#9734;", "#9789;", "#9841;", "#9840;", "•", "#9688;", "#9675;", "#9673;", "#9678;", "#9689;", "#9794;", "#9792;", "#9834;", "#9835;", "#9788;", "#9758;", "#8597;", "#8252;", "¶", "§", "#9644;", "#8593;", "#8595;", "#8594;", "#8592;", "#8596;", "#8735;", "#9650;", "#9660;", "#9658;", "#9668;", "#9661;", "#9651;", "#9655;", "#9665;", "#9672;", "#9671;", "#9670;", "#9648;", "#9649;", "#8962;", "#8710;", "#8976;", "¬", "#9617;","¦", "#9618;", "#9619;", "#9635;", "#9636;", "#9637;", "#9638;", "#9639;", "#9640;", "#9641;", "#9474;", "#9508;", "#9569;", "#9570;", "#9558;", "#9557;", "#9571;", "#9553;", "#9559;", "#9565;", "#9564;", "#9563;", "#9488;", "#9492;", "#9524;", "#9516;", "#9500;", "#9472;", "#9532;", "#9566;", "#9567;", "#9562;", "#9556;", "#9577;", "#9574;", "#9568;", "#9552;", "#9580;", "#9575;", "#9576;", "#9572;", "#9573;", "#9561;", "#9560;", "#9554;", "#9555;", "#9579;", "#9578;", "#9496;", "#9484;", "#9608;", "#9612;", "#9616;", "#9600;", "#945;", "#915;", "#960;", "#931;", "#963;", "#964;", "#934;", "#920;", "#937;", "#948;", "#8734;", "#966;", "#949;", "#8745;", "#9696;", "#9697;", "#9581;", "#9582;", "#9583;", "#9584;", "#8801;", "#8805;", "#8804;", "#8992;", "#8993;", "#8776;", "#8729;", "#8730;", "#8319;", "#1758;", "#8362;", "¢", "£", "¥", "€", "#8355;", "#8359;", "ª", "º", "¿", "¬", "½", "#8531;", "#8532;", "¼", "¾", "#8539;", "#8540;", "#8541;", "#8542;", "¡", "µ", "±", "#8800;", "°", "#9450;", "²", "³", "¹",  "¸", "™", "¤", "‰", "#8453;", "#8470;", "†", "‡", "#9477;", "#9478;", "#9480;", "#9482;", "#9585;", "#9586;", "#9587;", "„", "…", "Æ", "Á", "Â", "À", "Å", "Ã", "Ä", "#256;", "#258;", "#260;", "æ", "á", "â", "à", "å", "ã", "ä", "#257;", "#259;", "#261;", "ß", "Ç", "#262;", "#264;", "#266;", "#268;", "ç", "#263;", "#265;", "#267;", "#269;", "#270;", "Ð", "#271;", "ð", "É", "Ê", "È", "Ë", "#274;", "#276;", "#278;", "#280;", 
"#282;", "é", "ê", "è", "ë", "#275;", "#277;", "#279;", "#281;", "#283;", "ƒ", "#284;", "#286;", "#288;", "#290;", "#285;", "#287;", "#289;", "#291;", "#292;", "#294;", "#293;", "#295;", "Í", "Î", "Ì", "Ï", "#296;", "#298;", "#300;", "#302;", "#304;", "í", "î", "ì", "ï", "#297;", "#299;", "#301;", "#303;", "#305;", "#306;", "#307;", "#308;", "#309;", "#310;", "#311;", "#312;", "#313;", "#315;", "#317;", "#319;", "#321;", "#314;", "#316;", "#318;", "#320;", "#322;", "Ñ", "#323;", "#325;", "#327;", "#330;", "ñ", "#324;", "#326;", "#328;", "#329;", "#331;", "Ó", "Ô", "Ò", "Ø", "Õ", "Ö", "#332;", "#334;", "#336;", "#416;", "ó", "ô", "ò", "ø", "õ", "ö", "#333;", "#335;", "#337;", "#417;", "Þ", "þ", "#340;", "#342;", "#344;", "#341;", "#343;", "#345;", "#346;", "#348;", "#350;", "Š", "#347;", "#349;", "#351;", "š", "#354;", "#356;", "#358;", "#355;", "#357;", "#359;", "Ú", "Û", "Ù", "Ü", "#360;", "#362;", "#364;", "#366;", "#368;", "#370;", "#431;", "ú", "û", "ù", "ü", "#361;", "#363;", "#365;", "#367;", "#369;", "#371;", "#432;", "#372;", "#373;", "Ý", "Ÿ", "ý", "ÿ", "Ž", "#377;", "#379;", "ž", "#378;", "#380;", "Œ", "œ", "#4347;", "#8467;", "#1108;", "#1103;", "#1080;", "#969;", "#1090;", "#1085;", "#965;", "#8494;", "#4304;", "#4305;", "#4306;", "#4307;", "#4308;", "#4309;", "#4310;", "#4311;", "#4312;", "#4313;", "#4314;", "#4315;", "#4316;", "#4317;", "#4318;", "#4319;", "#4320;", "#4321;", "#4322;", "#4323;", "#4324;", "#4325;", "#4326;", "#4327;", "#4328;", "#4329;", "#4330;", "#4331;", "#4332;", "#4333;", "#4334;", "#4335;", "#4336;", "#4337;", "#4338;", "#4339;", "#4340;", "#4341;", "#4342;", "#1329;", "#1330;", "#1331;", "#1332;", "#1333;", "#1334;", "#1335;", "#1336;", "#1337;", "#1338;", "#1339;", "#1340;", "#1341;", "#1342;", "#1343;", "#1344;", "#1345;", "#1346;", "#1347;", "#1348;", "#1349;", "#1350;", "#1351;", "#1352;", "#1353;", "#1354;", "#1355;", "#1356;", "#1357;", "#1358;", "#1359;", "#1360;", "#1361;", "#1362;", "#1363;", "#1364;", "#1365;", 
"#1366;", "#1377;", "#1378;", "#1379;", "#1380;", "#1381;", "#1382;", "#1383;", "#1384;", "#1385;", "#1386;", "#1387;", "#1388;", "#1389;", "#1390;", "#1391;", "#1392;", "#1393;", "#1394;", "#1395;", "#1396;", "#1397;", "#1398;", "#1399;", "#1400;", "#1401;", "#1402;", "#1403;", "#1404;", "#1405;", "#1406;", "#1407;", "#1408;", "#1409;", "#1410;", "#1411;", "#1412;", "#1413;", "#1414;", "#1415;", "#1488;", "#1489;", "#1490;", "#1491;", "#1492;", "#1493;", "#1494;", "#1495;", "#1496;", "#1497;", "#1498;", "#1499;", "#1500;", "#1501;", "#1502;", "#1503;", "#1504;", "#1505;", "#1506;", "#1507;", "#1508;", "#1509;", "#1510;", "#1511;", "#1512;", "#1513;", "#1514;", "#1570;", "#1571;", "#1572;", "#1573;", "#1574;", "#1575;", "#1576;", "#1577;", "#1578;", "#1579;", "#1580;", "#1581;", "#1582;", "#1583;", "#1584;", "#1585;", "#1586;", "#1587;", "#1588;", "#1589;", "#1590;", "#1591;", "#1592;", "#1593;", "#1594;", "#1601;", "#1602;", "#1603;", "#1604;", "#1605;", "#1606;", "#1607;", "#1608;", "#1609;", "#1610;","&amp;#9604;","#9786;","#9787;", "&amp;hearts;", "&amp;diams;", "&amp;clubs;", "&amp;spades;", "&amp;#8902;", "&amp;#9733;", "&amp;#9734;", "&amp;#9789;", "#9829;", "&amp;#9841;", "&amp;#9840;", "&amp;#0149;", "&amp;#9688;", "&amp;#9675;", "&amp;#9673;", "&amp;#9678;", "&amp;#9689;", "&amp;#9794;", "&amp;#9792;", "&amp;#9834;", "&amp;#9835;", "&amp;#9788;", "&amp;#9758;", "&amp;#8597;", "&amp;#8252;", "&amp;para;", "&amp;sect;", "&amp;#9644;", "&amp;#8593;", "&amp;#8595;", "&amp;#8594;", "&amp;#8592;", "&amp;#8596;", "&amp;#8735;", "&amp;#9650;", "&amp;#9660;", "&amp;#9658;", "&amp;#9668;", "&amp;#9661;", "&amp;#9651;", "&amp;#9655;", "&amp;#9665;", "&amp;#9672;", "&amp;#9671;", "&amp;#9670;", "&amp;#9648;", "&amp;#9649;", "&amp;#8962;", "&amp;#8710;", "&amp;#8976;", "&amp;#9617;", "&amp;#9618;", "&amp;#9619;", "&amp;#9635;", "&amp;#9636;", "&amp;#9637;", "&amp;#9638;", "&amp;#9639;", "&amp;#9640;", "&amp;#9641;", "&amp;#9474;", "&amp;#9508;", "&amp;#9569;", 
"&amp;#9570;", "&amp;#9558;", "&amp;#9557;", "&amp;#9571;", "&amp;#9553;", "&amp;#9559;", "&amp;#9565;", "&amp;#9564;", "&amp;#9563;", "&amp;#9488;", "&amp;#9492;", "&amp;#9524;", "&amp;#9516;", "&amp;#9500;", "&amp;#9472;", "&amp;#9532;", "&amp;#9566;", "&amp;#9567;", "&amp;#9562;", "&amp;#9556;", "&amp;#9577;", "&amp;#9574;", "&amp;#9568;", "&amp;#9552;", "&amp;#9580;", "&amp;#9575;", "&amp;#9576;", "&amp;#9572;", "&amp;#9573;", "&amp;#9561;", "&amp;#9560;", "&amp;#9554;", "&amp;#9555;", "&amp;#9579;", "&amp;#9578;", "&amp;#9496;", "&amp;#9484;", "&amp;#9608;", "&amp;#9612;", "&amp;#9616;", "&amp;#9600;", "&amp;#945;", "&amp;#915;", "&amp;#960;", "&amp;#931;", "&amp;#963;", "&amp;#964;", "&amp;#934;", "&amp;#920;", "&amp;#937;", "&amp;#948;", "&amp;#8734;", "&amp;#966;", "&amp;#949;", "&amp;#8745;", "&amp;#9696;", "&amp;#9697;", "&amp;#9581;", "&amp;#9582;", "&amp;#9583;", "&amp;#9584;", "&amp;#8801;", "&amp;#8805;", "&amp;#8804;", "&amp;#8992;", "&amp;#8993;", "&amp;#8776;", "&amp;#8729;", "&amp;#8730;", "&amp;#8319;", "&amp;#1758;", "&amp;#8362;", "&amp;cent;", "&amp;pound;", "&amp;yen;", "&amp;euro;", "&amp;#8355;", "&amp;#8359;", "&amp;ordf;", "&amp;ordm;", "&amp;iquest;", "&amp;not;", "&amp;frac12;", "&amp;#8531;", "&amp;#8532;", "&amp;frac14;", "&amp;frac34;", "&amp;#8539;", "&amp;#8540;", "&amp;#8541;", "&amp;#8542;", "&amp;iexcl;", "&amp;laquo;", "&amp;raquo;", "&amp;micro;", "&amp;plusmn;", "&amp;divide;", "&amp;times;", "&amp;ne;", "&amp;deg;", "&amp;middot;", "&amp;#9450;", "&amp;sup2;", "&amp;sup3;", "&amp;sup1;", "&amp;acute;", "&amp;cedil;", "&amp;reg;", "&amp;copy;", "&amp;trade;", "&amp;curren;", "&amp;permil;", "&amp;#8453;", "&amp;#8470;", "&amp;dagger;", "&amp;Dagger;", "&amp;uml;", "&amp;lt;", "&amp;gt;", "&amp;amp;", "&amp;brvbar;", "&amp;#9477;", "&amp;#9478;", "&amp;#9480;", "&amp;#9482;", "&amp;#9585;", "&amp;#9586;", "&amp;#9587;", "&amp;quot;", "&amp;#130;", "&amp;#132;", "&amp;#133;", "&amp;macr;", "&amp;#150;", "&amp;#151;", 
"&amp;AElig;", "&amp;Aacute;", "&amp;Acirc;", "&amp;Agrave;", "&amp;Aring;", "&amp;Atilde;", "&amp;Auml;", "&amp;#256;", "&amp;#258;", "&amp;#260;", "&amp;aelig;", "&amp;aacute;", "&amp;acirc;", "&amp;agrave;", "&amp;aring;", "&amp;atilde;", "&amp;auml;", "&amp;#257;", "&amp;#259;", "&amp;#261;", "&amp;szlig;", "&amp;Ccedil;", "&amp;#262;", "&amp;#264;", "&amp;#266;", "&amp;#268;", "&amp;ccedil;", "&amp;#263;", "&amp;#265;", "&amp;#267;", "&amp;#269;", "&amp;#270;", "&amp;ETH;", "&amp;#271;", "&amp;eth;", "&amp;Eacute;", "&amp;Ecirc;", "&amp;Egrave;", "&amp;Euml;", "&amp;#274;", "&amp;#276;", "&amp;#278;", "&amp;#280;", "&amp;#282;", "&amp;eacute;", "&amp;ecirc;", "&amp;egrave;", "&amp;euml;", "&amp;#275;", "&amp;#277;", "&amp;#279;", "&amp;#281;", "&amp;#283;", "&amp;#131;", "&amp;#284;", "&amp;#286;", "&amp;#288;", "&amp;#290;", "&amp;#285;", "&amp;#287;", "&amp;#289;", "&amp;#291;", "&amp;#292;", "&amp;#294;", "&amp;#293;", "&amp;#295;", "&amp;Iacute;", "&amp;Icirc;", "&amp;Igrave;", "&amp;Iuml;", "&amp;#296;", "&amp;#298;", "&amp;#300;", "&amp;#302;", "&amp;#304;", "&amp;iacute;", "&amp;icirc;", "&amp;igrave;", "&amp;iuml;", "&amp;#297;", "&amp;#299;", "&amp;#301;", "&amp;#303;", "&amp;#305;", "&amp;#306;", "&amp;#307;", "&amp;#308;", "&amp;#309;", "&amp;#310;", "&amp;#311;", "&amp;#312;", "&amp;#313;", "&amp;#315;", "&amp;#317;", "&amp;#319;", "&amp;#321;", "&amp;#314;", "&amp;#316;", "&amp;#318;", "&amp;#320;", "&amp;#322;", "&amp;Ntilde;", "&amp;#323;", "&amp;#325;", "&amp;#327;", "&amp;#330;", "&amp;ntilde;", "&amp;#324;", "&amp;#326;", "&amp;#328;", "&amp;#329;", "&amp;#331;", "&amp;Oacute;", "&amp;Ocirc;", "&amp;Ograve;", "&amp;Oslash;", "&amp;Otilde;", "&amp;Ouml;", "&amp;#332;", "&amp;#334;", "&amp;#336;", "&amp;#416;", "&amp;oacute;", "&amp;ocirc;", "&amp;ograve;", "&amp;oslash;", "&amp;otilde;", "&amp;ouml;", "&amp;#333;", "&amp;#335;", "&amp;#337;", "&amp;#417;", "&amp;THORN;", "&amp;thorn;", "&amp;#340;", "&amp;#342;", "&amp;#344;", "&amp;#341;", 
"&amp;#343;", "&amp;#345;", "&amp;#346;", "&amp;#348;", "&amp;#350;", "&amp;#352;", "&amp;#347;", "&amp;#349;", "&amp;#351;", "&amp;#353;", "&amp;#354;", "&amp;#356;", "&amp;#358;", "&amp;#355;", "&amp;#357;", "&amp;#359;", "&amp;Uacute;", "&amp;Ucirc;", "&amp;Ugrave;", "&amp;Uuml;", "&amp;#360;", "&amp;#362;", "&amp;#364;", "&amp;#366;", "&amp;#368;", "&amp;#370;", "&amp;#431;", "&amp;uacute;", "&amp;ucirc;", "&amp;ugrave;", "&amp;uuml;", "&amp;#361;", "&amp;#363;", "&amp;#365;", "&amp;#367;", "&amp;#369;", "&amp;#371;", "&amp;#432;", "&amp;#372;", "&amp;#373;", "&amp;Yacute;", "&amp;Yuml;", "&amp;yacute;", "&amp;yuml;", "&amp;#142;", "&amp;#377;", "&amp;#379;", "&amp;#158;", "&amp;#378;", "&amp;#380;", "&amp;#140;", "&amp;#156;", "&amp;#4347;", "&amp;#8467;", "&amp;#1108;", "&amp;#1103;", "&amp;#1080;", "&amp;#969;", "&amp;#1090;", "&amp;#1085;", "&amp;#965;", "&amp;#8494;", "&amp;#4304;", "&amp;#4305;", "&amp;#4306;", "&amp;#4307;", "&amp;#4308;", "&amp;#4309;", "&amp;#4310;", "&amp;#4311;", "&amp;#4312;", "&amp;#4313;", "&amp;#4314;", "&amp;#4315;", "&amp;#4316;", "&amp;#4317;", "&amp;#4318;", "&amp;#4319;", "&amp;#4320;", "&amp;#4321;", "&amp;#4322;", "&amp;#4323;", "&amp;#4324;", "&amp;#4325;", "&amp;#4326;", "&amp;#4327;", "&amp;#4328;", "&amp;#4329;", "&amp;#4330;", "&amp;#4331;", "&amp;#4332;", "&amp;#4333;", "&amp;#4334;", "&amp;#4335;", "&amp;#4336;", "&amp;#4337;", "&amp;#4338;", "&amp;#4339;", "&amp;#4340;", "&amp;#4341;", "&amp;#4342;", "&amp;#1329;", "&amp;#1330;", "&amp;#1331;", "&amp;#1332;", "&amp;#1333;", "&amp;#1334;", "&amp;#1335;", "&amp;#1336;", "&amp;#1337;", "&amp;#1338;", "&amp;#1339;", "&amp;#1340;", "&amp;#1341;", "&amp;#1342;", "&amp;#1343;", "&amp;#1344;", "&amp;#1345;", "&amp;#1346;", "&amp;#1347;", "&amp;#1348;", "&amp;#1349;", "&amp;#1350;", "&amp;#1351;", "&amp;#1352;", "&amp;#1353;", "&amp;#1354;", "&amp;#1355;", "&amp;#1356;", "&amp;#1357;", "&amp;#1358;", "&amp;#1359;", "&amp;#1360;", "&amp;#1361;", "&amp;#1362;", "&amp;#1363;", 
"&amp;#1364;", "&amp;#1365;", "&amp;#1366;", "&amp;#1377;", "&amp;#1378;", "&amp;#1379;", "&amp;#1380;", "&amp;#1381;", "&amp;#1382;", "&amp;#1383;", "&amp;#1384;", "&amp;#1385;", "&amp;#1386;", "&amp;#1387;", "&amp;#1388;", "&amp;#1389;", "&amp;#1390;", "&amp;#1391;", "&amp;#1392;", "&amp;#1393;", "&amp;#1394;", "&amp;#1395;", "&amp;#1396;", "&amp;#1397;", "&amp;#1398;", "&amp;#1399;", "&amp;#1400;", "&amp;#1401;", "&amp;#1402;", "&amp;#1403;", "&amp;#1404;", "&amp;#1405;", "&amp;#1406;", "&amp;#1407;", "&amp;#1408;", "&amp;#1409;", "&amp;#1410;", "&amp;#1411;", "&amp;#1412;", "&amp;#1413;", "&amp;#1414;", "&amp;#1415;", "&amp;#1488;", "&amp;#1489;", "&amp;#1490;", "&amp;#1491;", "&amp;#1492;", "&amp;#1493;", "&amp;#1494;", "&amp;#1495;", "&amp;#1496;", "&amp;#1497;", "&amp;#1498;", "&amp;#1499;", "&amp;#1500;", "&amp;#1501;", "&amp;#1502;", "&amp;#1503;", "&amp;#1504;", "&amp;#1505;", "&amp;#1506;", "&amp;#1507;", "&amp;#1508;", "&amp;#1509;", "&amp;#1510;", "&amp;#1511;", "&amp;#1512;", "&amp;#1513;", "&amp;#1514;","ˆ","¨");
    $title = str_replace($to_replace, " ", $title);
    
    
    PHP:
    I know mine is not the proper way, but it works.
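
    If the stray characters mostly come from Word, a shorter alternative to a long list may be to translate the usual typographic characters to plain equivalents. This is only a sketch: it assumes the form data arrives as UTF-8, and replaceWordPunctuation and the contents of its map are illustrative, not a complete list.

```php
<?php
// Hypothetical helper: map common Word punctuation to plain ASCII (assumes UTF-8 input).
function replaceWordPunctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ];
    // strtr() with an array replaces each key with its value in a single pass
    return strtr($text, $map);
}

$title = replaceWordPunctuation("\u{201C}Hello\u{201D}\u{2026} it\u{2019}s me");
echo $title, "\n";
```

    Anything left over after this step could still be stripped with one of the range-based approaches from the earlier replies.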
     
    baris22, Apr 9, 2008 IP
  5. m0nkeymafia

    m0nkeymafia Well-Known Member

    #5
    It will remove ANYTHING that isn't a letter A-Z, in either upper or lower case, or a space.

    So:

    $text = "934938484 hello how are you?";
    will turn into
    $text = " hello how are you";

    If you want to keep numbers too, simply add 0-9 inside the character class, e.g. [^0-9A-Za-z ]
     
    m0nkeymafia, Apr 9, 2008 IP
  6. baris22

    baris22 Active Member

    #6
    This code does not keep punctuation marks. What can I do to keep punctuation marks such as . , - = ?

    Thanks



     
    baris22, Apr 9, 2008 IP
  7. m0nkeymafia

    m0nkeymafia Well-Known Member

    #7
    You never said you wanted that. I haven't tested this, but you can probably put most of the punctuation in like so:

    
    preg_replace('/[^A-Za-z0-9\.=() ]/', '', $text);
    
    Code (markup):
    Notice I "escaped" the period with a backslash like so: \.
    Inside a character class the period, =, ( and ) should not strictly need escaping, but I can't check at the minute so you'll have to test it yourself.
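
    A sketch of a pattern that keeps the marks asked about above ( . , - = ? ) plus parentheses, under the assumption that preg_replace() is used. The function name keepTextAndPunctuation is hypothetical. The one character that needs care inside a character class is the hyphen, which is placed last here so it is read literally instead of as a range.

```php
<?php
// Hypothetical helper: keep letters, digits, spaces and a small set of punctuation.
function keepTextAndPunctuation(string $text): string
{
    // Hyphen goes last in the class so it is not treated as a range operator.
    return preg_replace('/[^A-Za-z0-9 .,=?()-]/', '', $text);
}

// The escaped bytes are a UTF-8 bullet character, as pasted from Word.
echo keepTextAndPunctuation("Hello, world - is 2+2=4? (yes) \xE2\x80\xA2"), "\n";
```

    Any character not listed in the class, such as + in the sample above, is still removed, so the list needs extending for each extra mark you want to keep.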
     
    m0nkeymafia, Apr 10, 2008 IP