
Removing non-standard characters from form input

Discussion in 'PHP' started by CreedFeed, Apr 9, 2008.

  1. #1
    I have a form where users enter text. Sometimes the input contains odd non-ASCII characters, most likely because the user copied the text from a Word document. For example:

    This is some text  
    How would I remove those extra characters from the string before, for example, storing the data in a database? Basically, after the form is submitted I want to strip those characters and then continue processing the string.
     
    CreedFeed, Apr 9, 2008
  2. LittleJonSupportSite

    #2
    You can validate each character by its ASCII code.

    If a character's code is not between X and Y, reject the input or remove the character with str_replace().

    For example:

    
    function isUpperCase($char){
        // ord() returns the byte value of the first character of $char;
        // 65-90 is the ASCII range for 'A' to 'Z'
        $code = ord($char);
        return $code >= 65 && $code <= 90;
    }
    
    You could check in the same way for whatever range of ASCII codes you consider non-standard (for example, anything outside the printable range 32-126).

    Etc... get it?
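    A minimal sketch of that byte-by-byte check, assuming the input is a plain byte string and that "standard" means printable ASCII (codes 32 to 126). The function name keepPrintableAscii() is hypothetical. Note that this range also drops tabs and newlines, so text from a textarea would need codes 9, 10 and 13 added.

```php
<?php
// Hypothetical helper: keep only bytes in the printable ASCII range
// 32 (space) to 126 (~). Every byte of a multibyte UTF-8 character is
// >= 128, so characters pasted from Word (smart quotes etc.) are dropped.
function keepPrintableAscii(string $input): string
{
    $output = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $code = ord($input[$i]); // byte value 0-255
        if ($code >= 32 && $code <= 126) {
            $output .= $input[$i];
        }
    }
    return $output;
}

// UTF-8 curly quotes around "quotes" are removed
echo keepPrintableAscii("Smart \xE2\x80\x9Cquotes\xE2\x80\x9D here"), "\n"; // prints: Smart quotes here
```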
     
    LittleJonSupportSite, Apr 9, 2008
  3. m0nkeymafia

    #3
    I prefer to use regular expressions like so:

    $text = preg_replace('/[^A-Za-z ]/', '', $text);

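    This removes every character from $text that is not a letter or a space. Note that ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7; preg_replace() is the equivalent to use today.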
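    The difference between the ranges A-z and A-Za-z is easy to miss, so here is a small sketch comparing the two classes with preg_replace(). The sample string is made up for illustration.

```php
<?php
$text = "snake_case [tag] ^caret^ 123";

// [^A-z ] is meant to keep letters and spaces, but the range A-z also
// covers ASCII 91-96: [ \ ] ^ _ and the backtick, so those survive too.
$loose = preg_replace('/[^A-z ]/', '', $text);

// [^A-Za-z ] keeps only letters and spaces.
$strict = preg_replace('/[^A-Za-z ]/', '', $text);

echo '[' . $loose . "]\n";  // prints: [snake_case [tag] ^caret^ ]
echo '[' . $strict . "]\n"; // prints: [snakecase tag caret ]
```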
     
    m0nkeymafia, Apr 9, 2008
  4. baris22

    #4
    Will this remove all the unwanted characters? I am using

    
    
    $to_replace = array("#9604;","#9632;","#9786;","#9787;", "#9829;", "#9830;", "#9827;", "#9824;","#8902;", "#9733;", "#9734;", "#9789;", "#9841;", "#9840;", "•", "#9688;", "#9675;", "#9673;", "#9678;", "#9689;", "#9794;", "#9792;", "#9834;", "#9835;", "#9788;", "#9758;", "#8597;", "#8252;", "¶", "§", "#9644;", "#8593;", "#8595;", "#8594;", "#8592;", "#8596;", "#8735;", "#9650;", "#9660;", "#9658;", "#9668;", "#9661;", "#9651;", "#9655;", "#9665;", "#9672;", "#9671;", "#9670;", "#9648;", "#9649;", "#8962;", "#8710;", "#8976;", "¬", "#9617;","¦", "#9618;", "#9619;", "#9635;", "#9636;", "#9637;", "#9638;", "#9639;", "#9640;", "#9641;", "#9474;", "#9508;", "#9569;", "#9570;", "#9558;", "#9557;", "#9571;", "#9553;", "#9559;", "#9565;", "#9564;", "#9563;", "#9488;", "#9492;", "#9524;", "#9516;", "#9500;", "#9472;", "#9532;", "#9566;", "#9567;", "#9562;", "#9556;", "#9577;", "#9574;", "#9568;", "#9552;", "#9580;", "#9575;", "#9576;", "#9572;", "#9573;", "#9561;", "#9560;", "#9554;", "#9555;", "#9579;", "#9578;", "#9496;", "#9484;", "#9608;", "#9612;", "#9616;", "#9600;", "#945;", "#915;", "#960;", "#931;", "#963;", "#964;", "#934;", "#920;", "#937;", "#948;", "#8734;", "#966;", "#949;", "#8745;", "#9696;", "#9697;", "#9581;", "#9582;", "#9583;", "#9584;", "#8801;", "#8805;", "#8804;", "#8992;", "#8993;", "#8776;", "#8729;", "#8730;", "#8319;", "#1758;", "#8362;", "¢", "£", "¥", "€", "#8355;", "#8359;", "ª", "º", "¿", "¬", "½", "#8531;", "#8532;", "¼", "¾", "#8539;", "#8540;", "#8541;", "#8542;", "¡", "µ", "±", "#8800;", "°", "#9450;", "²", "³", "¹",  "¸", "™", "¤", "‰", "#8453;", "#8470;", "†", "‡", "#9477;", "#9478;", "#9480;", "#9482;", "#9585;", "#9586;", "#9587;", "„", "…", "Æ", "Á", "Â", "À", "Å", "Ã", "Ä", "#256;", "#258;", "#260;", "æ", "á", "â", "à", "å", "ã", "ä", "#257;", "#259;", "#261;", "ß", "Ç", "#262;", "#264;", "#266;", "#268;", "ç", "#263;", "#265;", "#267;", "#269;", "#270;", "Ð", "#271;", "ð", "É", "Ê", "È", "Ë", "#274;", "#276;", "#278;", "#280;", 
"#282;", "é", "ê", "è", "ë", "#275;", "#277;", "#279;", "#281;", "#283;", "ƒ", "#284;", "#286;", "#288;", "#290;", "#285;", "#287;", "#289;", "#291;", "#292;", "#294;", "#293;", "#295;", "Í", "Î", "Ì", "Ï", "#296;", "#298;", "#300;", "#302;", "#304;", "í", "î", "ì", "ï", "#297;", "#299;", "#301;", "#303;", "#305;", "#306;", "#307;", "#308;", "#309;", "#310;", "#311;", "#312;", "#313;", "#315;", "#317;", "#319;", "#321;", "#314;", "#316;", "#318;", "#320;", "#322;", "Ñ", "#323;", "#325;", "#327;", "#330;", "ñ", "#324;", "#326;", "#328;", "#329;", "#331;", "Ó", "Ô", "Ò", "Ø", "Õ", "Ö", "#332;", "#334;", "#336;", "#416;", "ó", "ô", "ò", "ø", "õ", "ö", "#333;", "#335;", "#337;", "#417;", "Þ", "þ", "#340;", "#342;", "#344;", "#341;", "#343;", "#345;", "#346;", "#348;", "#350;", "Š", "#347;", "#349;", "#351;", "š", "#354;", "#356;", "#358;", "#355;", "#357;", "#359;", "Ú", "Û", "Ù", "Ü", "#360;", "#362;", "#364;", "#366;", "#368;", "#370;", "#431;", "ú", "û", "ù", "ü", "#361;", "#363;", "#365;", "#367;", "#369;", "#371;", "#432;", "#372;", "#373;", "Ý", "Ÿ", "ý", "ÿ", "Ž", "#377;", "#379;", "ž", "#378;", "#380;", "Œ", "œ", "#4347;", "#8467;", "#1108;", "#1103;", "#1080;", "#969;", "#1090;", "#1085;", "#965;", "#8494;", "#4304;", "#4305;", "#4306;", "#4307;", "#4308;", "#4309;", "#4310;", "#4311;", "#4312;", "#4313;", "#4314;", "#4315;", "#4316;", "#4317;", "#4318;", "#4319;", "#4320;", "#4321;", "#4322;", "#4323;", "#4324;", "#4325;", "#4326;", "#4327;", "#4328;", "#4329;", "#4330;", "#4331;", "#4332;", "#4333;", "#4334;", "#4335;", "#4336;", "#4337;", "#4338;", "#4339;", "#4340;", "#4341;", "#4342;", "#1329;", "#1330;", "#1331;", "#1332;", "#1333;", "#1334;", "#1335;", "#1336;", "#1337;", "#1338;", "#1339;", "#1340;", "#1341;", "#1342;", "#1343;", "#1344;", "#1345;", "#1346;", "#1347;", "#1348;", "#1349;", "#1350;", "#1351;", "#1352;", "#1353;", "#1354;", "#1355;", "#1356;", "#1357;", "#1358;", "#1359;", "#1360;", "#1361;", "#1362;", "#1363;", "#1364;", "#1365;", 
"#1366;", "#1377;", "#1378;", "#1379;", "#1380;", "#1381;", "#1382;", "#1383;", "#1384;", "#1385;", "#1386;", "#1387;", "#1388;", "#1389;", "#1390;", "#1391;", "#1392;", "#1393;", "#1394;", "#1395;", "#1396;", "#1397;", "#1398;", "#1399;", "#1400;", "#1401;", "#1402;", "#1403;", "#1404;", "#1405;", "#1406;", "#1407;", "#1408;", "#1409;", "#1410;", "#1411;", "#1412;", "#1413;", "#1414;", "#1415;", "#1488;", "#1489;", "#1490;", "#1491;", "#1492;", "#1493;", "#1494;", "#1495;", "#1496;", "#1497;", "#1498;", "#1499;", "#1500;", "#1501;", "#1502;", "#1503;", "#1504;", "#1505;", "#1506;", "#1507;", "#1508;", "#1509;", "#1510;", "#1511;", "#1512;", "#1513;", "#1514;", "#1570;", "#1571;", "#1572;", "#1573;", "#1574;", "#1575;", "#1576;", "#1577;", "#1578;", "#1579;", "#1580;", "#1581;", "#1582;", "#1583;", "#1584;", "#1585;", "#1586;", "#1587;", "#1588;", "#1589;", "#1590;", "#1591;", "#1592;", "#1593;", "#1594;", "#1601;", "#1602;", "#1603;", "#1604;", "#1605;", "#1606;", "#1607;", "#1608;", "#1609;", "#1610;","&amp;#9604;","#9786;","#9787;", "&amp;hearts;", "&amp;diams;", "&amp;clubs;", "&amp;spades;", "&amp;#8902;", "&amp;#9733;", "&amp;#9734;", "&amp;#9789;", "#9829;", "&amp;#9841;", "&amp;#9840;", "&amp;#0149;", "&amp;#9688;", "&amp;#9675;", "&amp;#9673;", "&amp;#9678;", "&amp;#9689;", "&amp;#9794;", "&amp;#9792;", "&amp;#9834;", "&amp;#9835;", "&amp;#9788;", "&amp;#9758;", "&amp;#8597;", "&amp;#8252;", "&amp;para;", "&amp;sect;", "&amp;#9644;", "&amp;#8593;", "&amp;#8595;", "&amp;#8594;", "&amp;#8592;", "&amp;#8596;", "&amp;#8735;", "&amp;#9650;", "&amp;#9660;", "&amp;#9658;", "&amp;#9668;", "&amp;#9661;", "&amp;#9651;", "&amp;#9655;", "&amp;#9665;", "&amp;#9672;", "&amp;#9671;", "&amp;#9670;", "&amp;#9648;", "&amp;#9649;", "&amp;#8962;", "&amp;#8710;", "&amp;#8976;", "&amp;#9617;", "&amp;#9618;", "&amp;#9619;", "&amp;#9635;", "&amp;#9636;", "&amp;#9637;", "&amp;#9638;", "&amp;#9639;", "&amp;#9640;", "&amp;#9641;", "&amp;#9474;", "&amp;#9508;", "&amp;#9569;", 
"&amp;#9570;", "&amp;#9558;", "&amp;#9557;", "&amp;#9571;", "&amp;#9553;", "&amp;#9559;", "&amp;#9565;", "&amp;#9564;", "&amp;#9563;", "&amp;#9488;", "&amp;#9492;", "&amp;#9524;", "&amp;#9516;", "&amp;#9500;", "&amp;#9472;", "&amp;#9532;", "&amp;#9566;", "&amp;#9567;", "&amp;#9562;", "&amp;#9556;", "&amp;#9577;", "&amp;#9574;", "&amp;#9568;", "&amp;#9552;", "&amp;#9580;", "&amp;#9575;", "&amp;#9576;", "&amp;#9572;", "&amp;#9573;", "&amp;#9561;", "&amp;#9560;", "&amp;#9554;", "&amp;#9555;", "&amp;#9579;", "&amp;#9578;", "&amp;#9496;", "&amp;#9484;", "&amp;#9608;", "&amp;#9612;", "&amp;#9616;", "&amp;#9600;", "&amp;#945;", "&amp;#915;", "&amp;#960;", "&amp;#931;", "&amp;#963;", "&amp;#964;", "&amp;#934;", "&amp;#920;", "&amp;#937;", "&amp;#948;", "&amp;#8734;", "&amp;#966;", "&amp;#949;", "&amp;#8745;", "&amp;#9696;", "&amp;#9697;", "&amp;#9581;", "&amp;#9582;", "&amp;#9583;", "&amp;#9584;", "&amp;#8801;", "&amp;#8805;", "&amp;#8804;", "&amp;#8992;", "&amp;#8993;", "&amp;#8776;", "&amp;#8729;", "&amp;#8730;", "&amp;#8319;", "&amp;#1758;", "&amp;#8362;", "&amp;cent;", "&amp;pound;", "&amp;yen;", "&amp;euro;", "&amp;#8355;", "&amp;#8359;", "&amp;ordf;", "&amp;ordm;", "&amp;iquest;", "&amp;not;", "&amp;frac12;", "&amp;#8531;", "&amp;#8532;", "&amp;frac14;", "&amp;frac34;", "&amp;#8539;", "&amp;#8540;", "&amp;#8541;", "&amp;#8542;", "&amp;iexcl;", "&amp;laquo;", "&amp;raquo;", "&amp;micro;", "&amp;plusmn;", "&amp;divide;", "&amp;times;", "&amp;ne;", "&amp;deg;", "&amp;middot;", "&amp;#9450;", "&amp;sup2;", "&amp;sup3;", "&amp;sup1;", "&amp;acute;", "&amp;cedil;", "&amp;reg;", "&amp;copy;", "&amp;trade;", "&amp;curren;", "&amp;permil;", "&amp;#8453;", "&amp;#8470;", "&amp;dagger;", "&amp;Dagger;", "&amp;uml;", "&amp;lt;", "&amp;gt;", "&amp;amp;", "&amp;brvbar;", "&amp;#9477;", "&amp;#9478;", "&amp;#9480;", "&amp;#9482;", "&amp;#9585;", "&amp;#9586;", "&amp;#9587;", "&amp;quot;", "&amp;#130;", "&amp;#132;", "&amp;#133;", "&amp;macr;", "&amp;#150;", "&amp;#151;", 
"&amp;AElig;", "&amp;Aacute;", "&amp;Acirc;", "&amp;Agrave;", "&amp;Aring;", "&amp;Atilde;", "&amp;Auml;", "&amp;#256;", "&amp;#258;", "&amp;#260;", "&amp;aelig;", "&amp;aacute;", "&amp;acirc;", "&amp;agrave;", "&amp;aring;", "&amp;atilde;", "&amp;auml;", "&amp;#257;", "&amp;#259;", "&amp;#261;", "&amp;szlig;", "&amp;Ccedil;", "&amp;#262;", "&amp;#264;", "&amp;#266;", "&amp;#268;", "&amp;ccedil;", "&amp;#263;", "&amp;#265;", "&amp;#267;", "&amp;#269;", "&amp;#270;", "&amp;ETH;", "&amp;#271;", "&amp;eth;", "&amp;Eacute;", "&amp;Ecirc;", "&amp;Egrave;", "&amp;Euml;", "&amp;#274;", "&amp;#276;", "&amp;#278;", "&amp;#280;", "&amp;#282;", "&amp;eacute;", "&amp;ecirc;", "&amp;egrave;", "&amp;euml;", "&amp;#275;", "&amp;#277;", "&amp;#279;", "&amp;#281;", "&amp;#283;", "&amp;#131;", "&amp;#284;", "&amp;#286;", "&amp;#288;", "&amp;#290;", "&amp;#285;", "&amp;#287;", "&amp;#289;", "&amp;#291;", "&amp;#292;", "&amp;#294;", "&amp;#293;", "&amp;#295;", "&amp;Iacute;", "&amp;Icirc;", "&amp;Igrave;", "&amp;Iuml;", "&amp;#296;", "&amp;#298;", "&amp;#300;", "&amp;#302;", "&amp;#304;", "&amp;iacute;", "&amp;icirc;", "&amp;igrave;", "&amp;iuml;", "&amp;#297;", "&amp;#299;", "&amp;#301;", "&amp;#303;", "&amp;#305;", "&amp;#306;", "&amp;#307;", "&amp;#308;", "&amp;#309;", "&amp;#310;", "&amp;#311;", "&amp;#312;", "&amp;#313;", "&amp;#315;", "&amp;#317;", "&amp;#319;", "&amp;#321;", "&amp;#314;", "&amp;#316;", "&amp;#318;", "&amp;#320;", "&amp;#322;", "&amp;Ntilde;", "&amp;#323;", "&amp;#325;", "&amp;#327;", "&amp;#330;", "&amp;ntilde;", "&amp;#324;", "&amp;#326;", "&amp;#328;", "&amp;#329;", "&amp;#331;", "&amp;Oacute;", "&amp;Ocirc;", "&amp;Ograve;", "&amp;Oslash;", "&amp;Otilde;", "&amp;Ouml;", "&amp;#332;", "&amp;#334;", "&amp;#336;", "&amp;#416;", "&amp;oacute;", "&amp;ocirc;", "&amp;ograve;", "&amp;oslash;", "&amp;otilde;", "&amp;ouml;", "&amp;#333;", "&amp;#335;", "&amp;#337;", "&amp;#417;", "&amp;THORN;", "&amp;thorn;", "&amp;#340;", "&amp;#342;", "&amp;#344;", "&amp;#341;", 
"&amp;#343;", "&amp;#345;", "&amp;#346;", "&amp;#348;", "&amp;#350;", "&amp;#352;", "&amp;#347;", "&amp;#349;", "&amp;#351;", "&amp;#353;", "&amp;#354;", "&amp;#356;", "&amp;#358;", "&amp;#355;", "&amp;#357;", "&amp;#359;", "&amp;Uacute;", "&amp;Ucirc;", "&amp;Ugrave;", "&amp;Uuml;", "&amp;#360;", "&amp;#362;", "&amp;#364;", "&amp;#366;", "&amp;#368;", "&amp;#370;", "&amp;#431;", "&amp;uacute;", "&amp;ucirc;", "&amp;ugrave;", "&amp;uuml;", "&amp;#361;", "&amp;#363;", "&amp;#365;", "&amp;#367;", "&amp;#369;", "&amp;#371;", "&amp;#432;", "&amp;#372;", "&amp;#373;", "&amp;Yacute;", "&amp;Yuml;", "&amp;yacute;", "&amp;yuml;", "&amp;#142;", "&amp;#377;", "&amp;#379;", "&amp;#158;", "&amp;#378;", "&amp;#380;", "&amp;#140;", "&amp;#156;", "&amp;#4347;", "&amp;#8467;", "&amp;#1108;", "&amp;#1103;", "&amp;#1080;", "&amp;#969;", "&amp;#1090;", "&amp;#1085;", "&amp;#965;", "&amp;#8494;", "&amp;#4304;", "&amp;#4305;", "&amp;#4306;", "&amp;#4307;", "&amp;#4308;", "&amp;#4309;", "&amp;#4310;", "&amp;#4311;", "&amp;#4312;", "&amp;#4313;", "&amp;#4314;", "&amp;#4315;", "&amp;#4316;", "&amp;#4317;", "&amp;#4318;", "&amp;#4319;", "&amp;#4320;", "&amp;#4321;", "&amp;#4322;", "&amp;#4323;", "&amp;#4324;", "&amp;#4325;", "&amp;#4326;", "&amp;#4327;", "&amp;#4328;", "&amp;#4329;", "&amp;#4330;", "&amp;#4331;", "&amp;#4332;", "&amp;#4333;", "&amp;#4334;", "&amp;#4335;", "&amp;#4336;", "&amp;#4337;", "&amp;#4338;", "&amp;#4339;", "&amp;#4340;", "&amp;#4341;", "&amp;#4342;", "&amp;#1329;", "&amp;#1330;", "&amp;#1331;", "&amp;#1332;", "&amp;#1333;", "&amp;#1334;", "&amp;#1335;", "&amp;#1336;", "&amp;#1337;", "&amp;#1338;", "&amp;#1339;", "&amp;#1340;", "&amp;#1341;", "&amp;#1342;", "&amp;#1343;", "&amp;#1344;", "&amp;#1345;", "&amp;#1346;", "&amp;#1347;", "&amp;#1348;", "&amp;#1349;", "&amp;#1350;", "&amp;#1351;", "&amp;#1352;", "&amp;#1353;", "&amp;#1354;", "&amp;#1355;", "&amp;#1356;", "&amp;#1357;", "&amp;#1358;", "&amp;#1359;", "&amp;#1360;", "&amp;#1361;", "&amp;#1362;", "&amp;#1363;", 
"&amp;#1364;", "&amp;#1365;", "&amp;#1366;", "&amp;#1377;", "&amp;#1378;", "&amp;#1379;", "&amp;#1380;", "&amp;#1381;", "&amp;#1382;", "&amp;#1383;", "&amp;#1384;", "&amp;#1385;", "&amp;#1386;", "&amp;#1387;", "&amp;#1388;", "&amp;#1389;", "&amp;#1390;", "&amp;#1391;", "&amp;#1392;", "&amp;#1393;", "&amp;#1394;", "&amp;#1395;", "&amp;#1396;", "&amp;#1397;", "&amp;#1398;", "&amp;#1399;", "&amp;#1400;", "&amp;#1401;", "&amp;#1402;", "&amp;#1403;", "&amp;#1404;", "&amp;#1405;", "&amp;#1406;", "&amp;#1407;", "&amp;#1408;", "&amp;#1409;", "&amp;#1410;", "&amp;#1411;", "&amp;#1412;", "&amp;#1413;", "&amp;#1414;", "&amp;#1415;", "&amp;#1488;", "&amp;#1489;", "&amp;#1490;", "&amp;#1491;", "&amp;#1492;", "&amp;#1493;", "&amp;#1494;", "&amp;#1495;", "&amp;#1496;", "&amp;#1497;", "&amp;#1498;", "&amp;#1499;", "&amp;#1500;", "&amp;#1501;", "&amp;#1502;", "&amp;#1503;", "&amp;#1504;", "&amp;#1505;", "&amp;#1506;", "&amp;#1507;", "&amp;#1508;", "&amp;#1509;", "&amp;#1510;", "&amp;#1511;", "&amp;#1512;", "&amp;#1513;", "&amp;#1514;","ˆ","¨");
    			  $title = str_replace($to_replace, " ", $title);
    
    
    I know mine isn't the proper way to do it, but it works.
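    A shorter alternative to listing every unwanted character is to list the allowed ones instead. This sketch assumes $title arrives as a UTF-8 byte string; it replaces every byte outside printable ASCII (0x20-0x7E) with a space, as the str_replace() call above does, and then collapses the leftover runs of spaces. Like the blacklist, it also removes accented letters such as é. The sample title is made up.

```php
<?php
$title = "Caf\xC3\xA9 \xE2\x98\x85 Menu"; // "Café ★ Menu" in UTF-8

// No /u flag: the pattern works byte by byte, so each byte of a
// multibyte character is replaced with its own space.
$title = preg_replace('/[^\x20-\x7E]/', ' ', $title);

// Collapse runs of two or more spaces and trim the ends.
$title = trim(preg_replace('/ {2,}/', ' ', $title));

echo $title, "\n"; // prints: Caf Menu
```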
     
    baris22, Apr 9, 2008
  5. m0nkeymafia

    #5
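    It will remove ANYTHING that isn't a letter A-Z (upper or lowercase) or a space.

    So if:

    $text = "934938484 hello how are you?";
    will turn into
    $text = " hello how are you";

    If you want to keep numbers too, simply add 0-9 to the character class, e.g. [^0-9A-Za-z ]. (Note that a range written as A-z also matches [ \ ] ^ _ and the backtick, which sit between Z and a in ASCII, so A-Za-z is the safer way to write it.)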
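    The example above can be reproduced with preg_replace(), with and without digits in the character class:

```php
<?php
$text = "934938484 hello how are you?";

// Letters and spaces only: the digits and the "?" are removed.
$lettersOnly = preg_replace('/[^A-Za-z ]/', '', $text);

// Adding 0-9 to the class keeps the digits as well.
$withDigits = preg_replace('/[^0-9A-Za-z ]/', '', $text);

echo '[' . $lettersOnly . "]\n"; // prints: [ hello how are you]
echo '[' . $withDigits . "]\n";  // prints: [934938484 hello how are you]
```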
     
    m0nkeymafia, Apr 9, 2008
  6. baris22

    #6
    This code also strips punctuation. How can I keep punctuation marks such as . , - and =?

    Thanks



     
    baris22, Apr 9, 2008
  7. m0nkeymafia

    #7
    You never said you wanted that. Without testing, you can probably add most of the punctuation like so:

    
    $text = preg_replace('/[^A-Za-z0-9\.,=() -]/', '', $text);
    
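    Notice I "escaped" the period with a backslash like so: \.
    Strictly speaking, that isn't required: inside a character class (the square brackets) the period, =, ( and ) are all taken literally. The character to watch is the hyphen, which must go at the start or end of the class (or be escaped as \-) so it isn't read as a range.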
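    If the allowed punctuation changes often, one option is to keep it in a plain string and let preg_quote() do the escaping, so characters like "-" or "." never need escaping by hand. A sketch, with a hypothetical helper name keepAllowed() and a made-up sample string:

```php
<?php
// Hypothetical helper: keep letters, digits, spaces and any characters
// listed in $extraAllowed; remove everything else.
function keepAllowed(string $text, string $extraAllowed): string
{
    // preg_quote() escapes regex metacharacters (including "-") and the
    // "/" delimiter, so the extra characters are always taken literally.
    $class = 'A-Za-z0-9 ' . preg_quote($extraAllowed, '/');
    return preg_replace('/[^' . $class . ']/', '', $text);
}

echo keepAllowed("Price = 10.5, (approx) - ok?", ".,-=()"), "\n"; // prints: Price = 10.5, (approx) - ok
```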
     
    m0nkeymafia, Apr 10, 2008