advice for preg_match

Discussion in 'PHP' started by dadaas, Sep 22, 2011.

  1. #1
    The code below ignores $SeQuery when it has related: at the first position, right?
    		$sitepos = strpos($SeQuery, 'related:');
    		if (!$sitepos === false) {  return; } 
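A note on the check above: in PHP the ! operator binds more tightly than ===, so !$sitepos === false is evaluated as (!$sitepos) === false. That is only true when strpos() returns a position of 1 or more. A match at position 0 (the very start of the query) makes !0 true, so that query is not skipped, which is the opposite of what the question expects. The usual idiom is to compare the strpos() result with !== false. The sketch below contrasts the two forms. The function names containsText() and originalCheck() are hypothetical and exist only to make the comparison visible.

```php
<?php
// Returns true when $needle occurs anywhere in $haystack, including position 0.
// strpos() returns int|false, so compare against false with !== (not with !).
function containsText(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

// The condition from the post, for comparison: parsed as (!$pos) === false,
// which is true only for a match at position 1 or later.
function originalCheck(string $haystack, string $needle): bool
{
    $pos = strpos($haystack, $needle);
    return !$pos === false;
}

var_dump(containsText('related:example.com', 'related:'));      // bool(true)
var_dump(originalCheck('related:example.com', 'related:'));     // bool(false)
var_dump(originalCheck('foo related:example.com', 'related:')); // bool(true)
```

In the thread's code, this corresponds to if (strpos($SeQuery, 'related:') !== false) { return; }.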
    Now I would like all bad words and profanity to be ignored, and I'm not sure if this is the right code. Please let me know before I add it to my sites:

    		if (preg_match('/(bad word1|bad word2|bad word3)/i', $SeQuery));
    		{  exit; } 
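One thing to check in the snippet above: the semicolon right after the closing parenthesis of if (...) ends the if statement with an empty body. The { exit; } block that follows is then an ordinary block that runs every time, whether or not preg_match() found anything. The sketch below shows the difference. Each variant is wrapped in a hypothetical function, and return true stands in for exit so the script keeps running.

```php
<?php
// Buggy variant: the ';' after if (...) is an empty statement, so the braces
// that follow always execute and every query is reported as "skip".
function skipWithStraySemicolon(string $query): bool
{
    if (preg_match('/(bad word1|bad word2|bad word3)/i', $query));
    {
        return true;
    }
    return false; // never reached
}

// Fixed variant: no semicolon, so the block only runs when the pattern matches.
function skipFixed(string $query): bool
{
    if (preg_match('/(bad word1|bad word2|bad word3)/i', $query)) {
        return true;
    }
    return false;
}

var_dump(skipWithStraySemicolon('good query')); // bool(true)
var_dump(skipFixed('good query'));              // bool(false)
var_dump(skipFixed('a BAD WORD2 here'));        // bool(true)
```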
    Thanks
     
    Last edited: Sep 22, 2011
    dadaas, Sep 22, 2011
  2. MyVodaFone

    MyVodaFone Well-Known Member

    #2
    Hmm, is that statement right?
    If so, there must already be a function that replaces bad words. Perhaps it would be easier to work with that and set up an if condition on that function, for example if it's the username or something...

    MyVodaFone, Sep 23, 2011
  3. dadaas

    dadaas Well-Known Member

    #3
    This is a filter for auto-creating tags, and I want only clean tags to be created. So if I put the word F***k inside preg_match, I want it to stop creating tags for every query that has this word inside.

    Do you think my preg_match works this way?

    dadaas, Sep 23, 2011
  4. gvre

    gvre Member

    #4
    Try this
    // FULL MATCH SOLUTION
    $badwords  = array("bad word1", "bad word2");
    $tags      = array("good word1", "good word2", "bad word1");
    $intersect = array_intersect($badwords, $tags);
    $cleanTags = array_diff($tags, $intersect);
    print_r($cleanTags);
    
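For the full-match case, calling array_intersect() and then array_diff() gives the same result as a single array_diff($tags, $badwords). Both remove every tag whose value exactly equals one of the bad words, and both keep the original keys. The sketch below reuses the sample arrays from the post, with one entry capitalised. The comparison is exact and case-sensitive, so lower-casing the tags first is an added assumption here, made so that "Bad Word1" is still caught.

```php
<?php
$badwords = array("bad word1", "bad word2");
$tags     = array("good word1", "good word2", "Bad Word1");

// array_diff keeps the entries of $tags whose values do not appear in $badwords.
// Lower-casing first (an assumption) makes the exact match case-insensitive.
$cleanTags = array_diff(array_map('strtolower', $tags), $badwords);

// array_values() re-indexes 0..n-1, because array_diff preserves the original keys.
print_r(array_values($cleanTags));
```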
    or this
    
    // PARTIAL MATCH SOLUTION
    $cnt = 0;
    foreach($tags as $tag)
    {
            foreach($badwords as $bd)
            {
                    if (stripos($tag, $bd) !== false)
                            unset($tags[$cnt]);
            }
            $cnt++;
    }
    print_r($tags);
    
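In the partial-match loop, the $cnt counter matches the array keys only while $tags is indexed 0, 1, 2, ... with no gaps. That holds for a literal array or the result of explode(), but not in general. Taking the key from foreach, or using array_filter(), avoids that assumption. The sketch below uses array_filter() with the same case-insensitive stripos() test. The sample values are illustrative only.

```php
<?php
$badwords = array("bad word1", "bad word2");
$tags     = array("good word1", "good word2", "bad word1 extra");

// Keep a tag only if none of the bad words occurs anywhere inside it
// (stripos is case-insensitive and returns false when there is no match).
$cleanTags = array_filter($tags, function ($tag) use ($badwords) {
    foreach ($badwords as $bad) {
        if (stripos($tag, $bad) !== false) {
            return false;
        }
    }
    return true;
});

print_r($cleanTags);
```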
     
    gvre, Sep 24, 2011
  5. bilginu

    bilginu Greenhorn

    #5
    I can't understand this code, please write it more clearly.

    bilginu, Sep 24, 2011
  6. dadaas

    dadaas Well-Known Member

    #6
    Guys, I'm not trying to filter tags. I have a tool that creates tags from search queries, so I want to filter the search queries, which are in $SeQuery.

    I use this code to filter out queries that start with related: so they don't become tags:
    $sitepos = strpos($SeQuery, 'related:');
            if (!$sitepos === false) {  return; }
    But I also need to filter and skip all tags that contain bad words, like F-words, P-words and, you know, the common adult stuff. So I put this above the code I posted up there:
    Original code:
    	if (preg_match('/(pičk|sex|kurac|ass|jeb|pizd|seks|shit|www|bitch|dick|pićk|picka|picke|picko|picki|kurc|fuvk|fuk|fuck|whore|naked|gole|goli|lesbo)/i', $SeQuery));
    		{  exit; } 
    But in the last 3 days only a single tag has been created, so I'm thinking this code may be wrong or need a tweak. What could be the reason tags are not being created? Maybe some of the words I filter are too common?

    Maybe it was just exiting on everything, so I changed the code to:
    		if (preg_match('/(pičk|sex|kurac|ass|jeb|pizd|seks|shit|www|bitch|dick|pićk|picka|picke|picko|picki|kurc|fuvk|fuk|fuck|whore|naked|gole|goli|lesbo)/i', $SeQuery))
    		{  exit; } 
    else 
    {  return; }
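In the changed version above, both branches leave the code. exit stops the whole PHP script when a bad word is found, and the else branch returns when none is found. Either way, nothing after this if/else runs for any query, so no tag can be created. What the post seems to need is to return only on a match and otherwise fall through to the remaining checks. The sketch below shows that control flow. The function processQuery() is a hypothetical stand-in for the tag-creating code, and the bad-word list is shortened.

```php
<?php
// Hypothetical stand-in for the tag-creating routine: returns the query it would
// turn into a tag, or null when the query is skipped.
function processQuery(string $SeQuery): ?string
{
    // Skip the query only when a bad word matches; there is no else branch.
    if (preg_match('/(sex|shit|fuck|whore)/i', $SeQuery)) {
        return null;
    }

    // ... the other filters (length, numeric, operators) would go here ...

    return $SeQuery; // reached only for clean queries
}

var_dump(processQuery('old ladies'));      // string(10) "old ladies"
var_dump(processQuery('sexy old ladies')); // NULL
```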
    Is this better?
     
    dadaas, Sep 25, 2011
  7. gvre

    gvre Member

    #7
    Could you provide an example of $SeQuery?
     
    gvre, Sep 25, 2011
  8. gvre

    gvre Member

    #8
    If you need to skip tags with bad words, you shouldn't exit. The following solution assumes that tags are separated by ','. I hope that the following code will help you to find a solution to your problem.

    define("TAGS_SEPARATOR", ",");
    $SeQuery = "fucker, sex, hello";
    $SeQuery = preg_replace('#[^a-z0-9,]+#si', "", $SeQuery);
    $pattern = '#(pičk|sex|kurac|ass|jeb|pizd|seks|shit|www|bitch|dick|pićk|picka|picke|picko|picki|kurc|fuvk|fuk|fuck|whore|naked|gole|goli|lesbo)#si';
    $tags = explode(TAGS_SEPARATOR, $SeQuery);
    $cnt = 0;
    foreach($tags as $tag)
    {
            if (preg_match($pattern, $tag))
                    unset($tags[$cnt]);
            $cnt++;
    }
    print_r($tags);
    
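In the code above, the preg_replace('#[^a-z0-9,]+#si', ...) step removes everything except ASCII letters, digits and commas. That includes spaces and the bytes of letters such as č and ć. As a result, multi-word tags are glued together ("hello world" becomes "helloworld"), and entries like pičk in the pattern can never match the cleaned text. The sketch below keeps the input intact instead. It splits on commas, trims each piece, and uses preg_grep() with PREG_GREP_INVERT to keep only the non-matching entries. It assumes the input and the PHP file are UTF-8 (hence the u modifier) and uses a shortened word list.

```php
<?php
// Assumes UTF-8 input with comma-separated tags, as in the post above.
$SeQuery = "fucker, sex, hello world, pička";
$pattern = '/(pičk|sex|fuck)/iu'; // shortened list; 'u' treats pattern and subject as UTF-8

// Split on commas, trim surrounding whitespace, drop empty pieces.
$tags = array_filter(array_map('trim', explode(',', $SeQuery)), 'strlen');

// PREG_GREP_INVERT returns the entries that do NOT match the pattern.
$cleanTags = preg_grep($pattern, $tags, PREG_GREP_INVERT);

print_r(array_values($cleanTags));
```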
     
    gvre, Sep 25, 2011
  9. dadaas

    dadaas Well-Known Member

    #9
    Hmm, maybe your code will help me in some parts, but the problem is that these are not tags and they are not separated. These are queries: search engine queries. What this script does is this: if you search for, let's say, "Sexy Old Ladies" on a search engine, that query is passed into the script and goes through a number of checks that decide whether it will be turned into a tag or not. So I'm not trying to filter tags; I'm trying to filter queries that contain bad words, or words I don't want created as tags.

    The script has various other filters; see this part:

    		$SePage        = $se['Page'];
    		$SeQuery      = strtolower($se['Query']);
    		$SeDomain   = strtolower($se['Se']);
    		$SeLang		= strtolower($se['SeLang']);		
    //WE IGNORE Bad words or words we dont like
    		if (preg_match('/(pičk|sex|kurac|ass|jeb|pizd|seks|shit|www|bitch|dick|pićk|picka|picke|picko|picki|kurc|nude|fuvk|fuk|fuck|whore|naked|gole|goli|lesbo)/i', $SeQuery))
    		{  exit; } 
    else 
    {  return; }
    //WE IGNORE searches over 40 characters and under 4 characters
    		if ( strlen($SeQuery) > 40 ) {return; } 
    		if ( strlen($SeQuery) < 4 ) {return; } 
    //WE IGNORE NUMBER-ONLY SEARCHES		
    		if (is_numeric($SeQuery)) {return; } 
    
    //WE IGNORE "http:"-SEARCHES
    		$sitepos = strpos($SeQuery, 'http:');
    		if (!$sitepos === false) {  return; } 
    //WE IGNORE "CACHE:"-searches
    		$sitepos = strpos($SeQuery, 'cache:');
    		if (!$sitepos === false) {  return; } 
    
    //WE IGNORE "SITE:"-SEARCHES
    		$sitepos = strpos($SeQuery, 'site:');
    		if (!$sitepos === false) {  return; } 
    //WE IGNORE "RELATED:"-SEARCHES
    		$sitepos = strpos($SeQuery, 'related:');
    		if (!$sitepos === false) {  return; } 
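A further point about the bad-word pattern in the block above: the alternatives are matched as plain substrings. Short entries therefore also hit harmless words: ass inside "class" or "password", sex inside "sussex", and www in any query that contains a URL. That can make the filter skip many more queries than intended. Adding \b word boundaries narrows a match. \bass\b only matches the whole word. \bsex still catches "sexy" (the example query in this thread) but not "sussex". Which entries should be whole words and which should stay prefixes is a judgment call. The split in the sketch below is only an assumption, and the word list is shortened.

```php
<?php
// Plain substring matching, as in the post: 'ass' also hits 'class' and 'password'.
$substringPattern = '/(ass|sex|shit|kurc|pizd)/i';

// Assumed split: 'ass' must be a whole word (\b on both sides); the other entries
// are treated as prefixes (\b on the left only), so 'sexy' still matches.
$wordPattern = '/\bass\b|\b(?:sex|shit|kurc|pizd)/i';

$queries = array("class schedule", "sussex weather", "password reset", "sexy old ladies");

// For each query, record [substring match, word-boundary match] as 1/0.
$results = array();
foreach ($queries as $q) {
    $results[$q] = array(preg_match($substringPattern, $q), preg_match($wordPattern, $q));
}
print_r($results);
```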
    Now I'm not sure whether exit is the right command here. If you think this will work, please give me the green light; if you think exit or the preg_match call needs to be replaced with something else, please help me.

    Thanks a lot for responding.

    dadaas, Sep 25, 2011
  10. gvre

    gvre Member

    #10
    So, if I search for "Sexy Old Ladies", do you want to skip the word Sexy and create tags for "Old" and "Ladies", or skip the complete search phrase ("Sexy Old Ladies")?
     
    gvre, Sep 25, 2011
  11. dadaas

    dadaas Well-Known Member

    #11
    I want it to skip the whole phrase.

    dadaas, Sep 25, 2011
  12. #12
    I think the following will do the job:

    $SePage   = $se['Page'];
    $SeQuery  = strtolower($se['Query']);
    $SeDomain = strtolower($se['Se']);
    $SeLang   = strtolower($se['SeLang']);
    
    
    $badWordsPattern = '/(pičk|sex|kurac|ass|jeb|pizd|seks|shit|www|bitch|dick|pićk|picka|picke|picko|picki|kurc|nude|fuvk|fuk|fuck|whore|naked|gole|goli|lesbo)/i';
    $ignorePattern = '/(http|cache|site|related):/i';
    if (    is_numeric($SeQuery)
            || preg_match($badWordsPattern, $SeQuery) 
            || preg_match($ignorePattern, $SeQuery) 
            || ($len = strlen($SeQuery)) < 4 
            || $len > 40)
                    return; 
    
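The checks in the solution above can also be wrapped in a function that returns true when a query should be skipped. This makes them easy to try on sample queries. The sketch below assumes a hypothetical function name, shouldSkipQuery(), and a shortened bad-word list. One caveat: strlen() counts bytes, so a query containing letters such as č or ć is counted as longer than it looks. If the mbstring extension is available, mb_strlen($query, 'UTF-8') counts characters instead.

```php
<?php
// Hypothetical helper: returns true when the search query should NOT become a tag.
function shouldSkipQuery(string $query): bool
{
    $query = strtolower($query);
    $badWordsPattern = '/(sex|shit|fuck|whore|naked)/i'; // shortened list
    $ignorePattern   = '/(http|cache|site|related):/i';

    if (is_numeric($query)) {
        return true;                                // number-only searches
    }
    if (preg_match($badWordsPattern, $query) || preg_match($ignorePattern, $query)) {
        return true;                                // bad words or operator searches
    }
    $len = strlen($query);                          // bytes, not characters
    return $len < 4 || $len > 40;                   // too short or too long
}

foreach (array("Sexy Old Ladies", "related:example.com", "12345", "abc", "old ladies") as $q) {
    printf("%s => %s\n", $q, shouldSkipQuery($q) ? "skip" : "keep");
}
```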
     
    gvre, Sep 25, 2011
    dadaas likes this.