What is the best way to do this? Hi, I got a table like this: id, title, otherFields I want to select only those records where "title" does not has specific words. example: select * from table where title not like '%$word%' limit 5 This works if there is just one word to check for, but if there is a big list, say 150 words, then this query can become very long... example: select * from table where ( title not like '%word1%' or title not like '%word2%' or title not like '%word3%'... ) limit 5 Is there a better way of doing this? The words are coming from a different table, structure like this: blockedWords_table: id, word I am selecting all words in this table, then holding them in an array, then supplying them to query like above Is there any other cleaner way of doing this? Thanks
If this is a frequent query I'd consider adding a child table called something like words id, parent_id, type, word index: word so if my title was "Apple AirPods 3rd generation" I'd be inserting each word as a separate row. That's a small overhead at save time that will allow a query like this select * from table where not exists (select * from words where words.parent_id = table.id and type = 'title' and word in ('word1', 'word2', 'word3') There's a "double negative", the subquery is looking for a match and the main query is discarding the record if there is a match. There's an overhead to subqueries so if you have the time and access I'd set it up and run some speed tests of your version, this version and any other suggestions and check the performance. By adding a "type" column to the table you're allowing this to be used against other fields in the future. And finally - you have 150 words you don't want in the title??? What's driving this search???
so you're trying to do something like a profanity filter for text input fields? If so, your profanity list should be a blob where words and phrases are separated by a comma. I'll see if I can find my old filters and codes for you. If i remember correctly I had three different types of filters before an input field was valid.
If you are searching existing titles for profanity then the efficiency of the search isn't important. You need a good filter every time the title is saved into your database. Something like this? https://www.hashbangcode.com/article/simple-swear-filter-php
@sarahk This is a good idea. My only concern is that storing each word in title as separate record in a second table will increase the size of that table a lot. And I think you are right, I probably won't need 150 words list... 10-20 might work as well. @c1lonewolf Yes, that is the purpose.
If you're just profanity checking you can stick with your current query while you're in cleanup mode and put the profanity check in place when the record is being created or updated. No need for a query, just run the text past your checking function.
@sarahk That's the whole problem, cannot do the check at entry time itself. All entries have to go in db, no matter what. Have to do spamcheck from these and display. Users of db can also choose to view all entries without spamcheck implementation.
Filters should be applied when a form posts data for a new record. Their used is to keep things "out" of your database that could do harm to your site or visitors systems. Replacing profanity with a bunch of symbols or empty spaces is just garbage. Filters can deter bots, validate emails, keep other sites from posting data to you database even allow only specific file types to be uploaded etc. Not everything is profanity. There's spam postings...if you run a site for small children, you don't want some bot posting title links to gambling or porn sites...and if you're not going to check every post then you need someone who will, the code. Not to mention there's profanity in different languages that beed to be accounted for (search words for profanity filters) There was a group of mothers from all over the world that began building a list a fews years back, check it out. Anyway, JEET, sent you some code by way of "Conversation" I think that's PM these days not really sure.
I think you can try this: <?PHP $s_query = $_POST['query']; $s_ar = explode(' ',$s_query); $sql = 'SELECT * FROM table WHERE title IS NOT NULL'; if($s_ar < 1){ echo 'empty query.'; }else{ $ts = implode('|',array_values($s_ar)).''; $sql = ' '.$ts; } echo $sql; ?> PHP:
The most efficient way is my way : Create a table "words": "id,word" and one words2title: "id,word,entry_id" (id is autoincrement) 1. make a list of all the negative keywords (stemmed first) you are looking for and insert them into the table "words" 2. when inserting the title into the DB, with PHP, take each word from the title, run it through a stemmer (there are libraries on github). 3. insert ignore each stemmed word into the first table "words" 4. take the ID of the word and insert into the words2title table, along with the ID of the entry that contains the title 5. just search the words2title where word NOT IN (1,2,3,4,5,...150) ..whatever your negative word IDs are You're also not limited to mysql, you can add this list/tables to REDIS or alternatives for increased speed (like x100)