1. Advertising
    y u no do it?

    Advertising (learn more)

    Advertise virtually anything here, with CPM banner ads, CPM email ads and CPC contextual links. You can target relevant areas of the site and show ads based on geographical location of the user if you wish.

    Starts at just $1 per CPM or $0.10 per CPC.

Ignoring special characters in search query

Discussion in 'MySQL' started by clem_c_rock, Oct 25, 2012.

  1. #1
    I've been wracking my brain trying to find a solution for this for a couple of days. I'm trying to make a "smart" query that can handle a wide range of search terms. The queries run fine until there are special characters involved and I've had some success w/ the REPLACE method on some characters such as commas and dashes. Other characters such as quotes and ampersands will result in empty queries.

    Here's a few examples:

    the original name I'm searching for is "French Is Fun, Book 1 - 1 Year Option" and with this query below, I get results returned with these search terms:

    1. "French Is Fun"
    2. "French Is Fun, book"
    3. "French Is Fun, book"
    4. "French Is Fun, Book 1"

    
    SELECT * FROM `products` WHERE ( (LOWER(name) LIKE '%french is fun book%' OR
     LOWER(replace(name, '  ','')) LIKE '%french is fun book%' OR
     LOWER(replace(name, ' ','')) LIKE '%french is fun book%' OR
     LOWER(replace(name, '-','')) LIKE '%french is fun book%')
    
    Code (markup):
    However, when the original title has an ampersand in it like such: "Global History & Geography: The Growth of Civilizations - 1 Year Option" - I get an empty query when I try these different search terms:

    1. "Global History & Geography"
    2. "Global History Geography"

    I've tried this to no avail
    
    SELECT * FROM `products` WHERE  
    	(LOWER(name) LIKE '%global history geograph%' OR  
    	 	LOWER(replace(name, '  ','')) LIKE '%global history geography%' OR  
    	 	LOWER(replace(name, ' ','')) LIKE '%global history geography%' OR 
    	 	LOWER(replace(name, ',','')) LIKE '%global history geography%' OR 
      		LOWER(replace(name, '&','')) LIKE '%global history geography%' OR  
    	 	LOWER(replace(name, '-','')) LIKE '%global history geography%');
    
    Code (markup):
    I also tried adding an escape character to the ampersand and it doesn't help:
    
    SELECT * FROM `products` WHERE  
    	(LOWER(name) LIKE '%global history geography%' OR  
    	 	LOWER(replace(name, '  ','')) LIKE '%global history geography%' OR  
    	 	LOWER(replace(name, ' ','')) LIKE '%global history geography%' OR 
    	 	LOWER(replace(name, ',','')) LIKE '%global history geography%' OR 
      		LOWER(replace(name, '\&','')) LIKE '%global history geography%' OR  
    	 	LOWER(replace(name, '-','')) LIKE '%global history geography%');
    
    Code (markup):
    And commas in the name also return empty results. As a demonstration, the original name is this:

    "Amsco's AP Calculus AB/BC Preparing for the Advanced Placement Examinations - 1 Year Option"

    This attempt always returns empty queries:

    
    SELECT * FROM `products` WHERE 
    	( (LOWER(name) LIKE '%amscos ap calculus%' OR
    		 LOWER(replace(name, ' ','')) LIKE '%amscos ap calculus%' OR
    		 LOWER(replace(name, '\'','')) LIKE '%amscos ap calculus%' OR
    		 LOWER(replace(name, ',','')) LIKE '%amscos ap calculus%' OR
    		 LOWER(replace(name, '-','')) LIKE '%amscos ap calculus%')
    		) AND ( (`products`.`type` = 'Rental' ) );
    
    Code (markup):

    Any ideas?
     
    clem_c_rock, Oct 25, 2012 IP
  2. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #2
    First off, you shouldn't need to use LOWER unless your collation ends in _cs. Typically tables use _ci collation which stands for case insensitive.

    data stored in table: "French Is Fun, Book 1 - 1 Year Option"
    LIKE '%french is fun%' should match this without any of the other queries.

    As for the other examples, you most likely will need to implement some application logic to accomplish it. Since you are searching for a different literal string than what's stored in the database, there's just no way that it's going to work properly. You could possibly use a UDF regex - https://launchpad.net/mysql-udf-regexp and completely strip all non alpha characters from the column and the search pattern, and this might work.

    Otherwise, I would go for something like this:
    data: "Global History & Geography: The Growth of Civilizations - 1 Year Option"

    SELECT * FROM `products` WHERE (`name` LIKE '%Global%' AND `name` LIKE '%History%' AND `name` LIKE '%Geography%')
    Code (markup):
     
    jestep, Oct 25, 2012 IP
  3. Gemba

    Gemba Member

    Messages:
    36
    Likes Received:
    2
    Best Answers:
    1
    Trophy Points:
    25
    #3
    SELECT * FROM `products` WHERE `name` LIKE '%global%history%geography%'
    Code (markup):
    That should work.
     
    Gemba, Oct 26, 2012 IP
    jestep likes this.
  4. clem_c_rock

    clem_c_rock Member

    Messages:
    14
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    36
    #4
    Just wanted to give an update:

    So far this version of my query is giving me the best results:

    
    SELECT * FROM `products` WHERE (in_store = 1 AND (LOWER( replace(replace(replace(replace(replace(name, ' ',''), ',', ''), '&', ''), '-', ''), ':', '') ) LIKE '%globalhistorygeography%' OR LOWER(name) LIKE '%global history geography%') 
    ) 
    
    Code (markup):
    I'm still having some more "edge case queries that aren't returning results.

    For instance:

    This is the original title and if I use this query it returns results:

    "Lippincott's Q&A Review for NCLEX-RN, North American Edition"

    but, when I take out the ampersand, I don't get any results returned:

    "Lippincott's QA Review for NCLEX-RN, North American Edition"

    Thanks again, you've helped me get to the point where I'm getting acceptable results. Now it's down to knocking out the edge cases
     
    clem_c_rock, Oct 30, 2012 IP