Hi all, I'm planning to design a knowledge base site. The site will use MS Access as the database and ASP.NET. The idea is to store the details in the database, and when the user enters a keyword in the search box, the results should bring up the pages where the information the user wants is stored. The search page also includes some filters for advanced searching. Can anyone please help me with any good search algorithms?
If you are developing the site just for practice or learning, then go ahead. Look up the reference manual for Access; it may have a function like MATCH() [MATCH() ... AGAINST() is found in MySQL, so search for its equivalent in Access]. If you want a more advanced algorithm, you will have to consider weighting for keywords in the topic title, keyword density in the body text, etc., which can grow complex. However, if you are trying to develop a commercial script, I would advise against it, as it will only waste your time and effort. There are many free and open-source scripts that run on PHP and MySQL (both free). E.g. WordPress, with a little theme modification, can be used as an excellent knowledge base script. Or maybe even a wiki, for that matter. I am not discouraging you from designing your site, just warning you, because a lot of programmers waste time reinventing the wheel without exploring existing and better options.
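To make the weighting idea above a bit more concrete, here is a rough sketch in Python (used only for illustration; the same logic ports to ASP.NET or PHP). The weights, the `title`/`body` field names and the function names are all assumptions made up for the example, not anything Access or another product provides.

```python
import re

# Hypothetical weights: a hit in the title counts far more than one in the body.
TITLE_WEIGHT = 5.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(article, keywords):
    """Score one article: weighted title hits plus keyword density in the body."""
    title_words = tokenize(article["title"])
    body_words = tokenize(article["body"])
    total = 0.0
    for kw in keywords:
        total += TITLE_WEIGHT * title_words.count(kw)
        if body_words:
            # Density = occurrences / body length, so a long article
            # does not win simply by being long.
            total += BODY_WEIGHT * body_words.count(kw) / len(body_words)
    return total


def search(articles, query):
    """Return the articles with a non-zero score, best match first."""
    keywords = tokenize(query)
    scored = [(score(a, keywords), a) for a in articles]
    scored = [(s, a) for s, a in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored]
```

An article with the keyword in its title will outrank one that only mentions it once in the body. Tuning the two weights is where the complexity starts to grow.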
rohan_shenoy, I went and looked at your blog post about PHP email validation. Suspecting that this was too simple, I did a search and found these: "FILTER_VALIDATE_EMAIL is not RFC2822 compliant" and "PHP Filter_Var FILTER_VALIDATE_EMAIL Newline Injection Vulnerability". You might want to note the potential problems in your blog.
Access is not the right choice for a large-scale website. It is a file-based database and in practice copes with only around 10 concurrent connections, which means only about 10 searches at a time. I'd switch to MySQL, PostgreSQL, SQL Server, or another server-based database.
Erm, the bloke's asking about search algorithms and you are telling him to change his database?

The one thing that really annoys me when shopping around is searches on sites that give you irrelevant results. For instance, searching for 'black pack' - a very generic search string where I'd expect to get backpacks and daypacks in black. The site, chosen at random from Google: gooutdoors.co.uk (see the search results yourselves here: http://www.gooutdoors.co.uk/product-list&Text=black pack). When you expect backpacks and get things like "Silva Ranger 3 Compass", "Lifesystems HeadNet Mosquito Hat" and "Wayfayrer Beef Stew and Dumplings", you know something is wrong with the search script.

Deciding to look into this further, I clicked on the Lifesystems Mosquito Hat and scanned for the words 'pack' and 'black' - fairly generic. Here they were:
- Can screw down into a small "stuff pack"
- Ultrafine black mesh

Why do we get this problem? Lazy coding. The most basic search practice out there is to do something like:
1. Break the string into words.
2. Compose the search query targeting known data fields like title, description and features, word by word, imploding into the query. At this point the where statement can look like: where (description like '%black%' or features like '%black%' or title like '%black%') and (description like '%pack%' ... etc.)
3. Display the results and hope for the best.

Here is another favourite search of mine that works on this site: 'the this'. Found 253 product(s) - page 1 of 22.

I think it's fair to say that certain words should not be used to score results; they are just too generic to be considered. Unless I am typing something like 'the north face', my 'the' should be dismissed, just as 'this' needs to be dropped.

So, what is the alternative? Oddly enough, the most accurate search results are achieved via accurate tagging and product knowledge. It goes like this:
1. Assign tags to each product. You can build an aliases table for common tags and errors. For example, you want to alias berghaus with things like berghouse, burghaus etc. (you'd be surprised how many people make mistakes).
2. Build the search algorithm to break the string down into parts and analyse them. Drop all common words that won't help and keep the 'useful' bits only. See below for a suggestion of what can be removed from your search.
3. Treat the words you have left as tags and select all products that have these tags applied to them.
4. Refine for relevance. This is done by counting the number of hits on each product. Basically, if I search for Berghaus RG1 Jacket, that's a possible 3 tagword hits. If the store has the RG1, it should show me just that and none of the results with 2 hits (jacket + berghaus). If they don't have the RG1, we will have an array of jackets by Berghaus and finally an array of just jackets, so we can display the ones with 2 hits first, as the more relevant results for the search.

Advantages: you always get the right and relevant results. Disadvantages: you need to manage it, you need to update it, and you need to monitor for people making mistakes and alias them. The increased conversion ratio will justify the man hours put into tagging your product base. I hope this gives you some ideas anyway.
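The four tagging steps above could be sketched roughly like this in Python (for illustration only; it ports easily to ASP.NET or PHP). The product data, alias map, short stop-word set and function names are all hypothetical, and a real site would keep the tags and aliases in database tables rather than in dictionaries.

```python
# Hypothetical sample data: product name -> set of tags (step 1).
PRODUCTS = {
    "Berghaus RG1 Jacket": {"berghaus", "rg1", "jacket"},
    "Berghaus Fleece Jacket": {"berghaus", "fleece", "jacket"},
    "Generic Rain Jacket": {"rain", "jacket"},
    "Silva Ranger 3 Compass": {"silva", "ranger", "compass"},
}

# Common misspellings mapped to the canonical tag (the aliases table).
ALIASES = {"berghouse": "berghaus", "burghaus": "berghaus"}

# A tiny stand-in for a full stop-word list.
STOP_WORDS = {"the", "this", "a", "an", "and", "for", "in"}


def query_tags(query):
    """Step 2: split the query, resolve aliases, drop common words."""
    words = [ALIASES.get(w, w) for w in query.lower().split()]
    return {w for w in words if w not in STOP_WORDS}


def tiers(query, products=PRODUCTS):
    """Steps 3 and 4: group matching products by hit count, best tier first."""
    tags = query_tags(query)
    by_hits = {}
    for name, product_tags in products.items():
        hits = len(tags & product_tags)
        if hits:
            by_hits.setdefault(hits, []).append(name)
    return [(h, sorted(by_hits[h])) for h in sorted(by_hits, reverse=True)]


def results(query, products=PRODUCTS):
    """Show only full matches if any exist, else every tier in order."""
    ranked = tiers(query, products)
    if ranked and ranked[0][0] == len(query_tags(query)):
        return ranked[0][1]
    return [name for _, names in ranked for name in names]
```

With this data, `results("Burghaus RG1 jacket")` returns only the RG1 jacket, while a shop that does not stock the RG1 would get the Berghaus jackets first and the plain jackets after them.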
As promised, here is my list of 'bad words' that I disregard from search strings (PHP):

$badwords = array(
"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully",
"b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by",
"c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course", "currently",
"d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during",
"e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except",
"f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore",
"g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings",
"h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", "howbeit", "however",
"i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself",
"j", "just",
"k", "keep", "keeps", "kept", "know", "knows", "known",
"l", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd",
"m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself",
"n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere",
"o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own",
"p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides",
"q", "que", "quite", "qv",
"r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right",
"s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure",
"t", "t's", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two",
"u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually",
"v", "value", "various", "very", "via", "viz", "vs",
"w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "wouldn't",
"x", "y", "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z"
);
Thank you very much, christoff. You were really helpful. I will definitely take your idea for building my knowledge base search. Thanks again.
Are you planning on using keywords for search, or spoken (free) text? More specifically: are you going to have the articles in one field in the database and specific keywords/tags that relate to those articles in a separate one? Or do you want to have just the articles and let people search through all the text? I've done it both ways. Using keywords/tags will generate a more specific answer, but you'll need to do some database modifications; searching entire articles needs a speech (free-text) query instead. I can post some code if you need.
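To show the difference between the two approaches, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Access (the SQL is deliberately plain, but Access syntax differs in places). The table names, column names and sample rows are all made up for the example.

```python
import sqlite3

# In-memory database standing in for the real knowledge base.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE article_tags (article_id INTEGER, tag TEXT);
""")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", [
    (1, "Reset your password", "Open the account page and choose reset."),
    (2, "Set up a printer", "Install the driver, then add the printer."),
    (3, "Account lockout", "After five bad password attempts you are locked out."),
])
conn.executemany("INSERT INTO article_tags VALUES (?, ?)", [
    (1, "password"), (1, "account"),
    (2, "printer"),
    (3, "account"), (3, "lockout"),
])


def search_by_tags(tags):
    """Titles of articles carrying any of the tags, most tag hits first."""
    if not tags:
        return []
    marks = ",".join("?" for _ in tags)
    rows = conn.execute(
        "SELECT a.title, COUNT(*) AS hits "
        "FROM articles a JOIN article_tags t ON t.article_id = a.id "
        f"WHERE t.tag IN ({marks}) "
        "GROUP BY a.id ORDER BY hits DESC, a.title",
        list(tags),
    )
    return [title for title, _ in rows]


def search_full_text(word):
    """Titles of articles whose title or body contains the word anywhere."""
    pattern = f"%{word}%"
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return [title for (title,) in rows]
```

The tag search only finds what someone has tagged, so it stays precise but needs upkeep. The free-text search also picks up the lockout article for 'password' because the word happens to appear in its body, which is exactly how irrelevant results creep in.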