Pseudocode Help

JudyJiaStyle Well-Known Member

I'm trying to write a local search engine that handles CJK (Chinese, Japanese, Korean) character variants. See:

http://hkiug.ln.edu.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html

Each row lists the different ways a 'word' can be written or typed. They all mean the same 'word'.

Ideally, when I search for a word, the results should also include entries that contain any of its variants.

How do I process a query that has many words without having to run an exponential number of queries (one for every combination of variants)? Is that even possible?

I'm working in an SQL/ASP environment, but any hint at how to go about this would be a lifesaver!
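To put a number on the "exponential queries" problem: if every character in a query can be written in several ways, spelling out every combination yields the product of those counts. The sketch below assumes a small, hypothetical in-memory list of variant groups (well-known traditional/simplified pairs, not rows copied from the HKIUG table) and only illustrates the growth:

```python
from itertools import product
from math import prod

# Hypothetical variant groups for illustration only; a real system would load
# every row of a variant table such as the HKIUG one.
VARIANT_GROUPS = [
    ["為", "爲", "为"],
    ["學", "学"],
    ["國", "国"],
    ["圖", "图"],
    ["書", "书"],
    ["館", "馆"],
]

# Map every character to the whole group it belongs to.
GROUP_OF = {ch: group for group in VARIANT_GROUPS for ch in group}


def variants(ch: str) -> list[str]:
    """All known ways to write ch; characters without variants map to themselves."""
    return GROUP_OF.get(ch, [ch])


def naive_expansion(query: str) -> list[str]:
    """Every spelling of the query, one per combination of character variants."""
    return ["".join(combo) for combo in product(*(variants(ch) for ch in query))]


def expansion_size(query: str) -> int:
    """How many spellings there are: the product of the group sizes."""
    return prod(len(variants(ch)) for ch in query)


if __name__ == "__main__":
    print(expansion_size("圖書館"))  # 8 (2 * 2 * 2)
```

A ten-character query in which every character has two forms would already need 2^10 = 1,024 spellings, so expanding the query itself does not scale.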

JudyJiaStyle, Jun 21, 2007

Weizheng Peon

Interesting question!

If I were doing this, I'd choose one variant of each character as the canonical form.
Index all the data using this canonical form.
Then convert every query to the canonical form before searching the database.

That should do it, I think.
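A minimal Python sketch of the canonical-form idea, assuming a hypothetical list of variant groups in which the first member of each group is picked (arbitrarily) as the canonical character. Both the stored text and the query go through the same canonicalize function, so a query stays a single lookup however many characters it has:

```python
# Hypothetical variant groups; the first member of each group is treated as
# the canonical form. A real system would build this from a full variant table.
VARIANT_GROUPS = [
    ["圖", "图"],
    ["書", "书"],
    ["館", "馆"],
    ["國", "国"],
    ["學", "学"],
]

# Each variant maps to its group's canonical character.
CANONICAL = {ch: group[0] for group in VARIANT_GROUPS for ch in group}


def canonicalize(text: str) -> str:
    """Replace each character by its canonical variant; others pass through."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)


def search(titles: list[str], query: str) -> list[str]:
    """Titles whose canonical form contains the canonical form of the query."""
    q = canonicalize(query)
    return [t for t in titles if q in canonicalize(t)]


if __name__ == "__main__":
    titles = ["香港圖書館", "中国图书馆", "學校"]
    print(search(titles, "图書館"))  # ['香港圖書館', '中国图书馆']
```

Canonicalizing costs one dictionary lookup per character, so the query never multiplies into variant combinations. The sketch assumes each character belongs to exactly one group; where one simplified character stands for several traditional ones (发 for both 發 and 髮, for example), the groups overlap and a single canonical character per group is no longer enough.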

Weizheng, Jun 22, 2007

JudyJiaStyle Well-Known Member

The database I'm searching uses different variants in its entries, and that can't be changed because the entries have to be represented accurately (they're books, btw). It's all kinds of frustrating!

JudyJiaStyle, Jun 22, 2007

Weizheng Peon

You can still show the results in their original form.
It's only when you index or search that you use the canonical form.

If you're relying on SQL to do the search, just keep two copies of each entry in the database:
- a canonical copy for searching, and
- the original copy for showing search results to the user.
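Here is a sketch of the two-copy layout, with Python's built-in sqlite3 standing in for whatever SQL database the site actually uses. The table and column names (books, title_original, title_canonical) and the variant groups are hypothetical:

```python
import sqlite3

# Hypothetical variant groups; the first member of each is the canonical form.
VARIANT_GROUPS = [["圖", "图"], ["書", "书"], ["館", "馆"], ["國", "国"]]
CANONICAL = {ch: group[0] for group in VARIANT_GROUPS for ch in group}


def canonicalize(text: str) -> str:
    """Map every character to its canonical variant."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)


def build_db(titles: list[str]) -> sqlite3.Connection:
    """Store each title twice: as written (for display) and canonical (for search)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books ("
        " id INTEGER PRIMARY KEY,"
        " title_original TEXT NOT NULL,"
        " title_canonical TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO books (title_original, title_canonical) VALUES (?, ?)",
        [(t, canonicalize(t)) for t in titles],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Match against the canonical column but return the original spelling."""
    # Note: % and _ typed by the user would act as LIKE wildcards here.
    pattern = "%" + canonicalize(query) + "%"
    rows = conn.execute(
        "SELECT title_original FROM books"
        " WHERE title_canonical LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db(["香港圖書館", "中国图书馆", "學校"])
    print(search(conn, "图書館"))  # ['香港圖書館', '中国图书馆']
```

Because the pattern starts with %, an ordinary B-tree index can't speed up this LIKE; a large catalogue would probably need a full-text or n-gram index on the canonical column instead.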

Weizheng, Jun 23, 2007

it career Notable Member

Keep an entry of keywords, description, and file URL/path for each item.
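The post doesn't spell out a schema, so the following is one possible reading (an assumption): each entry keeps its keywords, a description, and a file path, and an inverted index maps each keyword to the entries that list it. A multi-word query then intersects the per-keyword lists instead of trying combinations. Entry, KeywordIndex, and the example paths are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One searchable item: its keywords, a description, and its file path."""
    keywords: tuple[str, ...]
    description: str
    path: str


class KeywordIndex:
    """Hypothetical inverted index: keyword -> entries that list that keyword."""

    def __init__(self) -> None:
        self._by_keyword: defaultdict[str, list[Entry]] = defaultdict(list)

    def add(self, entry: Entry) -> None:
        for keyword in entry.keywords:
            self._by_keyword[keyword].append(entry)

    def lookup(self, keyword: str) -> list[Entry]:
        # .get() avoids creating empty lists for unknown keywords.
        return list(self._by_keyword.get(keyword, []))

    def lookup_all(self, keywords: list[str]) -> list[Entry]:
        """Entries that list every keyword (intersection of the per-keyword lists)."""
        if not keywords:
            return []
        first, *rest = [self.lookup(k) for k in keywords]
        return [e for e in first if all(e in others for others in rest)]


if __name__ == "__main__":
    index = KeywordIndex()
    index.add(Entry(("圖書館", "香港"), "Hong Kong library guide", "books/0001.html"))
    index.add(Entry(("圖書館",), "Library history", "books/0002.html"))
    print([e.path for e in index.lookup_all(["圖書館", "香港"])])  # ['books/0001.html']
```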

it career, Jun 23, 2007

UnrealEd Peon

I'm not sure if it's going to work for those characters, but you might be able to use the MySQL SOUNDEX function:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex

Something like this:
SELECT
  *
FROM
  mytable
WHERE
  SOUNDEX(`character`) = SOUNDEX(?)

where `character` is the column that holds the character (backquoted because CHARACTER is a reserved word in MySQL) and ? is a placeholder for the text the user typed in.
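SOUNDEX groups words by how their English consonants sound, so it's worth checking what it does to CJK text before relying on it. The sketch below is the classic four-character American Soundex, not MySQL's exact implementation (MySQL's returns an arbitrarily long string, and its manual describes the function as intended for English). In this sketch, characters outside A-Z get no code at all, so every all-CJK string ends up with the same empty key:

```python
# Classic American Soundex digit codes; letters not listed (vowels, H, W, Y) get none.
CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Four-character Soundex key; anything outside A-Z is ignored."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""  # nothing to encode, e.g. a purely CJK string
    key = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        # H and W do not separate equal codes; vowels do (they reset prev).
        if ch not in "HW":
            prev = code
    return ("".join(key) + "000")[:4]


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
    print(repr(soundex("圖書館")), repr(soundex("學校")))  # '' ''
```

If MySQL behaves similarly on the actual data, a phonetic key can't tell one CJK character from another, and matching variants would still need a variant table.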

UnrealEd, Jun 23, 2007

JudyJiaStyle Well-Known Member

Thanks for all the advice. I'm going to try these out!

JudyJiaStyle, Jun 26, 2007