Pseudo Code Help

Discussion in 'Programming' started by JudyJiaStyle, Jun 21, 2007.

  1. #1
    I'm trying to write a local search engine that searches CJK (Chinese, Japanese, Korean) variants. See:

    http://hkiug.ln.edu.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html

    Each line represents different ways a 'word' can be written/typed. They all mean the same 'word'.

    Ideally, if I search for a word, my result will also pull up entries that contain the variants.

    How do I process a query that has many words without having to run an exponential number of queries (one for every combination of variants)? Is that even possible?

    I'm in an SQL/ASP environment, but any hint at how to go about this would be a lifesaver!
     
    JudyJiaStyle, Jun 21, 2007 IP
  2. Weizheng

    #2
    Interesting question!

    If I were doing this, I'd choose one form as the canonical form.
    All data is indexed using this canonical form.
    Then all queries are converted to the canonical form before searching the database.

    That should do it, I think :)
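
    A minimal sketch of the canonical-form idea, assuming the variant table has already been loaded as groups of equivalent characters. The three groups and the names VARIANT_GROUPS, build_canonical_map and canonicalize are illustrative assumptions, not taken from the linked table:

```python
# Hypothetical variant groups (assumption): each tuple lists characters that
# should be treated as the same 'word'. A real list would be loaded from the
# variant table; these three pairs are only for illustration.
VARIANT_GROUPS = [
    ("台", "臺"),
    ("国", "國"),
    ("学", "學"),
]


def build_canonical_map(groups):
    """Map every character in a group to that group's first character."""
    mapping = {}
    for group in groups:
        canonical = group[0]
        for ch in group:
            # str.translate expects code points as keys.
            mapping[ord(ch)] = canonical
    return mapping


CANONICAL = build_canonical_map(VARIANT_GROUPS)


def canonicalize(text):
    """Rewrite text so that every variant becomes its canonical character."""
    return text.translate(CANONICAL)
```

    Because the query is canonicalized once, a query with many words still results in a single lookup rather than one lookup per combination of variants.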
     
    Weizheng, Jun 22, 2007 IP
  3. JudyJiaStyle

    #3
    The database I'm searching through uses different variants in the entries, and that can't be changed due to the need for accurate representation (it's books, btw). It's all kinds of frustrating! :D
     
    JudyJiaStyle, Jun 22, 2007 IP
  4. Weizheng

    #4
    You can still show the results in their original form.
    It's just that when you index or search, you only do it on the canonical form.

    If you're relying on SQL to do the search, just keep 2 copies in
    the database:
    - canonical copy for searching, and
    - original copy for showing search result to user.
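
    A rough sketch of the two-copy layout, using Python's built-in sqlite3 module as a stand-in for whatever SQL database is actually in use. The table name books, its column names, the two-entry variant map and the helper functions are all assumptions made for illustration:

```python
import sqlite3

# Hypothetical variant map (assumption): variant character -> canonical one.
CANONICAL = str.maketrans({"臺": "台", "國": "国"})


def canonicalize(text):
    """Rewrite text so that every variant becomes its canonical character."""
    return text.translate(CANONICAL)


def build_index(titles):
    """Store each title twice: as written, and in canonical form."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (original TEXT, canonical TEXT)")
    conn.executemany(
        "INSERT INTO books (original, canonical) VALUES (?, ?)",
        [(title, canonicalize(title)) for title in titles],
    )
    return conn


def search(conn, query):
    """Match on the canonical column, return the original text."""
    # Note: % and _ in the query are not escaped in this sketch.
    pattern = "%" + canonicalize(query) + "%"
    rows = conn.execute(
        "SELECT original FROM books WHERE canonical LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in rows]
```

    The original column is never modified, so entries are displayed exactly as they were catalogued, while the search only ever touches the canonical column.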
     
    Weizheng, Jun 23, 2007 IP
  5. it career

    #5
    Keep an entry of keywords, description, and file URL/path for each item.
     
    it career, Jun 23, 2007 IP
  6. UnrealEd

    #6
    I'm not sure it will work for those characters (the MySQL manual describes SOUNDEX as intended for English-language strings), but you might be able to use the MySQL SOUNDEX function:
    http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex

    Something like this:
    SELECT
      *
    FROM
      mytable
    WHERE
      -- `character` is quoted because CHARACTER is a reserved word in MySQL
      SOUNDEX(`character`) = SOUNDEX(inputted_text)
    where `character` is the column that holds the stored text and inputted_text is the text it should be matched against.
     
    UnrealEd, Jun 23, 2007 IP
  7. JudyJiaStyle

    #7
    Thanks for all the advice. I'm going to try these out!
     
    JudyJiaStyle, Jun 26, 2007 IP