2011 Apr 21
Chinese segmentation
Hello, I have finished reading the papers, and I think it is time to design my project. The first step will be to determine whether the input characters are Chinese. I saw in an earlier post that cjk-tokenizer only deals with UTF-8 and Unicode, but there are other encodings in use, such as GBK and Big5. Should I just deal with UTF-8 and Unicode?
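One common way to handle this, sketched here under assumptions rather than taken from cjk-tokenizer itself, is to decode GBK or Big5 input into Unicode at the boundary and then test each code point against the Han ideograph blocks. The block list below covers only the main CJK Unified Ideograph ranges, and the helper names `is_han`, `to_unicode`, and `han_runs` are hypothetical, chosen for illustration.

```python
# Sketch: normalize GBK/Big5/UTF-8 input to Unicode, then detect Han characters.
# The ranges below are an assumption: the main Han blocks, not an exhaustive list.
CJK_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
]


def is_han(ch: str) -> bool:
    """Return True if the single character falls in one of the Han ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def to_unicode(data: bytes, encoding: str) -> str:
    """Decode raw bytes (e.g. 'gbk', 'big5', 'utf-8') into a Python str."""
    return data.decode(encoding)


def han_runs(text: str) -> list:
    """Split text into maximal runs of Han characters, dropping everything else."""
    runs, current = [], []
    for ch in text:
        if is_han(ch):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


if __name__ == "__main__":
    # Simplified Chinese text stored as GBK bytes.
    gbk_bytes = "Xapian 中文分词".encode("gbk")
    print(han_runs(to_unicode(gbk_bytes, "gbk")))   # ['中文分词']
    # Traditional Chinese text stored as Big5 bytes.
    big5_bytes = "中文分詞 test".encode("big5")
    print(han_runs(to_unicode(big5_bytes, "big5")))  # ['中文分詞']
```

With this design the segmenter only ever sees Unicode, so supporting another legacy encoding is a matter of decoding it first; the caller still has to know (or detect) which encoding the input bytes use.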