search for: trigrams

Displaying 5 results from an estimated 5 matches for "trigrams".

2009 Jan 22
4
text vector clustering
Hi, I am a new user of R, using R 2.8.1 on Windows 2003. I have a csv file with a single column which contains 30,000 students' names. There were typo errors while entering these student names. The actual list of names is < 1000. However, we don't have that list for keyword search. I am interested in grouping/clustering these names by letter-to-letter similarity. Are there any
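One common approach to this kind of fuzzy name grouping is character-trigram similarity. A minimal sketch (in Python rather than R, and with illustrative names and an assumed Jaccard threshold of 0.4 — tune it for real data):

```python
# Hedged sketch: group similar names by character-trigram overlap,
# measured with Jaccard similarity. Greedy single-pass clustering:
# each name joins the first cluster whose representative is close enough.

def trigrams(s):
    s = f"  {s.lower()} "                 # pad so short names still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cluster(names, threshold=0.4):
    clusters = []                          # list of (representative_trigrams, members)
    for name in names:
        grams = trigrams(name)
        for rep, members in clusters:
            if jaccard(grams, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((grams, [name]))
    return [members for _, members in clusters]

print(cluster(["John Smith", "Jon Smith", "Jhon Smith", "Mary Jones"]))
```

Here "John Smith", "Jon Smith", and "Jhon Smith" land in one cluster while "Mary Jones" gets its own; a threshold near 0.4–0.6 usually separates typos from genuinely different names.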
2014 Mar 03
2
Project: Weighting Schemes
Hello Sir, I am Reetesh Ranjan, a 3rd-year undergraduate student at the *INDIAN INSTITUTE OF TECHNOLOGY BHU, Varanasi* — one of the premier engineering colleges of India. I have gone through your webpage thoroughly and I am very interested in the work that you are undertaking on *Project: Weighting Schemes*. I earnestly wish to work under your guidance, and to learn and progress through this experience.
2007 Mar 28
2
Moving indextext.cc into core.
One of the items on the ToDo list for version 1.0 at http://wiki.xapian.org/TodoFor1_2e0#preview is: "Rework Omega's indextext.cc as a xapian-core "TextSplitter" class." I've been wondering about this for a while now. Currently, we have the Query Parser in Xapian core, but no text processing. Clearly, it makes sense to have a "text splitter" class in
2005 Oct 08
1
*wildcard* support?
Hello, First I wanted to say thanks for a great piece of software — thanks Olly and others who've contributed! I know that Xapian supports right-truncation, if that's the proper name for wildcard support, as in a search for "xapia*". I don't believe Xapian supports wildcards on both sides of a term, correct? Is this something that is technically infeasible, unpalatable
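Double-sided wildcards are exactly where trigram indexes are typically used: index every trigram of each term, intersect the posting lists for the pattern's trigrams, then verify candidates with a real substring check. A toy sketch (the term list and padding scheme are illustrative, not Xapian's implementation):

```python
# Hedged sketch: a toy trigram index answering infix wildcard queries
# like *apia* by trigram-posting-list intersection plus a verification pass.
from collections import defaultdict

def trigrams(term):
    return {term[i:i + 3] for i in range(len(term) - 2)}

class TrigramIndex:
    def __init__(self, terms):
        self.terms = list(terms)
        self.postings = defaultdict(set)   # trigram -> set of term ids
        for tid, term in enumerate(self.terms):
            for g in trigrams(term):
                self.postings[g].add(tid)

    def infix_search(self, pattern):
        """Return terms containing `pattern`, as in a *pattern* query."""
        grams = trigrams(pattern)
        if not grams:                      # pattern shorter than 3 chars: scan
            return [t for t in self.terms if pattern in t]
        ids = set.intersection(*(self.postings[g] for g in grams))
        # trigram overlap is necessary but not sufficient, so verify:
        return [self.terms[i] for i in sorted(ids) if pattern in self.terms[i]]

idx = TrigramIndex(["xapian", "utopian", "apiary", "search"])
print(idx.infix_search("apia"))            # -> ['xapian', 'apiary']
```

The intersection prunes almost all terms cheaply; the final substring check removes false positives where the trigrams appear but not contiguously.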
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi, I am looking for a Chinese, Japanese, and Korean tokenizer that can be used to tokenize terms for CJK languages. I am not very familiar with these languages; however, I think these languages can contain one or more words in a single symbol, which makes it more difficult to tokenize them into searchable terms. Lucene has a CJK Tokenizer ... and I am looking around if there is some open source that we
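The usual language-agnostic trick (and what Lucene's CJKTokenizer family does) is to emit overlapping character bigrams for CJK runs, since words aren't space-delimited. A minimal sketch — the Unicode ranges covered and the sample text are illustrative simplifications, not a complete CJK definition:

```python
# Hedged sketch: character-bigram tokenization for CJK runs, with plain
# lowercase word tokens for Latin/digit runs. Covers only two CJK blocks
# for brevity; a real tokenizer handles more ranges (Hangul, extensions).

def is_cjk(ch):
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
            or 0x3040 <= code <= 0x30FF)    # Hiragana + Katakana

def emit_bigrams(run):
    if len(run) == 1:
        return [run[0]]                     # lone CJK char: keep as unigram
    return [run[i] + run[i + 1] for i in range(len(run) - 1)]

def tokenize(text):
    tokens, word, cjk = [], [], []
    def flush():
        if word:
            tokens.append("".join(word)); word.clear()
        if cjk:
            tokens.extend(emit_bigrams(cjk)); cjk.clear()
    for ch in text:
        if is_cjk(ch):
            if word:
                tokens.append("".join(word)); word.clear()
            cjk.append(ch)
        elif ch.isalnum():
            if cjk:
                tokens.extend(emit_bigrams(cjk)); cjk.clear()
            word.append(ch.lower())
        else:
            flush()
    flush()
    return tokens

print(tokenize("Lucene有中文分词器"))
# -> ['lucene', '有中', '中文', '文分', '分词', '词器']
```

Bigrams over-generate terms compared to a dictionary-based segmenter, but they need no language resources and work uniformly across Chinese and Japanese text.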