thr3ads.net - search: "trigram"

Displaying 5 results from an estimated 5 matches for "trigram".

2009 Jan 22

text vector clustering

Hi, I am a new user of R, using R 2.8.1 on Windows 2003. I have a CSV file with a single column that contains 30,000 student names. Typos were introduced when these names were entered. The actual list of names has < 1000 entries. However, we don't have that list for keyword search. I am interested in grouping/clustering these names so that those which are similar letter for letter end up together. Are there any
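The question above asks about R, but the idea of grouping misspelled names that are "similar letter to letter" can be sketched independently of the language. What follows is a minimal illustration in Python, not an answer taken from the thread: it compares names by the overlap (Jaccard index) of their character trigrams and greedily assigns each name to the first cluster whose representative is similar enough. The function names and the 0.4 threshold are assumptions chosen for the example.

```python
def trigrams(name):
    """Return the set of character trigrams of a name (lowercased, padded)."""
    # Padding is an assumption of this sketch: it gives the start and end
    # of the name their own trigrams.
    padded = f"  {name.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the trigram sets of two names (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def cluster_names(names, threshold=0.4):
    """Greedy single-pass clustering.

    Each name joins the first cluster whose representative (the first name
    placed in that cluster) is at least `threshold` similar; otherwise it
    starts a new cluster. The threshold is a hypothetical starting value
    and would need tuning on real data.
    """
    clusters = []  # list of (representative, members)
    for name in names:
        for representative, members in clusters:
            if similarity(name, representative) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    sample = ["Jonathan Smith", "Maria Garcia", "Jonathon Smith", "Maria Gracia"]
    for group in cluster_names(sample):
        print(group)
```

The greedy pass compares each name only against cluster representatives, which keeps the work manageable for 30,000 names when the number of true names is under 1000. The result depends on input order and on the threshold, so it is a starting point rather than a definitive clustering.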

2014 Mar 03

Project: Weighting Schemes

.... I have gone through your webpage thoroughly and I am very interested in the work that you are undertaking on *Project: Weighting Schemes*. I earnestly wish to work under your guidance, and to learn and progress through this experience. I have some experience in Language Modeling (github.com/reetesh11/trigram) and would love to continue it with Information Retrieval. But the problem is, I don't have experience in Information Retrieval; will that make a huge difference? Kind Regards, Reetesh Ranjan, Junior Undergraduate, IIT(BHU), Varanasi contact no: +917275115929 Skype : reetesh.ra...

2007 Mar 28

Moving indextext.cc into core.

...- the stemming algorithms. - stopwording algorithms. - date parsing and term generation. - standard match deciders for doing things like value range restrictions, or sort comparison functions. - automatic language detection code. - fuzzy matching code (e.g., metaphone implementations, trigram matching implementations). - spelling correction algorithms. I don't think we'd necessarily need a new top-level module for this code; doing so would make the separation more obvious, but would require a bit more work than just fiddling with the build system in the xapian-core mo...

2005 Oct 08

*wildcard* support?

Hello, First I wanted to say thanks for a great piece of software, thanks Olly and others who've contributed! I know that Xapian supports right-truncating, if that's the proper name for wildcard support, as in a search for "xapia*". I don't believe Xapian supports wildcards on both sides of a term, correct? Is this something that is technically infeasible, unpalatable

2007 Jun 05

Chinese, Japanese, Korean Tokenizer.

Hi, I am looking for a Chinese, Japanese and Korean tokenizer that can be used to tokenize terms for CJK languages. I am not very familiar with these languages; however, I think that they can pack one or more words into one symbol, which makes it more difficult to tokenize text into searchable terms. Lucene has CJK Tokenizer ... and I am looking around to see if there is some open source that we
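One common way to make unsegmented CJK text searchable without a dictionary is to index overlapping character bigrams, which is the general approach taken by bigram-based CJK analyzers. The sketch below illustrates that idea only; it is not Lucene's or Xapian's implementation. The function names are hypothetical, and the Unicode ranges cover just the main ideograph, kana and Hangul syllable blocks, which is a simplifying assumption.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough CJK test covering only the main blocks (an assumption)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7A3   # Hangul syllables
    )


def char_class(ch):
    """Classify a character as CJK, ordinary word character, or separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def tokenize(text):
    """Split text into searchable terms.

    Runs of non-CJK letters and digits become one lowercased term each.
    Runs of CJK characters become overlapping bigrams; a lone CJK
    character is emitted as a single-character term.
    """
    terms = []
    for kind, group in groupby(text, key=char_class):
        run = "".join(group)
        if kind == "word":
            terms.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                terms.append(run)
            else:
                terms.extend(run[i:i + 2] for i in range(len(run) - 1))
    return terms


if __name__ == "__main__":
    print(tokenize("Xapian 全文検索"))
```

Applying the same function to queries yields the same bigrams, so a phrase match over them can find the original character sequence. The trade-off is a larger index and some false matches across word boundaries.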
