Displaying 5 results from an estimated 5 matches for "trigram".
2009 Jan 22
4
text vector clustering
Hi,
I am a new user of R, using R 2.8.1 on Windows 2003. I have a CSV file with
a single column which contains 30,000 student names. There were typos
made while entering these student names. The actual list of names has fewer than
1000 entries; however, we don't have that list for keyword search.
I am interested in grouping/clustering these names into those which are
similar letter by letter. Are there any
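One common way to attack this kind of problem is to compare names by their character trigrams rather than strictly letter by letter. The Python sketch below (not R, and not taken from any reply in the thread) assumes Jaccard similarity over space-padded trigrams and a greedy single-pass clustering; the function names and the 0.5 threshold are hypothetical choices that would need tuning on real data.

```python
from typing import List, Set


def name_trigrams(name: str) -> Set[str]:
    """Hypothetical helper: character trigrams of a lower-cased name,
    padded with two spaces at each end so first/last letters count."""
    padded = "  " + name.lower().strip() + "  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two trigram sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_names(names: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering (assumed scheme): each name joins the first cluster
    whose representative (its first member) is similar enough, otherwise it
    starts a new cluster."""
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []
    for name in names:
        grams = name_trigrams(name)
        for i, rep in enumerate(representatives):
            if similarity(grams, rep) >= threshold:
                clusters[i].append(name)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([name])
            representatives.append(grams)
    return clusters


if __name__ == "__main__":
    print(cluster_names(["jonathan", "jonathon", "maria", "jonathen"]))
    # [['jonathan', 'jonathon', 'jonathen'], ['maria']]
```

Each name is compared only against the first member of every existing cluster, so the cost grows with (number of names) x (number of clusters). For 30,000 names and roughly 1000 clusters, a blocking step (for example, only comparing names that share at least one trigram) would probably be worth adding.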
2014 Mar 03
2
Project: Weighting Schemes
....
I have gone through your webpage thoroughly and I am very interested in the
work that you are undertaking on *Project: Weighting Schemes*. I earnestly
wish to work under your guidance, learn and progress through this
experience.
I have some experience in Language Modeling
(github.com/reetesh11/trigram) and would love to continue it with
Information Retrieval. The problem, however,
is that I don't have experience in Information Retrieval; will that make a
huge difference?
Kind Regards,
Reetesh Ranjan
Junior Undergraduate
IIT(BHU), Varanasi
contact no: +917275115929
Skype : reetesh.ra...
2007 Mar 28
2
Moving indextext.cc into core.
...
- the stemming algorithms.
- stopwording algorithms.
- date parsing and term generation.
- standard match deciders for doing things like value range
restrictions, or sort comparison functions.
- automatic language detection code.
- fuzzy matching code (e.g., metaphone implementations, trigram matching
implementations).
- spelling correction algorithms.
I don't think we'd necessarily need a new top-level module for this code;
doing so would make the separation more obvious, but would require a bit
more work than just fiddling with the build system in the xapian-core
mo...
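As a rough illustration of what a trigram matching implementation for fuzzy matching might look like, here is a minimal Python sketch. It is hypothetical and not Xapian code: it assumes an in-memory vocabulary, builds postings from trigrams to terms, and ranks candidates by the Dice coefficient of their trigram sets. The names TrigramIndex and fuzzy_lookup, the '$' boundary marker and the 0.4 cut-off are illustrative choices only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def term_trigrams(term: str) -> Set[str]:
    """Trigrams of a term, with '$' marking the start and end of the term."""
    padded = "$" + term.lower() + "$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Hypothetical in-memory index mapping each trigram to the terms containing it."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.grams: Dict[str, Set[str]] = {}
        for term in vocabulary:
            grams = term_trigrams(term)
            self.grams[term] = grams
            for g in grams:
                self.postings[g].add(term)

    def fuzzy_lookup(self, query: str, min_score: float = 0.4) -> List[Tuple[str, float]]:
        """Return (term, score) pairs ranked by Dice coefficient, best first."""
        q = term_trigrams(query)
        shared: Dict[str, int] = defaultdict(int)
        # Count how many of the query's trigrams each vocabulary term shares.
        for g in q:
            for term in self.postings.get(g, ()):
                shared[term] += 1
        results = []
        for term, count in shared.items():
            score = 2 * count / (len(q) + len(self.grams[term]))
            if score >= min_score:
                results.append((term, score))
        # Highest score first; ties broken alphabetically for stable output.
        results.sort(key=lambda pair: (-pair[1], pair[0]))
        return results
```

With the vocabulary ["xapian", "xapien", "lucene", "search"], fuzzy_lookup("zapian") returns only "xapian", which shares 4 of its 6 trigrams with the query (Dice score 8/12). In a spelling-correction setting the surviving candidates would typically be re-ranked by edit distance.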
2005 Oct 08
1
*wildcard* support?
Hello,
First I wanted to say thanks for a great piece of software, thanks
Olly and others who've contributed!
I know that Xapian supports right-truncating, if that's the proper
name for wildcard support, as in a search for "xapia*".
I don't believe Xapian supports wildcards on both sides of a term, correct?
Is this something that is technically unfeasible, unpalatable
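One well-known technique for allowing wildcards on both sides of a term is a secondary index from character trigrams to whole terms: the literal parts of the pattern select candidate terms, and a final check filters out false positives. The Python sketch below is hypothetical and does not describe what Xapian actually does; it assumes terms never contain the '$' boundary marker and that '*' is the only wildcard character. WildcardIndex and expand are illustrative names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def trigrams_of(s: str) -> Set[str]:
    """All 3-character substrings of s (empty if s is shorter than 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class WildcardIndex:
    """Hypothetical trigram index supporting '*' anywhere in a pattern."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.vocabulary = sorted(set(vocabulary))
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        for term in self.vocabulary:
            # '$' marks term boundaries so anchored fragments can be indexed.
            for g in trigrams_of("$" + term + "$"):
                self.postings[g].add(term)

    def expand(self, pattern: str) -> List[str]:
        """Return the sorted vocabulary terms matching a '*' pattern."""
        fragments = pattern.split("*")
        # A pattern not starting (ending) with '*' is anchored at the term start (end).
        anchored = list(fragments)
        anchored[0] = "$" + anchored[0]
        anchored[-1] = anchored[-1] + "$"
        required: Set[str] = set()
        for frag in anchored:
            required |= trigrams_of(frag)
        if required:
            # Every matching term must contain every required trigram.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in required))
        else:
            # Pattern too short to use the index (e.g. "*a*"): scan everything.
            candidates = set(self.vocabulary)
        # Verify candidates, since sharing trigrams is necessary but not sufficient.
        regex = re.compile(".*".join(re.escape(f) for f in fragments))
        return sorted(t for t in candidates if regex.fullmatch(t))
```

With the vocabulary ["xapian", "xapien", "lucene", "sapient", "pianola"], expand("*pia*") returns ["pianola", "xapian"]. A pattern too short to yield any trigram, such as "*a*", falls back to scanning the whole vocabulary, which is exactly the expensive case the index is meant to avoid.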
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi,
I am looking for a Chinese, Japanese and Korean tokenizer that can
be used to tokenize terms for CJK languages. I am not very familiar
with these languages; however, I think that they contain one
or more words in one symbol, which makes them more difficult to tokenize
into searchable terms.
Lucene has a CJK Tokenizer ... and I am looking around to see if there is some
open source that we
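A common dictionary-free approach for CJK text, and to my understanding the one Lucene's CJK tokenizer takes, is to index overlapping two-character bigrams from each run of CJK characters. The Python sketch below is a minimal illustration under that assumption; the Unicode ranges it lists cover only Hiragana, Katakana, the basic CJK Unified Ideographs block and Hangul syllables, and are not exhaustive. The names CJK_RANGES, is_cjk and tokenize are hypothetical.

```python
from typing import List

# Assumed (non-exhaustive) Unicode blocks treated as CJK.
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def tokenize(text: str) -> List[str]:
    """Overlapping bigrams for CJK runs; lower-cased words for other alphanumerics."""
    tokens: List[str] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if is_cjk(ch):
            j = i
            while j < n and is_cjk(text[j]):
                j += 1
            run = text[i:j]
            if len(run) == 1:
                tokens.append(run)  # a lone CJK character becomes a unigram
            else:
                tokens.extend(run[k:k + 2] for k in range(len(run) - 1))
            i = j
        elif ch.isalnum():
            j = i
            # CJK ideographs are also isalnum(), so stop at the next CJK character.
            while j < n and text[j].isalnum() and not is_cjk(text[j]):
                j += 1
            tokens.append(text[i:j].lower())
            i = j
        else:
            i += 1  # skip whitespace and punctuation
    return tokens
```

For example, tokenize("Xapian 搜索引擎") yields ['xapian', '搜索', '索引', '引擎']. A query tokenized the same way will match documents containing the same character pairs, at the cost of some spurious matches across word boundaries.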