Displaying 12 results from an estimated 12 matches for "bigrams".

2012 Jun 29
0
Adding Bi-gram in the QueryParser and Object.
...Query((failed at 1 NEAR 11 assertion at 2) OR failed assertion at 3) *Implementation:* Since all terms detected as near are added to the class *Terms*, when we ask the class *Terms* for queries using as_near_query, as_adj_query, or as_opwindow_query, we can simply add the bigrams while iterating over the list of terms during parsing. *Adj:* exactly like *NEAR* (bigrams can be added). *Phrase:* terms given in quotes. Since these are terms the user wants to have together, bigrams can be added. The implementation is similar to NEAR and Adj. *Phrased:* Single term which is actually t...
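The idea in this snippet, adding bigrams while iterating over a list of terms, can be sketched as follows. This is only an illustration under stated assumptions: `add_bigrams` is a hypothetical helper, not part of Xapian's QueryParser or of the proposal, and joining adjacent terms with a space is an arbitrary choice made for the example.

```python
def add_bigrams(terms):
    """Return the original terms followed by a bigram for each adjacent pair.

    Hypothetical helper for illustration; the space separator is an assumption.
    """
    result = list(terms)
    # zip(terms, terms[1:]) pairs every term with its right-hand neighbour.
    result.extend(f"{left} {right}" for left, right in zip(terms, terms[1:]))
    return result


print(add_bigrams(["failed", "assertion", "at"]))
```

A list with fewer than two terms yields no bigrams, so single-term queries pass through unchanged.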
2002 Nov 17
1
SVD for reducing dimensions
...n La.svd(x) : argument to La.svd must be numeric or complex > xs <- svd(x) > ncol(xs$v) [1] 500 > nrow(xs$v) [1] 500 > nrow(xs$u) [1] 500 > ncol(xs$u) [1] 500 Also, how should I locate the million or so less common words in the space generated by this? Running svd on the full bigrams sounds infeasible; it would be a 200GB matrix, for a start. Really I just want to 'predict' their location rather than build the classifier with a larger set. Thank you for your time Corrin Lakeland
2011 Jan 15
2
[LLVMdev] Spell Correction Efficiency
...ch was used in Clang, notably for auto-completion, based on the Levenshtein distance. It turns out I just came upon this link today: http://nlp.stanford.edu/IR-book/html/htmledition/k-gram-indexes-for-spelling-correction-1.html The idea is to use bigrams (2-letter parts of a word) to build an index of the form (bigram > all words containing this bigram), then use this index to retrieve all the words with enough bigrams in common with the word you are currently trying to approximate. This drastically...
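The index described in this snippet (bigram to all words containing it, then retrieval of words that share enough bigrams with the input) might look roughly like the sketch below. The function names, the sample vocabulary, and the `min_shared` threshold are assumptions made for illustration; they do not come from the thread or from Clang.

```python
from collections import defaultdict


def bigrams(word):
    """Return the set of 2-letter substrings of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_index(vocabulary):
    """Map each bigram to the set of words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in bigrams(word):
            index[gram].add(word)
    return index


def candidates(query, index, min_shared=2):
    """Return words sharing at least min_shared distinct bigrams with query."""
    shared = defaultdict(int)
    for gram in bigrams(query):
        # index.get avoids creating empty entries for unseen bigrams.
        for word in index.get(gram, ()):
            shared[word] += 1
    return {word for word, count in shared.items() if count >= min_shared}


# Sample vocabulary chosen for the example only.
vocabulary = ["border", "boardroom", "lord", "aboard", "board", "morbid"]
index = build_index(vocabulary)
print(sorted(candidates("bord", index)))
```

The candidate set is only a filter. A more expensive comparison such as the Levenshtein distance would then be run on the surviving words rather than on the whole vocabulary.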
2016 Mar 04
2
GSOC 2016 project on Ranking
Hello Sir, I am a third-year student at the Department of Mathematics at IIT Kharagpur. I have good experience in Information Retrieval and Machine Learning. I have read many chapters of the book Introduction to Information Retrieval. I am currently working on a project that tags questions on a Q&A forum by ranking the tags and using probabilistic inference. I also have software development
2012 Jun 03
0
Proposal for Integration of Bi-gram in Xapian Architecture
Hi, I have made a proposal for changes to integrate bi-grams in the Xapian architecture on the wiki page. Bigram Integration Proposal: http://trac.xapian.org/wiki/GSoC2012/Bi-gram%20Language%20Modeling/Bi-gram%20Integration%20Proposal Since bi-gram integration will make some difference in how data is accessed from the back-end, it's better to get a review from the whole community. Moreover, I also have some
2016 Apr 12
0
Xapian 1.3.5 snapshot performance and index size
...te: > This way, "to be or not to be" gets from 11 S to 0.6 S, and "to be of > the" gets from 12 S to 0.9 S. Which is of course brilliant ! > > I think that I can dump my plan of indexing compound terms for runs of > common words :) We had been experimenting with bigrams to accelerate phrases, and not having to go that route was one motivation for the key order change. The bigram terms would add significantly to DB size, and to cache pressure. > > I'm not sure there's an easy solution to the position table not coming > > out compact in this c...
2010 Jan 18
4
Index indexed words
Hello, We would like to create Google- or Firefox-like "search hints". If someone types "abc", the search system should suggest some possible hints. I think Firefox does it by indexing 3-characters of the domain name. If you enter parts, you get some hints. Thank you very much Marcus
2017 Mar 05
3
GSoc 2017 Introduction(Weighting Schemes)
Hello Everyone, I am a second-year graduate student at IIIT-Bangalore and my interest is in the field of Information Retrieval. I have successfully compiled Xapian from source and have implemented some examples. While going through the project list, the Weighting Schemes project is the one I was looking to contribute to. So I went through the xapian-core/weight where most of the schemes are already
2005 Oct 08
1
*wildcard* support?
Hello, First I wanted to say thanks for a great piece of software, thanks Olly and others who've contributed! I know that Xapian supports right-truncating, if that's the proper name for wildcard support, as in a search for "xapia*". I don't believe Xapian supports wildcards on both sides of a term, correct? Is this something that is technically infeasible, unpalatable
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi, I am looking for a Chinese, Japanese and Korean tokenizer that can be used to tokenize terms for CJK languages. I am not very familiar with these languages; however, I think that they can contain one or more words in one symbol, which makes it more difficult to tokenize text into searchable terms. Lucene has CJK Tokenizer ... and I am looking around if there is some open source that we
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote: > > Some might notice the 50% index size increase. Excessive index size is > > already one relatively rare, but recurring complaint. Except if I did > > something wrong: I'm actually quite surprised by it. > > Did you try compacting the resulting databases? > >
2009 Aug 13
1
using package tm to find phrases
I am using the package "tm" for text-mining of abstracts and would like to use it to find instances of gene names that may contain white space. For instance "gene regulatory protein 1". The default behavior of tm is to parse this into 4 separate words, but I would like to use the class constructor "dictionary" to define phrases such as just mentioned. Is this