Displaying 12 results from an estimated 12 matches for "bigram".
2012 Jun 29
0
Adding Bi-gram in the QueryParser and Object.
...y Object:
*Near* - 2 or more terms with NEAR in between. For this type of query, the
terms must occur within a window of 10 words. Since we are already seeking the
two words within a 10-word window, it won't hurt to also match them as a
bi-gram: having them adjacent is even better. *(Bigram can be added)*
*Example:*
Failed NEAR Assertion
*Currently parser output.*
Query((failed at 1 NEAR 11 assertion at 2))
*Output With Bigram:*
Query((failed at 1 NEAR 11 assertion at 2) OR failed assertion at 3)
*Implementation:*
Since all the terms detected as near are added to class *T...
2002 Nov 17
1
SVD for reducing dimensions
...uld be ok),
which I will call a feature vector. The distance between two words
represents the similarity of the contexts of the words, so big and little
have very similar contexts and should get a similar representation. Basically
to build something similar to a thesaurus.
I have computed bigram counts between the n most common words, for varying
values of n between 500 and 5000. These are saved to a file which I can load
with read.table. This matrix is symmetric and far from sparse, although I
can adjust the sparseness by changing the bigram window. First question,
should I scale th...
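The counting step described in this snippet can be sketched as follows. This is only an illustration in Python (the poster works in R, and their actual script is not shown); the function name `cooccurrence_counts`, the `window` parameter, and the toy sentence are all assumptions made for the example.

```python
from collections import Counter

def cooccurrence_counts(tokens, vocab, window=1):
    """Count symmetric co-occurrences of vocab words within `window` tokens.

    Hypothetical helper: window=1 counts adjacent pairs only (plain bigrams);
    a larger window makes the resulting matrix denser.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        if left not in vocab:
            continue
        # Look at the next `window` tokens to the right of position i.
        for right in tokens[i + 1:i + 1 + window]:
            if right in vocab:
                # Record both orders so the matrix stays symmetric.
                counts[(left, right)] += 1
                counts[(right, left)] += 1
    return counts

tokens = "the big dog and the little dog".split()
vocab = {"the", "big", "little", "dog"}
narrow = cooccurrence_counts(tokens, vocab, window=1)
wide = cooccurrence_counts(tokens, vocab, window=2)
```

Widening the window from 1 to 2 adds pairs such as ("the", "dog") that are not adjacent, which is how the window size controls sparseness.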
2011 Jan 15
2
[LLVMdev] Spell Correction Efficiency
...ch was used in Clang, notably for auto-completion, based on the
>> Levenshtein distance.
>>
>> It turns out I just came upon this link today:
>> http://nlp.stanford.edu/IR-book/html/htmledition/k-gram-indexes-for-spelling-correction-1.html
>>
>> The idea is to use bigrams (2-letter parts of a word) to build an index
>> of the form (bigram > all words containing this bigram), then use this index
>> to retrieve all the words with enough bigrams in common with the word you
>> are currently trying to approximate.
>>
>> This drastically...
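A minimal sketch of the bigram index described in this snippet, written in Python for illustration. It is not the Clang code or the code from the linked book; the names `bigrams`, `build_index`, `candidates`, the `min_shared` threshold, and the word list are assumptions.

```python
from collections import defaultdict

def bigrams(word):
    """Return the set of 2-letter substrings of `word`."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def build_index(words):
    """Map each bigram to the set of words containing it."""
    index = defaultdict(set)
    for w in words:
        for bg in bigrams(w):
            index[bg].add(w)
    return index

def candidates(index, query, min_shared=2):
    """Return words sharing at least `min_shared` bigrams with `query`."""
    counts = defaultdict(int)
    for bg in bigrams(query):
        # .get avoids creating empty entries in the index for unseen bigrams.
        for w in index.get(bg, ()):
            counts[w] += 1
    return sorted(w for w, c in counts.items() if c >= min_shared)

index = build_index(["vector", "victor", "sector", "string", "static"])
print(candidates(index, "vectr"))  # prints ['sector', 'vector']
```

The index only narrows the candidate set; a more expensive measure such as the Levenshtein distance would then be computed on the few surviving words rather than on the whole dictionary.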
2016 Mar 04
2
GSOC 2016 project on Ranking
Hello Sir,
I am a third-year student at the Department of Mathematics at IIT
Kharagpur. I have good experience in Information Retrieval and Machine
Learning. I have read many chapters of the book Introduction to Information
Retrieval. Recently I have been working on a project on tagging questions on
a Q&A forum by ranking the tags and using probabilistic inference. I also have
software development
2012 Jun 03
0
Proposal for Integration of Bi-gram in Xapian Architecture
Hi,
I have made a proposal for changes to integrate bi-grams in Xapian
Architecture on Wiki page.
Bigram Integration Proposal:
http://trac.xapian.org/wiki/GSoC2012/Bi-gram%20Language%20Modeling/Bi-gram%20Integration%20Proposal
Since Bi-gram integration will make some difference in how data is accessed
from the back-end, it's better to get a review from the whole community. Moreover,
I also have some doubts w...
2016 Apr 12
0
Xapian 1.3.5 snapshot performance and index size
...te:
> This way, "to be or not to be" gets from 11 S to 0.6 S, and "to be of
> the" gets from 12 S to 0.9 S. Which is of course brilliant !
>
> I think that I can dump my plan of indexing compound terms for runs of
> common words :)
We had been experimenting with bigrams to accelerate phrases, and not
having to go that route was one motivation for the key order change.
The bigram terms would add significantly to DB size, and to cache
pressure.
> > I'm not sure there's an easy solution to the position table not coming
> > out compact in this...
2010 Jan 18
4
Index indexed words
Hello,
We would like to create Google- or Firefox-like "search hints".
If someone types "abc", the search system should suggest
some possible hints.
I think Firefox does it by indexing 3-character sequences of the domain
name. If you enter part of a name, you get some hints.
Thank you very much
Marcus
2017 Mar 05
3
GSoc 2017 Introduction(Weighting Schemes)
...successfully compiled Xapian
from source and have implemented some examples. While going through the
project list, the Weighting Schemes project is the one I was looking to
contribute to. So I went through the xapian-core/weight where most of the
schemes are already present and I also went through the Bigram-model which
was outside the tree and not merged yet.
So can anyone please give a pointer to which weighting schemes are not
implemented yet so that I can start looking at them.
Regards,
Prachi Prakash
Final year Graduate Student
LinkedIn: https://www.linkedin.com/in/prachi-prakash-7b674351/
gith...
2005 Oct 08
1
*wildcard* support?
Hello,
First I wanted to say thanks for a great piece of software, thanks
Olly and others who've contributed!
I know that Xapian supports right-truncating, if that's the proper
name for wildcard support, as in a search for "xapia*".
I don't believe Xapian supports wildcards on both sides of a term, correct?
Is this something that is technically unfeasible, unpalatable
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi,
I am looking for a Chinese, Japanese and Korean tokenizer that can
be used to tokenize terms for CJK languages. I am not very familiar
with these languages; however, I think that a single symbol in these
languages can contain one or more words, which makes it more difficult
to tokenize into searchable terms.
Lucene has CJK Tokenizer ... and I am looking around if there is some
open source that we
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes:
> On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote:
> > Some might notice the 50% index size increase. Excessive index size is
> > already one relatively rare, but recurring complaint. Except if I did
> > something wrong: I'm actually quite surprised by it.
>
> Did you try compacting the resulting databases?
>
>
2009 Aug 13
1
using package tm to find phrases
I am using the package "tm" for text-mining of abstracts and would like to
use it to find instances of gene names that may contain white space. For
instance "gene regulatory protein 1". The default behavior of tm is to parse
this into 4 separate words, but I would like to use the class constructor
"dictionary" to define phrases such as just mentioned.
Is this