Displaying 20 results from an estimated 6000 matches similar to: "Participation in GSOC"
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement them in
Xapian (with some frequently used normalizations), as it will also help me get
the hang of implementing a weighting scheme before I start working on
implementing DFR schemes.
I read the following as references, and I think I've understood them well and
can write the hack:
1.)
2013 Mar 11
1
Implementation of the PL2 weighting scheme of the DFR Framework
Hello guys. I am working on implementing the PL2 weighting scheme of the DFR
framework by Gianni Amati.
It uses the Poisson approximation of the Binomial as the probabilistic
model (P), the Laplace law of succession to calculate the after-effect of
sampling or the risk gain (L), and within-document frequency normalization
H2 (2) (as proposed by Amati in his PhD thesis).
The formula for w(t,d) in
2016 May 16
2
Weighting recent results
I was thinking about this some more: Is there a reason I can't just
weight by some function of recency at indexing time?
$weight = get_weight_based_on_recency(...);
$tg->index_text($txt,$weight);
If I wanted to allow the user the option of searching either in
recency-weighted mode or not, I could index each document into 2
different databases, one with and one without.
This avoids
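A minimal sketch of what such a recency function might look like, written in Python rather than the Perl of the excerpt. The exponential half-life decay, the `half_life_days` and `max_weight` parameters, and the clamping to at least 1 are all assumptions for illustration; the message does not say how the weight would be computed. The result is an integer because `index_text` takes its weight as an integer wdf increment.

```python
def get_weight_based_on_recency(age_days, half_life_days=30.0, max_weight=10):
    """Map a document's age in days to an integer wdf increment.

    Assumed scheme (not from the original message): the weight halves
    every `half_life_days` and never drops below 1, so old documents
    remain searchable.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    decay = 0.5 ** (age_days / half_life_days)
    return max(1, round(max_weight * decay))


# Intended use with the Xapian Python bindings (not executed here):
#   weight = get_weight_based_on_recency(age_days)
#   tg.index_text(txt, weight)   # second argument is the wdf increment
```

Because the weight is baked into the index, changing the half-life means reindexing, which is the trade-off of doing this at indexing time.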
2007 Mar 21
1
scoring question
Hi All
I have just realized that if I set a query like
'green jelly bean'
xapian will turn that query into
'green OR jelly OR bean'
This causes documents containing just one of the words to be considered
a 100% hit.
The behavior I would like to see is that each word gives a 33.3% hit, so
that a document containing all 3 words gets placed above a document with
only 1 or 2
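The behaviour asked for here, where each matching query word contributes an equal share of the score, can be sketched in plain Python as follows. This only illustrates the desired ranking; it is not how Xapian computes its percentages, and the function name `match_fraction` is hypothetical.

```python
def match_fraction(query_terms, doc_terms):
    """Return the fraction of distinct query terms present in the document."""
    query = set(query_terms)
    if not query:
        return 0.0
    return len(query & set(doc_terms)) / len(query)


query = ["green", "jelly", "bean"]
docs = {
    "all_three": ["green", "jelly", "bean", "jar"],
    "two_words": ["green", "bean", "soup"],
    "one_word": ["green", "tea"],
}

# Rank documents so that those matching more query words come first.
ranked = sorted(docs, key=lambda name: match_fraction(query, docs[name]),
                reverse=True)
```

With this scoring, a document containing all three words scores 1.0, while documents matching two or one score about 0.67 and 0.33.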
2017 Mar 05
3
GSoc 2017 Introduction(Weighting Schemes)
Hello Everyone,
I am a second year graduate student at IIIT-Bangalore and my interest is in
the field of Information Retrieval. I have successfully compiled Xapian
from source and have implemented some examples. While going through the
project list, the Weighting Schemes project is the one I was looking to
contribute to. So I went through xapian-core/weight, where most of the
schemes are already
2014 Mar 22
2
[GSOC 2014] Indexing INEX dataset
For unsupervised approaches like BM25 this approach works well, but letor
does not need special weighting for the title in this form, as it assigns
weights to title features separately itself.
But I see your concern: it would be a problem when BM25 is used on the index
with this setup. Hence it's preferable to take note of this uplift in
title weight for xapian-letor and normalize it everywhere
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote:
> I think norm(t, d) in Lucene can be used to calculate a number which is
> similar to doc length (see norm(t,d) in
> http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm).
It sounds similar (especially if document and field boosts aren't in use),
though some places may rely on
2013 Apr 01
1
Doubt about GSOC proposal
Hello guys. I have begun work on writing my proposal as discussed on IRC and
will submit a draft in a couple of days so that I can make it detailed and
refine it after getting feedback.
I wanted to know about the number of weeks a proposal should cover, and
also, is it okay if I set aside a buffer week somewhere in the middle of the
summer for something like cleaning the code, working on the
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers:
My name is Weixian Zhou, a Computer Science student at the University at Buffalo,
State University of New York. I am interested in the posting list
encoding improvements and weighting schemes projects. I have some questions
about them.
1) After reading the comments in brass_postlist.cc, I am still not very clear
about the detailed structure of the postings list. If you can provide some
simple
2007 Apr 30
1
Xapian document matching
Hi, I'm wondering whether it is possible to do what ABCSok does
(http://nyheter.abcsok.no/), where the "Same articles" are collapsed
under a "Main article".
http://news.google.com/?hl=en does the same thing: a "Parent" and the "same
article on other sites" (they do differ from each other a little bit).
Maybe somebody knows how to do that or where to read
2012 Dec 08
2
Want to contribute code to the Xapian project
Hey guys, I am a 3rd year Computer Science undergrad student. I am extremely
interested in contributing code to the Xapian project. The work you people
do sounds extremely fascinating and interesting. Can someone give me a
brief overview of how to proceed? I can code in C, C++ and Python and
have experience in Natural Language Processing. Am also quite comfortable
with NLTK and using WordNet. Am
2014 Apr 13
2
Adding an external library to Xapian
My code is not on GitHub. I am using the tarball as of now. The following
is the error that occurred:
http://pastebin.com/cVJrjUZX
On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org> wrote:
> On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com>
> wrote:
>
> > A linker error is encountered even after following the above
2013 Mar 08
2
Gsoc-2013
Hi,
I am Chinmay Naik, an undergraduate in Computer Science at Bangalore
Institute of Technology, Bangalore.
I am an experienced programmer and good with C, C++, Python, Java, OpenGL and
would love to participate in GSoC-13.
From the ideas listed, I am interested in working on the project "posting list
encoding improvements".
I am a newbie to Xapian but would like to get involved and get a
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings,
Using latest svn.
I've noticed the following error when performing index merging:
postlist:
baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459
B-tree checked okay
Tag containing meta information is corrupt.
postlist table errors found: 1
I can still search on this index (I've only checked very small indexes),
but merging is now a problem since I check
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes:
> On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote:
> > I have a user reporting the following error during recoll indexing:
> >
> > flush() failed: Db block overwritten - are there multiple writers?
> >
> > "flush() failed" is from recoll, the rest is, I think the text of the Xapian
> > exception.
2012 Mar 23
1
GSoC Term Weighting project
Hi everyone,
I'm a graduate student in Linguistics and Computer Science in the US, and
I'm planning to propose a project to Xapian for GSoC that would implement
and evaluate a variety of weighting schemes and ranking methods, allowing
users to select different combinations. I have pretty thorough knowledge of IR
weighting and ranking, and I'm good in Java and Perl, and functional in
2023 May 03
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote:
> On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > 10 seems too long. You want the mean word length weighted by frequency
> > > of occurrence. For English that's typically around 5 characters, which
> > > is 5 bytes. If we go for +1 that's:
2016 May 03
2
Fwd: R bindings for Xapian: API modifications
>
> >but it looked like
> >you were suggesting that (for instance) the ID column in the data
> >frame would only be specified by numeric index.
>
The parameter idField is only used to allow the user to specify a column
whose row values will be used as unique identifiers. If it's required to
index the idField then it should be separately included in the indexFields
list
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi,
I have a user reporting the following error during recoll indexing:
flush() failed: Db block overwritten - are there multiple writers?
"flush() failed" is from recoll, the rest is, I think the text of the Xapian
exception.
This is with Xapian 1.4.3 on Linux (I asked for more details, should be
coming).
I don't think that I've ever seen this error, and I also
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > 10 seems too long. You want the mean word length weighted by frequency
> > of occurrence. For English that's typically around 5 characters, which
> > is 5 bytes. If we go for +1 that's:
>
> Actually, 10 may be too short in my case since there's a
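The "mean word length weighted by frequency of occurrence" mentioned in this thread can be estimated from a sample of one's own text. The following is a rough sketch under stated assumptions: the helper name `mean_word_length_bytes`, whitespace tokenisation, and UTF-8 byte counting are illustrative choices, not Xapian's tokenisation or anything given in the messages.

```python
from collections import Counter


def mean_word_length_bytes(text):
    """Mean word length in UTF-8 bytes, weighted by how often each word occurs."""
    words = text.split()
    if not words:
        return 0.0
    counts = Counter(words)
    total_occurrences = sum(counts.values())
    total_bytes = sum(len(word.encode("utf-8")) * n for word, n in counts.items())
    return total_bytes / total_occurrences


def unweighted_mean_word_length_bytes(text):
    """Mean length over distinct words only, shown for contrast."""
    distinct = set(text.split())
    if not distinct:
        return 0.0
    return sum(len(word.encode("utf-8")) for word in distinct) / len(distinct)
```

Short, common words pull the weighted mean down relative to the mean over distinct words, which is why a corpus with many long tokens may need a larger estimate than ordinary English prose.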