Displaying 20 results from an estimated 500 matches similar to: "Can not use custom weight scheme with python binding"
2007 Mar 21
1
scoring question
Hi All
I have just realized that if I set a query like
'green jelly bean'
xapian will turn that query into
'green OR jelly OR bean'
This causes documents containing just one of the words to be considered
a 100% hit.
The behavior I would like to see is that each word gives a 33.3% hit, so
that a document containing all 3 words gets placed above a document with
only 1 or 2
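A minimal sketch of the scoring the poster describes, written in plain Python rather than against the Xapian API: each query word contributes an equal share, so a document matching all three words scores 1.0 and a document matching one word scores about 0.333. The function name `coverage_score` and the sample documents are hypothetical.

```python
def coverage_score(query, document):
    """Return the fraction of distinct query words that occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


# Hypothetical sample documents, for illustration only.
docs = [
    "a green apple",
    "green jelly dessert",
    "one green jelly bean",
]

# Sort so that documents matching more of the query words come first.
ranked = sorted(
    docs,
    key=lambda d: coverage_score("green jelly bean", d),
    reverse=True,
)
```

This only illustrates the desired ordering; it says nothing about how Xapian itself computes percentages.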
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement it in
Xapian (with some frequently used normalizations) as it will also give me a
good hang of implementing a weighting scheme before I start working on
implementing DFR schemes.
I read the following as references and I think I've understood it well and
can write the hack :-
1.)
2018 Jul 12
1
Error while compacting: Bad position key
Mike Hommey <mh at glandium.org> writes:
> Hi,
>
> When running `notmuch compact` today, it stopped with the following
> output:
>
> Compacting database...
> compacting table postlist
> Reduced by 25% 648656K (2498904K -> 1850248K)
> compacting table docdata
> Reduced by 15% 24K (152K -> 128K)
> compacting table termlist
> Reduced by
2006 Jul 25
2
weight scheme with document values
Hi guys,
I recently used Xapian to sort some documents by the distance between 2
points.
I implemented a MatchDecider which works well.
I have now tried to implement a Weight scheme to put my documents in ascending
order depending on the distance...
The information I need to calculate the distance is stored in values in the document.
How can I access document values from Weight to be able to add some
sum_extra weight?
2014 May 10
2
some trouble when devising skiplist
Hi,
I have run into some trouble, which I describe in my journal
http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10
The corresponding code is in my git repository.
Could you give me some help?
------------------
Shangtong Zhang,Second Year Undergraduate,
School of Computer Science,
Fudan University, China.
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most used weighting scheme, is easy to understand, and has
been used in other frameworks like Lucene and in many other places.
Okapi BM25 (implemented in Xapian) is theoretically a better/improved measure
than tf-idf and
I am looking into various other weighting schemes which are there in Xapian
or can be implemented, like TF-ICF (term frequency inverse corpus
frequency), TF-RF (term
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
On Sat, Apr 08, 2017 at 09:11:22PM +0100, James Aylett wrote:
> On 8 Apr 2017, at 19:15, Vivek Pal <vivekpal.dtu at gmail.com> wrote:
>
> >> and the details of which weighting schemes were available in which version
> >> isn't a key part of the $set command itself.
> >
> > Do you suggest dropping that piece of information out? Since the reason behind
2011 Feb 18
1
Is it possible to reset the parameters in BM25 each time a new query enters?
Hi guys,
I'm trying to improve the search results of our collection by tuning the parameters in the BM25 weighting scheme. Since our collection includes several databases, such as for pictures, websites, etc., I would like to use different values of the same scheme to calculate the weights. Yet rebuilding each time after a change is made to the header file does not seem an optimal approach and
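As a rough illustration of the idea in this question, the sketch below treats the BM25 parameters as ordinary run-time arguments chosen per collection, so nothing needs rebuilding when they change. It uses the textbook BM25 term weight with one common IDF variant, not Xapian's own implementation; the function names, the `PARAMS` table, and the parameter values are all hypothetical.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 weight of one term in one document (not Xapian's formula)."""
    # One common IDF variant; the +1.0 keeps the value positive.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Length normalisation: b=0 ignores document length, b=1 applies it fully.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


# Hypothetical per-collection settings, looked up when a query arrives.
PARAMS = {
    "pictures": {"k1": 0.8, "b": 0.3},
    "websites": {"k1": 1.5, "b": 0.75},
}


def score(collection, **stats):
    """Score a term using the parameters registered for the given collection."""
    return bm25_term_weight(**stats, **PARAMS[collection])
```

For example, `score("pictures", tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)` and the same call with `"websites"` use different `k1` and `b` values without any code change.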
2017 Sep 28
1
Weighting the author of a doc when that term can also appear as a frequent term in other docs
We have a corpus of academic papers. Sometimes it happens that there is
an academic controversy and one paper is a response or rebuttal to
another paper. The name of the author of the first paper may appear many
times in the second paper. So in light of this, how should we set our
weight on the author field?
Here is an example:
http://www.nber.org/papers/w11215
in which the term
2017 Apr 09
3
Omega: Missing support for newer weighting schemes
On Sun, Apr 09, 2017 at 11:34:07PM +0530, Vivek Pal wrote:
> > Each scheme already has a human-readable name, and Xapian::Registry
> > can map that to an "exemplar" object of the right type, so we
> > could take a string like "bm25 1 0.8", see the first word is "bm25"
> > and get a BM25Weight object, then call parse_params("1 0.8") on
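The lookup described in this message can be sketched in plain Python as follows. This is an analogy, not Xapian's API: `BM25Scheme`, `REGISTRY`, `scheme_from_string`, and the default parameter values are hypothetical stand-ins, and `parse_params` here is simply the method name used in the message.

```python
class BM25Scheme:
    """Stand-in for a weighting scheme class; not Xapian's BM25Weight."""

    name = "bm25"

    def __init__(self, k1=1.0, b=0.5):
        self.k1 = k1
        self.b = b

    def parse_params(self, text):
        """Build a new scheme of the same type from a string such as "1 0.8"."""
        values = [float(v) for v in text.split()]
        if len(values) > 2:
            raise ValueError("bm25 takes at most two parameters in this sketch")
        return type(self)(*values)


# Maps each human-readable name to an exemplar object of the right type.
REGISTRY = {BM25Scheme.name: BM25Scheme()}


def scheme_from_string(spec):
    """Split off the first word, find the exemplar, let it parse the rest."""
    name, _, params = spec.strip().partition(" ")
    try:
        exemplar = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown weighting scheme: {name!r}") from None
    return exemplar.parse_params(params)
```

Keeping the parsing inside each scheme means the caller never needs to know how many parameters a given scheme takes.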
2014 Mar 22
2
[GSOC 2014] Indexing INEX dataset
For unsupervised approaches like BM25 this approach works well but letor
does not need special weighting for title in this form as it itself assigns
weights to title features separately.
But I see your concern it would be a problem when BM25 is used on the index
with this setup. Hence it's preferable to take a note of this uplift in
title weight for xapian-letor and normalize it everywhere
2014 Nov 23
2
GSoc Project Idea Weighting Schemes (Ranking)
Hi,
I am Abhishek
Currently Xapian::Weight follows the BM25 scheme; many models, such as the
Divergence from Randomness (DfR) family of models, the Unigram Language Model
and the Bi-gram Language Model, were implemented two years ago in GSoC 2012 but
have not yet been merged to master.
The new weighing schemes or improvement in implementing the previous models
to change the default scheme of BM25 from SMART with
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote:
> > For now, using weighting schemes which don't use document length is
> > probably the simplest answer.
>
> There's tf-idf weighting scheme on svn master, is it suitable for lucene
> backend?
Yes - TfIdfWeight doesn't ever use the document length (at least with
the normalisations currently
2014 Mar 04
2
Test Dataset for performance and accuracy analysis
Hi Parth,
I implemented DFR algorithms in Xapian as
a part of GSOC last year under the mentorship of Olly. This year, I want to
work on analyzing and optimizing the performance of the DFR algorithms and
comparing them with BM25. I also want to work on profiling the query
expansion schemes and test the relevance (precision and recall) / speed (time
taken) of the
2016 Jul 24
2
Weighting Schemes: Evaluation results
Hi all,
I have evaluated new weighting schemes along with their existing
counterparts in Xapian to compare and see which one does a better job.
Also, I have put together all the results files for easy access here:
https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run
and a README for getting started with xapian-evaluation module. Hopefully,
it might be of help to those who are new to
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers:
My name is Weixian Zhou, Computer Science student of University at Buffalo,
State University of New York. I am interested in the project of posting
list encoding improvements and weighting schemes. I have some questions
toward them.
1) After reading the comments in brass_postlist.cc, I am still not very clear
about the detailed structure of postings list. If you can provide some
simple
2005 Nov 16
1
query time stemming and term weights
I am developing a personal/desktop search tool for which I am
experimenting with doing no stemming during the indexing, but instead
having a stem database (or several for different languages), used for
expanding the query terms at search time.
(ie: user query: flooring -> stem: floor
-> final query for: [floored flooring floorings floors])
I have thought of a possible problem with
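The query-time expansion described in this message might look roughly like the sketch below. The stem database contents, the `toy_stem` suffix stripper (a placeholder for a real stemmer), and the function names are all hypothetical; only the `flooring` example comes from the message.

```python
# Hypothetical stem database: stem -> word forms collected separately from the index.
STEM_DB = {"floor": ["floored", "flooring", "floorings", "floors"]}


def toy_stem(word):
    """Very crude suffix stripper, standing in for a real stemming algorithm."""
    for suffix in ("ings", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def expand_query(words, stem_db):
    """Replace each query word by every known form that shares its stem.

    Returns one list of alternatives per query word; a word whose stem is
    not in the database is kept unchanged.
    """
    return [stem_db.get(toy_stem(word), [word]) for word in words]


expanded = expand_query(["flooring"], STEM_DB)
```

Each inner list would then be combined with OR in the final query, while the index itself stays unstemmed.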
2019 Mar 19
3
Project Proposal in GSoC 2019
Hi All,
I am interested in applying for the two projects listed in the Xapian GSoC
2019 project ideas list: "Learning to Rank Stabilisation" and "Weighting
Schemes". I have downloaded the codebase and am going through some of the
commits related to the Letor API, BM25, and DFR weighting schemes. Can anyone
tell me how to write the formal proposal for the above-mentioned
projects?
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote:
> TfIdfWeight and BM25(b=0) also need wdf_upper_bound, but it does not exist in
> Lucene backends.
If you don't provide an implementation of wdf_upper_bound(), the default
is to use the collection frequency of the term, so provided that
information is available in the lucene files, the lack of
wdf_upper_bound information
2016 Jun 10
2
Weighting Schemes -- Project Progress
Hello everyone,
I have been working on adding support for the BM25+ weighting function for the
last couple of weeks. Initially, I considered modifying bm25weight.cc to
add support for BM25+ function without disturbing functionalities of BM25.
But that didn't work out very well. A day or two was spent trying to
refactor and debug the same code.
Later, I took another approach following the