thr3ads.net - similar to: "Can not use custom weight scheme with python binding"

Displaying 20 results from an estimated 500 matches similar to: "Can not use custom weight scheme with python binding"

2007 Mar 21

scoring question

Hi all, I have just realized that if I set a query like 'green jelly bean', Xapian will turn that query into 'green OR jelly OR bean'. This causes documents containing just one of the words to be considered a 100% hit. The behavior I would like to see is that each word gives a 33.3% hit, so that a document containing all 3 words gets placed above a document with only 1 or 2
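The scoring the poster asks for can be illustrated with a small sketch. It is plain Python rather than the Xapian API, and the function name `coordinate_score` and the sample documents are assumptions made for illustration only: each distinct query word contributes an equal share of the score.

```python
def coordinate_score(query, document):
    """Return the fraction of distinct query words found in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


# Hypothetical sample documents, ranked best match first.
docs = ["green jelly bean", "green grass", "jelly bean jar"]
ranked = sorted(
    docs,
    key=lambda d: coordinate_score("green jelly bean", d),
    reverse=True,
)
```

With this scheme a document matching all three words scores 1.0, while one matching a single word scores about 0.333, so it can no longer tie with a full match.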

2013 Feb 19

Implementing tf-idf weighting scheme in Xapian

Hello guys. I just read up about tf-idf schemes and want to implement them in Xapian (with some frequently used normalizations), as that will also give me a good grasp of implementing a weighting scheme before I start working on the DFR schemes. I read the following references, and I think I've understood them well and can write the hack: 1.)
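For readers unfamiliar with the scheme under discussion, here is a minimal tf-idf sketch in plain Python. It does not use Xapian, it assumes the simplest variant (raw term frequency times the log of inverse document frequency, with no normalization), and `tfidf_scores` is a name invented for this example.

```python
import math
from collections import Counter


def tfidf_scores(query_terms, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    n_docs = len(docs)
    term_counts = [Counter(doc.lower().split()) for doc in docs]

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    scores = []
    for counts in term_counts:
        score = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf:
                score += tf * math.log(n_docs / doc_freq[term])
        scores.append(score)
    return scores
```

A term that occurs in every document gets an idf of log(1) = 0 under this variant, which is one reason the normalized forms mentioned in the post are often preferred.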

2018 Jul 12

Error while compacting: Bad position key

Mike Hommey <mh at glandium.org> writes:
> Hi,
>
> When running `notmuch compact` today, it stopped with the following
> output:
>
> Compacting database...
> compacting table postlist
> Reduced by 25% 648656K (2498904K -> 1850248K)
> compacting table docdata
> Reduced by 15% 24K (152K -> 128K)
> compacting table termlist
> Reduced by

2006 Jul 25

weight scheme with document values

Hi guys, I recently used Xapian to sort some documents by the distance between 2 points. I implemented a MatchDecider, which works well. I have now tried to implement a Weight scheme to put my documents in ascending order of distance. The information needed to calculate the distance is stored in values in the document. How can I access document values from Weight so that I can add some sum_extra weight?

2014 May 10

some trouble when devising skiplist

Hi, I have run into some trouble, which I describe in my journal: http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10 The corresponding code is in my git repository. Could you give me some help? Shangtong Zhang, Second Year Undergraduate, School of Computer Science, Fudan University, China.

2016 Mar 10

Introduction and Doubts

Tf-idf is the most used weighting scheme; it is easy to understand and has been used in other frameworks such as Lucene and in many other places. Okapi BM25 (implemented in Xapian) is theoretically a better measure than tf-idf, and I am looking into various other weighting schemes that are already in Xapian or could be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF (term

2017 Apr 08

Omega: Missing support for newer weighting schemes

On Sat, Apr 08, 2017 at 09:11:22PM +0100, James Aylett wrote:
> On 8 Apr 2017, at 19:15, Vivek Pal <vivekpal.dtu at gmail.com> wrote:
>
> >> and the details of which weighting schemes were available in which version
> >> isn't a key part of the $set command itself.
> >
> > Do you suggest dropping that piece of information out? Since the reason behind

2011 Feb 18

Is it possible to reset the parameters in BM25 each time a new query enters?

Hi guys, I'm trying to improve the search results of our collection by tuning the parameters of the BM25 weighting scheme. Since our collection includes several databases (for pictures, websites, etc.), I would like to use different parameter values of the same scheme to calculate the weights for each. Yet rebuilding each time after changing the header file does not seem an optimal approach and
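The idea of choosing BM25 parameters at query time instead of compiling them in can be sketched as follows. This is plain Python, independent of Xapian's API; it uses a simplified single-term BM25 formula (only `k1` and `b`), and the collection names and parameter values in `PARAMS_BY_COLLECTION` are hypothetical.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                     k1=1.2, b=0.75):
    """Simplified BM25 weight of one term in one document."""
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Length normalization: b=0 ignores document length entirely.
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Hypothetical per-collection settings, looked up for each query.
PARAMS_BY_COLLECTION = {
    "pictures": {"k1": 0.8, "b": 0.3},
    "websites": {"k1": 1.6, "b": 0.75},
}


def score(collection, **stats):
    """Weight a term using the parameters chosen for this collection."""
    return bm25_term_weight(**stats, **PARAMS_BY_COLLECTION[collection])
```

Because the parameters are ordinary arguments, switching between collections needs only a different dictionary lookup per query and no rebuild.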

2017 Sep 28

Weighting the author of a doc when that term can also appear as a frequent term in other docs

We have a corpus of academic papers. Sometimes it happens that there is an academic controversy and one paper is a response or rebuttal to another paper. The name of the author of the first paper may appear many times in the second paper. So in light of this, how should we set our weight on the author field? Here is an example: http://www.nber.org/papers/w11215 in which the term

2017 Apr 09

Omega: Missing support for newer weighting schemes

On Sun, Apr 09, 2017 at 11:34:07PM +0530, Vivek Pal wrote:
> > Each scheme already has a human-readable name, and Xapian::Registry
> > can map that to an "exemplar" object of the right type, so we
> > could take a string like "bm25 1 0.8", see the first word is "bm25"
> > and get a BM25Weight object, then call parse_params("1 0.8") on
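The name-then-parameters lookup described in the quoted text can be sketched in plain Python. This is not the Xapian::Registry API: the `REGISTRY` dictionary, the `Bm25Example` class, and `scheme_from_string` are stand-ins invented here, and only the overall flow (split off the first word, find an exemplar, let it parse the rest) follows the message.

```python
class Bm25Example:
    """Stand-in for a weighting scheme that can parse its own parameters."""

    name = "bm25"

    def __init__(self, values=()):
        self.values = tuple(values)

    def parse_params(self, text):
        # Build a new configured object from a string such as "1 0.8".
        return Bm25Example(float(token) for token in text.split())


# Hypothetical registry mapping scheme names to exemplar objects.
REGISTRY = {Bm25Example.name: Bm25Example()}


def scheme_from_string(spec):
    """Turn a string like "bm25 1 0.8" into a configured scheme object."""
    name, _, params = spec.strip().partition(" ")
    exemplar = REGISTRY.get(name)
    if exemplar is None:
        raise ValueError(f"unknown weighting scheme: {name!r}")
    return exemplar.parse_params(params)
```

Letting each exemplar parse its own parameter string keeps the caller free of per-scheme knowledge, which appears to be the point of the suggestion.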

2014 Mar 22

[GSOC 2014] Indexing INEX dataset

For unsupervised approaches like BM25 this approach works well, but letor does not need special weighting for the title in this form, as it assigns weights to title features separately itself. But I see your concern: it would be a problem when BM25 is used on an index with this setup. Hence it's preferable to take note of this uplift in title weight for xapian-letor and normalize it everywhere

2014 Nov 23

GSoc Project Idea Weighting Schemes (Ranking)

Hi, I am Abhishek. Currently Xapian::Weight follows the BM25 scheme; many models, such as the Divergence from Randomness (DfR) family of models, the Unigram Language Model and the Bi-gram Language Model, were implemented two years ago in GSoC 2012 yet not merged to master. The new weighting schemes or improvement in implementing the previous models to change the default scheme of BM25 from SMART with

2013 Aug 26

Backend for Lucene format indexes-How to get doclength

On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote:
> > For now, using weighting schemes which don't use document length is
> > probably the simplest answer.
>
> There's tf-idf weighting scheme on svn master, is it suitable for lucene
> backend?

Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently

2014 Mar 04

Test Dataset for performance and accuracy analysis

Hi Parth, I implemented DFR algorithms in Xapian as part of GSoC last year under the mentorship of Olly. This year, I want to work on analyzing and optimizing the performance of the DFR algorithms and comparing them with BM25. I also want to work on profiling the query expansion schemes and test the relevance (precision and recall) and speed (time taken) of the

2016 Jul 24

Weighting Schemes: Evaluation results

Hi all, I have evaluated the new weighting schemes along with their existing counterparts in Xapian to see which ones do a better job. I have also put together all the results files for easy access here: https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run along with a README for getting started with the xapian-evaluation module. Hopefully, it might be of help to those who are new to

2012 Mar 31

Project: Posting list encoding improvements

Hi Xapianers, my name is Weixian Zhou, and I am a Computer Science student at the University at Buffalo, State University of New York. I am interested in the posting list encoding improvements and weighting schemes projects, and I have some questions about them. 1) After reading the comments in brass_postlist.cc, I am still not very clear about the detailed structure of the postings list. If you can provide some simple

2005 Nov 16

query time stemming and term weights

I am developing a personal/desktop search tool for which I am experimenting with doing no stemming during indexing, but instead having a stem database (or several, for different languages) used for expanding the query terms at search time (i.e. user query: flooring -> stem: floor -> final query for: [floored flooring floorings floors]). I have thought of a possible problem with
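The expansion step in the post's example can be sketched in plain Python. The lookup tables below are a hypothetical stand-in for the stem database the poster mentions, holding only the single example from the message, and `expand_term` is a name chosen for illustration.

```python
# Hypothetical stem database: each stem maps to the word forms seen at index time.
STEM_TO_FORMS = {
    "floor": ["floored", "flooring", "floorings", "floors"],
}

# Reverse lookup from each known word form to its stem.
FORM_TO_STEM = {
    form: stem
    for stem, forms in STEM_TO_FORMS.items()
    for form in forms
}


def expand_term(term):
    """Expand a query term into all known forms sharing its stem."""
    stem = FORM_TO_STEM.get(term, term)
    # Unknown terms are passed through unchanged.
    return STEM_TO_FORMS.get(stem, [term])
```

The returned forms would then be combined with OR in the final query, as in the post's `[floored flooring floorings floors]` example.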

2019 Mar 19

Project Proposal in GSoC 2019

Hi all, I am interested in applying for two projects listed in the Xapian GSoC 2019 project ideas list: "Learning to Rank Stabilisation" and "Weighting Schemes". I have downloaded the codebase and am going through some of the commits related to the Letor API, BM25, and DFR weighting schemes. Can anyone tell me how to write the formal proposal for the above-mentioned projects?

2013 Sep 02

Backend for Lucene format indexes-How to get doclength

On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote:
> TfIdfWeight and BM25(b=0) also need wdf_upper_bound, it is not exists in
> Lucene backends.

If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the lucene files, the lack of wdf_upper_bound information

2016 Jun 10

Weighting Schemes -- Project Progress

Hello everyone, I have been working on adding support for the BM25+ weighting function for the last couple of weeks. Initially, I considered modifying bm25weight.cc to add support for BM25+ without disturbing the functionality of BM25, but that didn't work out very well. A day or two was spent trying to refactor and debug the same code. Later, I took another approach following the
