thr3ads.net - similar to: "Lucene ranking"

Backend for Lucene format indexes-How to get doclength

2013 Aug 26

On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote:
> > For now, using weighting schemes which don't use document length is
> > probably the simplest answer.
>
> There's tf-idf weighting scheme on svn master, is it suitable for lucene
> backend?

Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently

GSoc Project Idea Weighting Schemes (Ranking)

2014 Nov 23

Hi, I am Abhishek. Currently Xapian::Weight follows the BM25 scheme; many models, such as the Divergence from Randomness (DfR) family of models, the Unigram Language Model and the Bi-gram Language Model, were implemented two years ago in GSoC 2012 but are not yet merged to master. The new weighting schemes or improvement in implementing the previous models to change the default scheme of BM25 from SMART with

Backend for Lucene format indexes-How to get doclength

2013 Jun 16

Hi, all: I have written a demo patch for a backend for Lucene format indexes; the Lucene version is 3.6.2. http://lucene.apache.org/core/3_6_2/fileformats.html For now, this demo patch supports only the basic features of Lucene. Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and deleted documents (.del) are not supported, and the skip list in .fdx is not supported either. example/quest.cc is used to test this demo.

Backend for Lucene format indexes-How to get doclength

2013 Aug 25

On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote:
> I think norm(t, d) in Lucene can be used to calculate a number which is
> similar to doc length (see norm(t,d) in
> http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm).

It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
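The relationship discussed in this exchange can be sketched as follows. This is a minimal illustration, assuming Lucene 3.x's default similarity, where the length norm of a field is boost / sqrt(number of terms), and assuming boosts are left at 1.0. The function names are hypothetical, and the sketch ignores the fact that Lucene stores norms in a single lossy byte, so a real backend would only recover a coarse approximation.

```python
import math


def length_norm(num_terms: int, boost: float = 1.0) -> float:
    """Length norm in the style of Lucene 3.x's default similarity (assumed)."""
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return boost / math.sqrt(num_terms)


def approx_doc_length(norm: float, boost: float = 1.0) -> float:
    """Invert the norm to estimate the number of terms in the field.

    Only meaningful if the boost that went into the norm is known;
    with document or field boosts in use the estimate is skewed.
    """
    if norm <= 0:
        raise ValueError("norm must be positive")
    return (boost / norm) ** 2


if __name__ == "__main__":
    norm = length_norm(100)
    print(norm, approx_doc_length(norm))
```

With boosts in use, the same norm value no longer maps to a single document length, which is the caveat raised in the reply.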

Introduction and Doubts

2016 Mar 10

Tf-idf is the most used weighting scheme; it is easy to understand and has been used in other frameworks like Lucene and in many other places. Okapi BM25 (implemented in Xapian) is theoretically a better/improved measure than tf-idf, and I am looking into various other weighting schemes which are in Xapian or can be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF (term

Backend for Lucene format indexes-How to get doclength

2013 Sep 02

On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote:
> TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in
> Lucene backends.

If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the lucene files, the lack of wdf_upper_bound information
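The fallback described in this reply can be sketched as follows. This is a toy model, not Xapian's code: postings are a plain dict from document id to wdf, and both function names and the `stored_bound` parameter are hypothetical. It shows why the collection frequency is always a valid (if loose) upper bound: it is the sum of the wdf values, so no single wdf can exceed it.

```python
from typing import Dict, Optional


def collection_frequency(postings: Dict[int, int]) -> int:
    """Total occurrences of a term across all documents (sum of wdf)."""
    return sum(postings.values())


def wdf_upper_bound(postings: Dict[int, int],
                    stored_bound: Optional[int] = None) -> int:
    """Return a stored bound if the backend has one, else fall back
    to the collection frequency, which is never below the largest wdf."""
    if stored_bound is not None:
        return stored_bound
    return collection_frequency(postings)


if __name__ == "__main__":
    postings = {1: 3, 2: 5, 7: 1}  # doc id -> wdf for one term
    print(wdf_upper_bound(postings), max(postings.values()))
```

A tighter stored bound lets the matcher skip more work, so the fallback trades some efficiency for not needing extra data in the index files.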

Participation in GSOC

2011 Mar 29

Hi, I'm Michael, I would like to participate in this year's Google Summer of Code, and I picked Xapian as the project to code for. Before writing a full proposal, I want to get in contact with the community, as well as introducing myself and discuss my ideas for the contribution to Xapian. First of all I'd like to talk about my motivation. I'm currently working on a webapp

chert vs flint vs lucene

2009 Jan 16

Hi, What's the main difference between chert and flint? What about vs lucene? I am mainly asking about data structures (lexicon, posting list, document data), what's in memory, what's on disk, hash vs b-tree and the reasons behind them. Any pointer is appreciated. Thanks! Crystal

New Idea on Ranking in IR

2011 Apr 01

Hello, I want to discuss my idea on ranking in IR systems, which I think could be a good extension to Xapian. If I am not too late to discuss it, please consider it. First, a brief background: I am a Masters student working on my thesis in Information Retrieval. Only today I got a mail about GSoC from one of the professors in Europe whom I am going to join for my Ph.D., and more

Omega: Missing support for newer weighting schemes

2017 Apr 08

On Sat, Apr 08, 2017 at 09:11:22PM +0100, James Aylett wrote:
> On 8 Apr 2017, at 19:15, Vivek Pal <vivekpal.dtu at gmail.com> wrote:
>
> >> and the details of which weighting schemes were available in which version
> >> isn't a key part of the $set command itself.
> >
> > Do you suggest dropping that piece of information out? Since the reason behind

Omega: Missing support for newer weighting schemes

2017 Apr 09

On Sun, Apr 09, 2017 at 11:34:07PM +0530, Vivek Pal wrote:
> > Each scheme already has a human-readable name, and Xapian::Registry
> > can map that to an "exemplar" object of the right type, so we
> > could take a string like "bm25 1 0.8", see the first word is "bm25"
> > and get a BM25Weight object, then call parse_params("1 0.8") on
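The lookup-then-parse flow quoted here can be sketched in a few lines. This is an illustration of the pattern only, assuming a plain dict in place of Xapian::Registry; `WeightSketch`, `REGISTRY` and `weight_from_string` are hypothetical names, and the parameters are kept as an unnamed tuple because the snippet does not say which BM25 parameters "1 0.8" refers to.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class WeightSketch:
    """Stand-in for a weighting scheme object (hypothetical)."""
    name: str
    params: Tuple[float, ...] = ()

    def parse_params(self, text: str) -> "WeightSketch":
        """Build a new object of the same scheme from a parameter string."""
        return WeightSketch(self.name, tuple(float(tok) for tok in text.split()))


# Exemplar objects keyed by human-readable scheme name.
REGISTRY: Dict[str, WeightSketch] = {
    "bm25": WeightSketch("bm25"),
    "tfidf": WeightSketch("tfidf"),
}


def weight_from_string(spec: str) -> WeightSketch:
    """Split off the first word, look up its exemplar, parse the rest."""
    name, _, rest = spec.strip().partition(" ")
    try:
        exemplar = REGISTRY[name.lower()]
    except KeyError:
        raise ValueError(f"unknown weighting scheme: {name!r}") from None
    return exemplar.parse_params(rest)


if __name__ == "__main__":
    print(weight_from_string("bm25 1 0.8"))
```

Keeping the parsing on the scheme object means a new scheme only needs to be registered once, and the caller never has to know its parameter list.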

[GSOC 2014] Indexing INEX dataset

2014 Mar 22

For unsupervised approaches like BM25 this approach works well, but letor does not need special weighting for the title in this form, as it assigns weights to title features separately itself. But I see your concern: it would be a problem when BM25 is used on the index with this setup. Hence it's preferable to take a note of this uplift in title weight for xapian-letor and normalize it everywhere

Weighting the author of a doc when that term can also appear as a frequent term in other docs

2017 Sep 28

We have a corpus of academic papers. Sometimes it happens that there is an academic controversy and one paper is a response or rebuttal to another paper. The name of the author of the first paper may appear many times in the second paper. So in light of this, how should we set our weight on the author field? Here is an example: http://www.nber.org/papers/w11215 in which the term

Weighting Schemes: Evaluation results

2016 Jul 24

Hi all, I have evaluated the new weighting schemes along with their existing counterparts in Xapian to compare them and see which one does a better job. Also, I have put together all the results files for easy access here: https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run and a README for getting started with the xapian-evaluation module. Hopefully, it might be of help to those who are new to

Project: Posting list encoding improvements

2012 Mar 31

Hi Xapianers: My name is Weixian Zhou, a Computer Science student at the University at Buffalo, State University of New York. I am interested in the posting list encoding improvements and weighting schemes projects. I have some questions about them. 1) After reading the comments in brass_postlist.cc, I am still not very clear about the detailed structure of the postings list. If you can provide some simple

Is it possible to reset the parameters in BM25 each time a new query enters?

2011 Feb 18

Hi guys, I'm trying to improve the search results of our collection by tuning the parameters in the BM25 weighting scheme. Since our collection includes several databases, such as for pictures, websites, etc., I would like to use different values of the same scheme to calculate the weights. Yet rebuilding each time after a change is made to the header file does not seem an optimal approach and

Weighting Schemes -- Project Progress

2016 Jun 10

Hello everyone, I have been working on adding support for the BM25+ weighting function for the last couple of weeks. Initially, I considered modifying bm25weight.cc to add support for the BM25+ function without disturbing the functionality of BM25. But that didn't work out very well. A day or two was spent trying to refactor and debug the same code. Later, I took another approach following the

Backend for Lucene format indexes-How to get doclength

2013 Jun 17

*Or do you mean that it's one number per document whereas the other stats are per database, so it's harder to store it?*

Yes, I mean this. It's a huge amount of data. If I add a new doclength list myself (containing all the doclengths in one list, like chert), I am concerned about:
1. This doclength list may be the bottleneck in this backend, http://trac.xapian.org/ticket/326
2. Change too much

Proposal Outline

2014 Mar 11

Hi, Before starting my proposal, I wanted to know what the expected output of the Letor module is. Is it for transfer learning (i.e. you learn from one dataset and leverage it to predict the rankings of another dataset) or is it for supervised learning? For instance, Xapian currently powers the Gmane search, which is by default based on the BM25 weighting scheme, and now suppose we want to use LETOR to rank
