Displaying 20 results from an estimated 1000 matches similar to: "GSoc Project Idea Weighting Schemes (Ranking)"
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
On Sat, Apr 08, 2017 at 09:11:22PM +0100, James Aylett wrote:
> On 8 Apr 2017, at 19:15, Vivek Pal <vivekpal.dtu at gmail.com> wrote:
>
> >> and the details of which weighting schemes were available in which version
> >> isn't a key part of the $set command itself.
> >
> > Are you suggesting dropping that piece of information? Since the reason behind
2017 Apr 09
3
Omega: Missing support for newer weighting schemes
On Sun, Apr 09, 2017 at 11:34:07PM +0530, Vivek Pal wrote:
> > Each scheme already has a human-readable name, and Xapian::Registry
> > can map that to an "exemplar" object of the right type, so we
> > could take a string like "bm25 1 0.8", see the first word is "bm25"
> > and get a BM25Weight object, then call parse_params("1 0.8") on
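The lookup described above (first word selects the scheme, the rest is handed to that scheme's parser) can be modelled roughly as follows. This is a hypothetical plain-Python sketch, not Xapian's API: `REGISTRY`, `weight_from_string` and the two parameter classes are made-up names, and reading "1 0.8" as "k1 b" is this sketch's own convention, not necessarily the parameter order that BM25Weight::parse_params() expects.

```python
# Hypothetical sketch of a name -> parser registry, standing in for the
# Xapian::Registry lookup discussed above. These classes are illustrative
# only; they are not Xapian's weighting classes.
from dataclasses import dataclass


@dataclass
class BM25Params:
    k1: float = 1.0
    b: float = 0.5


@dataclass
class TfIdfParams:
    normalizations: str = "ntn"


def _parse_bm25(args):
    # Positional "k1 b" (this sketch's convention); missing values keep defaults.
    return BM25Params(*[float(a) for a in args])


def _parse_tfidf(args):
    return TfIdfParams(*args)


REGISTRY = {
    "bm25": _parse_bm25,
    "tfidf": _parse_tfidf,
}


def weight_from_string(spec):
    """Split 'name params...' and let the named scheme parse its own parameters."""
    name, _, rest = spec.strip().partition(" ")
    try:
        parser = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown weighting scheme: {name!r}") from None
    return parser(rest.split())
```

For example, `weight_from_string("bm25 1 0.8")` gives `BM25Params(k1=1.0, b=0.8)`. Keeping parameter parsing inside each scheme means callers never need per-scheme branching, which is the point made in the thread.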
2016 Jul 24
2
Weighting Schemes: Evaluation results
Hi all,
I have evaluated the new weighting schemes along with their existing
counterparts in Xapian to compare them and see which one does a better job.
Also, I have put together all the results files for easy access here:
https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run
and a README for getting started with xapian-evaluation module. Hopefully,
it might be of help to those who are new to
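A common single-number summary for comparing runs like these is mean average precision (MAP), one of the figures reported by tools such as trec_eval. The plain-Python sketch below is only an illustration of that metric; it is not taken from the xapian-evaluation module, and the `runs`/`qrels` dictionary layout is an assumption.

```python
def average_precision(ranked_docids, relevant):
    """Average precision of one ranked run against a set of relevant docids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, docid in enumerate(ranked_docids, start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Relevant documents that were never retrieved contribute 0.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: {query_id: ranked docids}; qrels: {query_id: set of relevant docids}."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

For a run `[1, 2, 3, 4]` with relevant set `{1, 3}`, the average precision is (1/1 + 2/3) / 2.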
2016 Jun 10
2
Weighting Schemes -- Project Progress
Hello everyone,
I have been working on adding support for the BM25+ weighting function for the
last couple of weeks. Initially, I considered modifying bm25weight.cc to
add support for BM25+ function without disturbing functionalities of BM25.
But that didn't work out very well. A day or two was spent trying to
refactor and debug the same code.
Later, I took another approach following the
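For context, BM25+ (Lv and Zhai) differs from BM25 by adding a constant lower bound delta to the term-frequency component of every matching term, so long documents are not over-penalised. The sketch below shows one common textbook form in plain Python; the non-negative IDF variant and the default k1, b and delta values are assumptions, and this is not Xapian's BM25PlusWeight code (which has further parameters).

```python
import math


def idf(n_docs, df):
    # One common non-negative IDF variant; other variants exist.
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def bm25_tf_part(tf, doclen, avg_doclen, k1=1.2, b=0.75):
    # Saturating tf factor with document-length normalisation.
    norm = k1 * ((1 - b) + b * doclen / avg_doclen)
    return (k1 + 1) * tf / (norm + tf)


def bm25(tf, doclen, avg_doclen, n_docs, df, k1=1.2, b=0.75):
    return idf(n_docs, df) * bm25_tf_part(tf, doclen, avg_doclen, k1, b)


def bm25_plus(tf, doclen, avg_doclen, n_docs, df, k1=1.2, b=0.75, delta=1.0):
    # A matching term always contributes at least idf * delta.
    if tf == 0:
        return 0.0
    return idf(n_docs, df) * (bm25_tf_part(tf, doclen, avg_doclen, k1, b) + delta)
```

For any matching term the BM25+ score exceeds the BM25 score by exactly `idf * delta`, which is why BM25+ can reuse most of BM25's machinery.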
2017 Sep 28
1
Weighting the author of a doc when that term can also appear as a frequent term in other docs
We have a corpus of academic papers. Sometimes it happens that there is
an academic controversy and one paper is a response or rebuttal to
another paper. The name of the author of the first paper may appear many
times in the second paper. So in light of this, how should we set our
weight on the author field?
Here is an example:
http://www.nber.org/papers/w11215
in which the term
2011 Apr 01
2
New Idea on Ranking in IR
Hello,
I want to discuss my idea on ranking in IR systems, which I think could be a good
extension to Xapian. If I am not too late to discuss it, then please consider
it. First, some brief background: I am a Master's student working
on my thesis in Information Retrieval. Just today I got a mail from one
of the professors in Europe, whom I am going to join for my Ph.D., about GSoC and
more
2016 Jul 25
3
Weighting Schemes: Evaluation results
Hi James,
> We probably don't want them committed in git where they're evaluation
> runs (because we can recreate them); a gist might be more appropriate.
Sorry, I have moved the results files over to a gist for each individual
weighting scheme.
Link: https://gist.github.com/ivmarkp/secret
> I can't tell, but are some of those files from FIRE? If so, they
> shouldn't be
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
> It may be worth splitting that part of the $set documentation out into its
> own section somehow, because it's getting a bit long -
Undoubtedly; the $set command has the longest section on the documentation page :)
But it would be hard to split that up, because the documentation is organised
so that each command is contained in its own section.
> and the details
2017 Apr 12
4
Omega: Missing support for newer weighting schemes
> Each scheme already has a human-readable name, and Xapian::Registry
> can map that to an "exemplar" object of the right type, so we
> could take a string like "bm25 1 0.8", see the first word is "bm25"
> and get a BM25Weight object, then call parse_params("1 0.8") on it to
> create the correct Weight object (broadly similar to how
2011 Feb 18
1
Is it possible to reset the parameters in BM25 each time a new query enters?
Hi guys,
I'm trying to improve the search results of our collection by tuning the parameters of the BM25 weighting scheme. Since our collection includes several databases, such as ones for pictures, websites, etc., I would like to use different parameter values for the same scheme to calculate the weights. Yet rebuilding each time after changing the header file doesn't seem an optimal approach, and
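In Xapian the BM25 parameters are constructor arguments to `Xapian::BM25Weight`, and the scheme is installed on an `Enquire` object with `set_weighting_scheme()`, so changing them per query shouldn't require editing a header or rebuilding. The plain-Python sketch below only models the idea of choosing parameters per collection at query time; `PARAMS_BY_COLLECTION` and the values in it are hypothetical and would need tuning on real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BM25Params:
    k1: float = 1.2
    b: float = 0.75


# Hypothetical per-collection settings; tune these on your own data.
PARAMS_BY_COLLECTION = {
    "pictures": BM25Params(k1=0.8, b=0.3),
    "websites": BM25Params(k1=1.2, b=0.75),
}


def params_for(collection):
    # Fall back to defaults for collections without tuned values.
    return PARAMS_BY_COLLECTION.get(collection, BM25Params())


def tf_component(tf, doclen, avg_doclen, params):
    """BM25's saturating, length-normalised tf factor for the given params."""
    norm = params.k1 * ((1 - params.b) + params.b * doclen / avg_doclen)
    return (params.k1 + 1) * tf / (norm + tf)
```

With `b=0` the length normalisation switches off entirely, which may suit short, uniform records such as picture captions.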
2012 Apr 15
1
Patch for Initial Prototype implementation of Unigram Language Modelling in xapian-core.
Hi,
I have implemented an initial prototype of the Xapian::Weight subclass for
Unigram Language Modelling to support UnigramLM weighting in Xapian. Other
changes include adding collection_frequency to the TermFreqs struct to store the
collection frequency of terms, some changes to support it in the Xapian
framework, and changing simplesearch.cc to search using the UnigramLMWeight class.
The following issues have not been
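Collection frequency (total occurrences of a term across the whole collection) is what a language model needs to smooth document probabilities, which is presumably why the patch adds it. The sketch below uses Dirichlet smoothing, one common choice; the patch itself may use another (for example Jelinek-Mercer), and the function name and the `mu` default are assumptions.

```python
import math


def dirichlet_lm_score(query_terms, doc_tf, doclen, coll_freq, coll_len, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed unigram document model."""
    score = 0.0
    for term in query_terms:
        # Background probability from collection frequency.
        p_coll = coll_freq.get(term, 0) / coll_len
        if p_coll == 0:
            continue  # term absent from the whole collection: skip it
        # Mix document counts with mu pseudo-counts from the collection model.
        p_doc = (doc_tf.get(term, 0) + mu * p_coll) / (doclen + mu)
        score += math.log(p_doc)
    return score
```

Scores are log-probabilities, so they are negative; higher (closer to zero) means a better match.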
2014 Mar 22
2
[GSOC 2014] Indexing INEX dataset
For unsupervised approaches like BM25 this approach works well but letor
does not need special weighting for title in this form as it itself assigns
weights to title features separately.
But I see your concern: it would be a problem when BM25 is used on the index
with this setup. Hence it's preferable to take note of this uplift in
title weight for xapian-letor and normalize it everywhere
2017 Apr 13
2
Omega: Missing support for newer weighting schemes
On Mon, Apr 10, 2017 at 11:47:36PM +0530, Vivek Pal wrote:
> > No, use Xapian::Registry to find the weighting scheme from the name
> > like how Weight::unserialise() does (otherwise every caller would need
> > code similar to that above).
>
> Okay, I looked into Xapian::Registry and it seems you are referring to using
> the get_weighting_scheme method? (which expects a
2012 Apr 02
0
GSoC, Xapian Project Weighting Schemes
Hello all,
I am very sorry I did not include the xapian-devel mailing list in my previous mail.
Thanks for responding to my mail.
Mohd Azeem
NIT UK
________________________________
From: Olly Betts <olly at survex.com>
To: Mohd Azeem <azeem201001 at yahoo.in>
Cc: Parth Gupta <parthg.88 at gmail.com>
Sent: Saturday, 31 March 2012 11:40 AM
Subject: Re: GSoC, Xapian Project Weighting
2011 Jun 01
1
Relevance, weighting and searching by specifically weighted text
Hi guys
In our implementation of Xapian for one of our sites, we index the
title, subtitle, summary and table of contents of around 200,000
products on ReportBuyer.com. When we create each Xapian doc to index
this information, we apply a weighting to each of these 'fields' and add
these to the doc using index_text with the second parameter passing in a
weighting.
I've been asked if
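The effect of passing a weight to index_text is to add that much to a term's within-document frequency (wdf) for each occurrence in the field. The plain-Python sketch below models that accumulation; `FIELD_WDF_INC` and its values are hypothetical, and the tokenizer is a crude stand-in for Xapian's TermGenerator. Because BM25 saturates tf, and these extra occurrences also count towards document length, a title increment of 5 does not make a title match five times as important.

```python
import re
from collections import Counter

# Hypothetical per-field increments, mirroring index_text(text, wdf_inc) calls.
FIELD_WDF_INC = {"title": 5, "subtitle": 3, "summary": 2, "toc": 1}


def field_weighted_wdf(fields):
    """Sum each term's wdf, adding the field's increment per occurrence."""
    wdf = Counter()
    for field, text in fields.items():
        inc = FIELD_WDF_INC.get(field, 1)
        for word in re.findall(r"\w+", text.lower()):
            wdf[word] += inc
    return wdf
```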
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote:
> > For now, using weighting schemes which don't use document length is
> > probably the simplest answer.
>
> There's a tf-idf weighting scheme on svn master; is it suitable for the Lucene
> backend?
Yes - TfIdfWeight doesn't ever use the document length (at least with
the normalisations currently
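To illustrate why a length-free scheme sidesteps the problem: TfIdfWeight is configured with a three-letter normalisation string, and with "ntn" (raw wdf, log(N/termfreq) IDF, no final normalisation; to my understanding the default) the weight needs only the wdf, the document count and the term frequency. The sketch below is a plain-Python model of that combination, not Xapian code; other normalisation strings may behave differently.

```python
import math


def tfidf_ntn(wdf, n_docs, termfreq):
    """Raw within-document frequency times log(N / termfreq); no length term."""
    if wdf == 0 or termfreq == 0:
        return 0.0
    return wdf * math.log(n_docs / termfreq)
```

Note that the signature has no document-length argument at all, so a backend that cannot supply document lengths can still use it.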
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most widely used weighting scheme; it is easy to understand and has
been used in other frameworks like Lucene and in many other places.
Okapi BM25 (implemented in Xapian) is a theoretically better/improved measure
than tf-idf, and
I am looking into various other weighting schemes which are in Xapian
or could be implemented, like TF-ICF (term frequency inverse corpus
frequency), TF-RF (term
2018 Jan 22
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
>A possible workaround (and perhaps a better approach) would be to
>set BoolWeight as the weighting scheme, then feed in your score as
>a weight using a PostingSource. Then it's available via get_weight()
>on the MSetIterator object:
>
>https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/postingsource.html
>
>You may find that's faster because
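The arithmetic behind this workaround is simple: under a Boolean weighting scheme every matching document gets weight 0 from the query terms, so the total weight is just what the posting source supplies. The sketch below is a plain-Python model of that, with a hypothetical function name; in Xapian itself the external score comes from a Xapian::PostingSource subclass, as the linked guide describes.

```python
def rank_with_external_scores(matching_docids, external_score):
    """Order Boolean matches by an externally supplied per-document score."""
    weighted = []
    for docid in matching_docids:
        bool_weight = 0.0  # a Boolean scheme contributes nothing
        weighted.append((docid, bool_weight + external_score.get(docid, 0.0)))
    # Highest weight first; ties broken by ascending docid.
    weighted.sort(key=lambda item: (-item[1], item[0]))
    return weighted
```

The score you read back for each document is then exactly the value your source provided.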
2014 Mar 04
2
Test Dataset for performance and accuracy analysis
Hi Parth,
I implemented DFR algorithms in Xapian as
a part of GSOC last year under the mentorship of Olly. This year, I want to
work on analyzing and optimizing the performance of the DFR algorithms and
comparing them with BM25. I also want to work on profiling the query
expansion schemes and testing the relevance (precision and recall) / speed (time
taken) of the
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers:
My name is Weixian Zhou, a Computer Science student at the University at Buffalo,
State University of New York. I am interested in the projects on posting
list encoding improvements and weighting schemes. I have some questions
about them.
1) After reading the comments in brass_postlist.cc, I am still not very clear
about the detailed structure of the posting list. If you can provide some
simple
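As general background for posting-list encoding work, a standard baseline is to store gaps between sorted docids instead of the docids themselves, and to write each gap as a variable-byte integer. The sketch below shows that generic technique; it is not a description of the brass backend's on-disk format, and the function names are hypothetical.

```python
def encode_postings(docids):
    """Delta-encode sorted docids, then write each gap as a variable-byte integer."""
    out = bytearray()
    prev = 0
    for docid in docids:
        gap = docid - prev
        if gap <= 0:
            raise ValueError("docids must be positive and strictly increasing")
        prev = docid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte of this gap has the high bit clear
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild docids from variable-byte gaps."""
    docids, prev, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            docids.append(prev)
            value, shift = 0, 0
    return docids
```

Dense lists compress well because most gaps fit in one byte; improvements usually target faster decoding or tighter bit-level codes.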