similar to: Weighting the author of a doc when that term can also appear as a frequent term in other docs

Displaying 20 results from an estimated 3000 matches similar to: "Weighting the author of a doc when that term can also appear as a frequent term in other docs"

2017 Apr 08
2
Omega: Missing support for newer weighting schemes
On Sat, Apr 08, 2017 at 09:11:22PM +0100, James Aylett wrote: > On 8 Apr 2017, at 19:15, Vivek Pal <vivekpal.dtu at gmail.com> wrote: > > >> and the details of which weighting schemes were available in which version > >> isn't a key part of the $set command itself. > > > > Do you suggest dropping that piece of information out? Since the reason behind
2017 Apr 09
3
Omega: Missing support for newer weighting schemes
On Sun, Apr 09, 2017 at 11:34:07PM +0530, Vivek Pal wrote: > > Each scheme already has a human-readable name, and Xapian::Registry > > can map that to an "exemplar" object of the right type, so we > > could take a string like "bm25 1 0.8", see the first word is "bm25" > > and get a BM25Weight object, then call parse_params("1 0.8") on
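The registry idea quoted in this message (split off the scheme name, look up an exemplar object, let it parse the rest of the string) can be sketched in Python. Every class and function here (Weight, BM25Weight, TfIdfWeight, Registry, get_weighting_scheme, weight_from_string) is a hypothetical stand-in written for illustration, not the real Xapian::Registry API, and the parameter layout assumed for each scheme is illustrative only.

```python
# Hypothetical stand-ins for weighting classes; NOT the real Xapian API.
class Weight:
    """Base class: each scheme has a short human-readable name."""
    scheme_name = ""

    def parse_params(self, params):
        raise NotImplementedError


class BM25Weight(Weight):
    scheme_name = "bm25"

    def __init__(self, k1=1.0, b=0.5):
        self.k1 = k1
        self.b = b

    def parse_params(self, params):
        # Return a new, configured object; the exemplar itself is unchanged.
        values = [float(v) for v in params.split()]
        return BM25Weight(*values)


class TfIdfWeight(Weight):
    scheme_name = "tfidf"

    def __init__(self, normalizations="ntn"):
        self.normalizations = normalizations

    def parse_params(self, params):
        return TfIdfWeight(params.strip() or "ntn")


class Registry:
    """Maps scheme names to exemplar objects."""

    def __init__(self):
        self._exemplars = {}

    def register(self, exemplar):
        self._exemplars[exemplar.scheme_name] = exemplar

    def get_weighting_scheme(self, name):
        return self._exemplars.get(name)


def weight_from_string(spec, registry):
    """Turn e.g. "bm25 1 0.8" into a configured weighting object."""
    name, _, params = spec.strip().partition(" ")
    exemplar = registry.get_weighting_scheme(name)
    if exemplar is None:
        raise ValueError(f"unknown weighting scheme: {name!r}")
    return exemplar.parse_params(params)
```

Usage: after registering `BM25Weight()` and `TfIdfWeight()`, `weight_from_string("bm25 1 0.8", registry)` yields a BM25Weight with k1 = 1.0 and b = 0.8, while an unknown name raises ValueError. Keeping parsing inside each class means the caller never needs a per-scheme switch statement.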
2014 Nov 23
2
GSoc Project Idea Weighting Schemes (Ranking)
Hi, I am Abhishek. Currently Xapian::Weight follows the BM25 scheme; many models, such as the Divergence from Randomness (DfR) family of models, the Unigram Language Model and the Bi-gram Language Model, were implemented two years ago in GSoC 2012 but are not yet merged into master. The new weighting schemes or improvement in implementing the previous models to change the default scheme of BM25 from SMART with
2016 Jul 24
2
Weighting Schemes: Evaluation results
Hi all, I have evaluated new weighting schemes along with their existing counterparts in Xapian to compare them and see which one does a better job. Also, I have put together all the results files for easy access here: https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run and a README for getting started with the xapian-evaluation module. Hopefully, it might be of help to those who are new to
2012 Jul 17
1
Can not use custom weight scheme with python binding
Hi, I'm trying to use a custom weight with the Python binding. My test code is like this. class TinkerWeight(xapian.Weight): def __init__(self): pass def name(self): return "Tinker" def serialize(self): return "" def get_sumpart(*args): return 1 def get_maxpart(*args): return 1 def get_sumextra(*args):
2016 Jun 10
2
Weighting Schemes -- Project Progress
Hello everyone, I have been working on adding support for the BM25+ weighting function for the last couple of weeks. Initially, I considered modifying bm25weight.cc to add support for the BM25+ function without disturbing the functionality of BM25. But that didn't work out very well; a day or two was spent trying to refactor and debug the same code. Later, I took another approach following the
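For readers new to BM25+: as proposed by Lv and Zhai, it adds a constant delta to BM25's term-frequency component for every matching term, so that very long documents cannot push a match's contribution down to almost nothing. A minimal sketch of one term's score, assuming the idf form log((N+1)/n_t) and illustrative defaults k1 = 1.2, b = 0.75, delta = 1.0 (not necessarily the defaults of Xapian's BM25PlusWeight). With delta = 0 it reduces to a plain BM25 term score using the same idf.

```python
import math


def bm25_plus_term_score(wdf, doclen, avg_doclen, collection_size, termfreq,
                         k1=1.2, b=0.75, delta=1.0):
    """Score one query term in one document under BM25+ (sketch).

    wdf: occurrences of the term in the document
    termfreq: number of documents containing the term (n_t)
    collection_size: number of documents in the collection (N)
    """
    if wdf <= 0:
        return 0.0  # delta is only added for terms that actually occur
    idf = math.log((collection_size + 1) / termfreq)
    # Document-length normalisation, exactly as in BM25.
    norm = k1 * ((1 - b) + b * doclen / avg_doclen)
    tf_part = (k1 + 1) * wdf / (norm + wdf)
    # The lower bound: every match earns at least idf * delta.
    return idf * (tf_part + delta)
```

Comparing a very long document with delta = 0 and delta = 1 shows the point of the change: the BM25 score shrinks towards zero while the BM25+ score stays above idf * delta.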
2016 May 16
2
Weighting recent results
I was thinking about this some more: Is there a reason I can't just weight by some function of recency at indexing time? $weight = get_weight_based_on_recency(...); $tg->index_text($txt,$weight); If I wanted to allow the user the option of searching either in recency-weighted mode or not, I could index each document into 2 different databases, one with and one without. This avoids
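One way to fill in the hypothetical get_weight_based_on_recency() from this message is an exponential decay mapped onto a small integer, since the wdf increment given to index_text() is a whole-number count. The function name recency_wdf_inc, the 30-day half-life and the 1..10 range below are assumptions for illustration, not recommendations.

```python
import math


def recency_wdf_inc(age_days, half_life_days=30.0, max_inc=10):
    """Hypothetical recency weight: an integer wdf increment in [1, max_inc]
    that halves every half_life_days."""
    age_days = max(age_days, 0)  # treat future-dated documents as brand new
    decayed = max_inc * math.pow(0.5, age_days / half_life_days)
    # Never drop below 1, or old documents would lose their terms entirely.
    return max(1, int(round(decayed)))
```

Usage would look like `tg.index_text(txt, recency_wdf_inc(age_days))`. Bear in mind that BM25 saturates as wdf grows, so a document indexed with ten times the wdf does not receive ten times the weight; the boost is real but sublinear.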
2011 Feb 18
1
Is it possible to reset the parameters in BM25 each time a new query enters?
Hi guys, I'm trying to improve the search results of our collection by tuning the parameters in the BM25 weighting scheme. Since our collection includes several databases, such as for pictures, websites, etc., I would like to use different values of the same scheme to calculate the weights. Yet, rebuilding each time after changing the header file does not seem an optimal approach and
2014 Mar 22
2
[GSOC 2014] Indexing INEX dataset
For unsupervised approaches like BM25 this approach works well, but letor does not need special weighting for the title in this form, as it assigns weights to title features separately itself. But I see your concern: it would be a problem when BM25 is used on an index with this setup. Hence it's preferable to take note of this uplift in title weight for xapian-letor and normalize it everywhere
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all: I have written a demo patch for a backend for Lucene format indexes; the Lucene version is 3.6.2. http://lucene.apache.org/core/3_6_2/fileformats.html For now, this demo patch just supports the basic features in Lucene. Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and deleted documents (.del) are not supported, and the skip list in .fdx is not supported either. example/quest.cc is used to test this demo.
2011 Jun 01
1
Relevance, weighting and searching by specifically weighted text
Hi guys In our implementation of Xapian for one of our sites, we index the title, subtitle, summary and table of contents of around 200,000 products on ReportBuyer.com. When we create each Xapian doc to index this information, we apply a weighting to each of these 'fields' and add these to the doc using index_text with the second parameter passing in a weighting. I've been asked if
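Passing a weight as the second argument to index_text() means every occurrence of a term in that field adds that amount to the term's within-document frequency (wdf). The sketch below only simulates that accumulation in plain Python to show how field weights combine into one wdf per term. The field names and weights are hypothetical, and the tokenizer is far simpler than Xapian's TermGenerator (no stemming, positions or prefixes).

```python
import re
from collections import Counter

# Hypothetical per-field weights, i.e. the wdf increment for each field.
FIELD_WEIGHTS = {"title": 5, "subtitle": 3, "summary": 1, "toc": 1}


def simulate_field_wdf(fields, weights=FIELD_WEIGHTS):
    """Return term -> wdf, adding the field's weight per occurrence."""
    wdf = Counter()
    for field, text in fields.items():
        inc = weights.get(field, 1)  # unknown fields count normally
        for word in re.findall(r"\w+", text.lower()):
            wdf[word] += inc
    return wdf
```

For a product with title "Solar Energy Report" and summary "A report on solar markets", "solar" and "report" both end up with wdf 6 (5 from the title plus 1 from the summary), "energy" with 5 and "markets" with 1. Because all fields share the same unprefixed terms, a title match and several summary matches become indistinguishable once indexed.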
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most used weighting scheme, is easy to understand, and has been used in other frameworks like Lucene and many other places. Okapi BM25 (implemented in Xapian) is a theoretically better/improved measure than tf-idf, and I am looking into various other weighting schemes which are in Xapian or could be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF (term
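The difference between tf-idf and BM25 is easiest to see in how each treats term frequency: plain tf-idf grows linearly with wdf, while BM25's term-frequency component saturates towards k1 + 1. A minimal sketch of both, assuming the textbook idf log(N/n_t) and k1 = 1.2 with length normalisation switched off (b = 0). Xapian's TfIdfWeight supports several normalisation variants, and this is not its implementation.

```python
import math


def tfidf(wdf, collection_size, termfreq):
    """Plain tf-idf: raw wdf times log(N / n_t)."""
    return wdf * math.log(collection_size / termfreq)


def bm25_tf_component(wdf, k1=1.2):
    """BM25 term-frequency factor with b = 0 (no length normalisation).
    Approaches k1 + 1 as wdf grows, so repeated terms give diminishing returns."""
    return (k1 + 1) * wdf / (k1 + wdf)
```

Ten occurrences give exactly ten times the tf-idf of one occurrence, whereas the BM25 factor moves only from 1.0 towards its ceiling of 2.2.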
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote: > > For now, using weighting schemes which don't use document length is > > probably the simplest answer. > > There's tf-idf weighting scheme on svn master, is it suitable for lucene > backend? Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently
2005 Nov 16
1
query time stemming and term weights
I am developing a personal/desktop search tool for which I am experimenting with doing no stemming during indexing, but instead having a stem database (or several for different languages), used for expanding the query terms at search time. (i.e. user query: flooring -> stem: floor -> final query for: [floored flooring floorings floors]) I have thought of a possible problem with
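The stem database described in this message can be sketched as a reverse map from each stem to the unstemmed words seen at index time, consulted when a query term is expanded. Here a plain dict (stem_of) stands in for the stemmer, which is an assumption; a real tool would call an actual stemmer. The function names are hypothetical and the words are the ones from the example above.

```python
from collections import defaultdict


def build_stem_db(word_stems):
    """word_stems: iterable of (unstemmed word, stem) pairs seen at index time."""
    db = defaultdict(set)
    for word, stem in word_stems:
        db[stem].add(word)
    return db


def expand_query_term(term, stem_of, stem_db):
    """Replace one query term with every indexed word sharing its stem.
    Falls back to the term itself if nothing in the index shares the stem."""
    stem = stem_of.get(term, term)
    variants = stem_db.get(stem)
    return sorted(variants) if variants else [term]
```

With the pairs floored, flooring, floorings and floors all mapped to "floor", expanding "flooring" returns `['floored', 'flooring', 'floorings', 'floors']`, matching the final query in the message; the variants would then be combined with OR.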
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers: My name is Weixian Zhou, a Computer Science student at the University at Buffalo, State University of New York. I am interested in the posting list encoding improvements and weighting schemes projects. I have some questions about them. 1) After reading the comments in brass_postlist.cc, I am still not very clear about the detailed structure of the postings list. If you can provide some simple
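As background on what posting list encoding usually involves: a common scheme stores the gaps between sorted document ids rather than the ids themselves, then writes each gap as a variable-length integer carrying 7 data bits per byte. The sketch below shows only that generic technique. It is an illustration, not the on-disk format of brass_postlist.cc.

```python
def encode_postings(docids):
    """Delta-encode strictly increasing docids, writing each gap as a varint."""
    out = bytearray()
    prev = 0
    for docid in docids:
        gap = docid - prev
        if gap <= 0:
            raise ValueError("docids must be positive and strictly increasing")
        prev = docid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)  # last byte of this gap, continuation bit clear
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild docids from varint gaps."""
    docids, prev, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this gap
        else:
            prev += gap
            docids.append(prev)
            gap, shift = 0, 0
    return docids
```

Dense lists benefit most: docids 1, 2, 3 need one byte each, whereas a single gap of 300 needs two bytes (0xAC 0x02).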
2016 Jul 25
3
Weighting Schemes: Evaluation results
Hi James, > We probably don't want them committed in git where they're evaluation > runs (because we can recreate them); a gist might be more appropriate. Sorry, I have moved results files over to gist for each individual weighting scheme. Link: https://gist.github.com/ivmarkp/secret > I can't tell, but are some of those files from FIRE? If so, they > shouldn't be
2010 Sep 26
5
Network booting FreeBSD with gpxelinux almost works (fwd)
We have been network booting FreeBSD for some time with pxeboot. But now we would like to have menu of OSs to boot and got the idea somewhere that gpxelinux could do that for us. We copied gpxelinux.0 from the syslinux-4.02 distribution and replaced pxeboot with "gpxelinux" in the dhcpd.conf file. Indeed with a configuration file in pxelinux.cfg like this: default freebsd
2019 Mar 19
3
Project Proposal in GSoC 2019
Hi All, I am interested in applying for two projects listed in the Xapian GSoC 2019 project ideas list: "Learning to Rank Stabilisation" and "Weighting Schemes". I have downloaded the codebase and am going through some of the commits related to the Letor API, BM25, and DFR weighting schemes. Can anyone tell me how to write the formal proposal for the above-mentioned projects?
2018 Jan 22
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
>A possible workaround (and perhaps a better approach) would be to >set BoolWeight as the weighting scheme, then feed in your score as >a weight using a PostingSource. Then it's available via get_weight() >on the MSetIterator object: > >https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/postingsource.html > >You may find that's faster because
2014 Mar 04
2
Test Dataset for performance and accuracy analysis
Hi Parth, I implemented DFR algorithms in Xapian as part of GSoC last year under the mentorship of Olly. This year, I want to work on analyzing and optimizing the performance of the DFR algorithms and comparing them with BM25. I also want to work on profiling the query expansion schemes and test the relevance (precision and recall) / speed (time taken) of the
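The relevance measures mentioned here are simple set ratios: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch over lists of docids, assuming binary relevance judgements; the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: docids returned by the system (e.g. the top-k of an MSet)
    relevant: docids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty inputs instead of dividing by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

For instance, retrieving docids 1 to 4 when 2, 4 and 5 are relevant gives precision 0.5 and recall 2/3. Comparing weighting schemes then means averaging such figures over many queries.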