similar to: Implementing tf-idf weighting scheme in Xapian

Displaying 20 results from an estimated 700 matches similar to: "Implementing tf-idf weighting scheme in Xapian"

2013 Mar 11
1
Implementation of the PL2 weighting scheme of the DFR Framework
Hello guys. I am working on implementing the PL2 weighting scheme of the DFR framework by Gianni Amati. It uses the Poisson approximation of the Binomial as the probabilistic model (P), the Laplace law of succession to calculate the after effect of sampling or the risk gain (L) and within document frequency normalization H2(2) (as proposed by Amati in his PhD thesis). The formula for w(t,d) in
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes: > On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote: > > I have a user reporting the following error during recoll indexing: > > > > flush() failed: Db block overwritten - are there multiple writers? > > > > "flush() failed" is from recoll, the rest is, I think the text of the Xapian > > exception.
2013 Mar 03
0
Added code and tests for the tf-idf weighting scheme.
Hello guys. I have sent a pull request for the code and tests of the Tf-Idf weighting scheme. Please do let me know if any changes are required. Meanwhile, I'll begin working on implementing normalizations which require additional statistics and on the DFR schemes. https://github.com/xapian/xapian/pull/6 On Tue, Feb 26, 2013 at 5:30 PM, <xapian-devel-request at lists.xapian.org> wrote: >
2013 Feb 25
0
Sent a pull request for the Tf-Idf Weighting scheme
Hello guys :) I have sent a pull request for the Tf-Idf Weighting scheme incorporating as many normalizations as I could with the help of statistics currently available from Xapian::Weight. Please let me know what you all think about it. I used the weighting scheme in a simple searcher and it did a fine job with it. I have no experience with writing tests for features like this. Please give me
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings, Using latest svn. I've noticed the following error when performing index merging: postlist: baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459 B-tree checked okay Tag containing meta information is corrupt. postlist table errors found: 1 I can still search on this index (I've only checked very small indexes), but merging is now a problem since I check
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi, I have a user reporting the following error during recoll indexing: flush() failed: Db block overwritten - are there multiple writers? "flush() failed" is from recoll, the rest is, I think the text of the Xapian exception. This is with Xapian 1.4.3 on Linux (I asked for more details, should be coming). I don't think that I've ever seen this error, and I also
2012 Apr 20
1
Implementing the tf-idf weighting scheme
Hi, all: This is the basic implementation of the tf-idf scheme (the basic scheme used in SMART) that can be used in Xapian. It might still need some further revision, but I believe it works anyway. :) I modified weight.h to define a subclass Tf_idfWeight and added a new file tf_idf.cc in ../weight in the repo, to implement Tf_idfWeight. Here is the git diff patch: https://gist.github.com/2422049
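For readers unfamiliar with the scheme named in this thread, here is a minimal sketch of basic tf-idf in plain Python. It is not the Tf_idfWeight patch linked above and does not use Xapian's Weight API; the function name `tfidf_ntn`, the toy corpus, and the choice of natural term frequency times log(N/df) with no length normalization are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_ntn(docs):
    """Return one {term: weight} dict per document.

    Weight is tf * log(N / df): raw term frequency, logarithmic inverse
    document frequency, no document-length normalization (an assumed
    reading of the basic SMART scheme, not code from the patch).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # within-document frequency of each term
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical two-document corpus, already tokenized.
docs = [["xapian", "weight", "weight"], ["xapian", "index"]]
weights = tfidf_ntn(docs)
```

A term that occurs in every document ("xapian" here) gets weight 0 under this variant, which is why practical schemes often smooth the idf component.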
2017 Mar 05
3
GSoC 2017 Introduction (Weighting Schemes)
Hello Everyone, I am a second year graduate student at IIIT-Bangalore and my interest is in the field of Information Retrieval. I have successfully compiled Xapian from source and have implemented some examples. While going through the project list, the Weighting Schemes project is the one I was looking to contribute to. So I went through the xapian-core/weight where most of the schemes are already
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote: > I think norm(t, d) in Lucene can be used to calculate the number which is > similar to doc length (see norm(t,d) in > http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm). It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
2012 Jul 17
1
Can not use custom weight scheme with python binding
Hi, I'm trying to use custom weight with python binding. My test code is like this. class TinkerWeight(xapian.Weight): def __init__(self): pass def name(self): return "Tinker" def serialize(self): return "" def get_sumpart(*args): return 1 def get_maxpart(*args): return 1 def get_sumextra(*args):
2017 May 24
0
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
On Mon, May 22, 2017 at 07:45:59AM +0200, Jean-Francois Dockes wrote: > Olly Betts writes: > > Assuming nobody deleted the log file, this could be a Xapian bug. This I meant "lock file" not "log file" here. > > isn't something we're drowning in reports of, so presumably it doesn't > > trigger easily, so finding a way to reproduce would be
2016 Aug 07
2
Weighting Schemes: Evaluation results
Hi, Evaluation of pivoted normalization ("PPP") of the tf-idf weighting scheme is also complete now. I have also evaluated the default tf-idf normalization ("ntn") and other normalization combinations involving pivoted normalization in the wdfn, idfn and wtn components as "Pxx", "xPx" and "xxP" normalization strings respectively to have a clear idea about
2023 May 03
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote: > On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > > Olly Betts <olly at survex.com> wrote: > > > 10 seems too long. You want the mean word length weighted by frequency > > > of occurrence. For English that's typically around 5 characters, which > > > is 5 bytes. If we go for +1 that's:
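The "mean word length weighted by frequency of occurrence" mentioned in this thread can be illustrated with a small Python sketch. The function name `weighted_mean_length` and the term counts are invented for the example; it is not code from Xapian or from the thread.

```python
def weighted_mean_length(term_freqs):
    """Mean term length in bytes, weighted by how often each term occurs."""
    total = sum(term_freqs.values())
    if total == 0:
        return 0.0  # no occurrences, so no meaningful mean
    # Each term contributes its UTF-8 byte length once per occurrence.
    weighted = sum(len(t.encode("utf-8")) * f for t, f in term_freqs.items())
    return weighted / total


# Hypothetical collection frequencies: short words dominate the count.
freqs = {"the": 50, "of": 30, "retrieval": 5}
mean_len = weighted_mean_length(freqs)
```

Because frequent terms tend to be short, this weighted mean is usually well below the unweighted average length of the distinct terms.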
2005 May 25
1
[Fwd: Re: [Fwd: failure delivery]]
I appear to have hit one of the "drop" issues raised in some discussions a couple of years ago by Frank Harrell. They don't seem to have been fixed, and I'm under some pressure to get a quick solution for a forecasting task I'm doing. I have been modelling some retail sales data, and the days just after Thanksgiving (US version!) are important. So I created some dummy
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > 10 seems too long. You want the mean word length weighted by frequency > > of occurrence. For English that's typically around 5 characters, which > > is 5 bytes. If we go for +1 that's: > > Actually, 10 may be too short in my case since there's a
2008 May 14
4
GPL PV drivers for Windows - WDM version
I've been busily converting the xenpci and xenvbd drivers from WDF to WDM to resolve a few issues including potential licensing problems with the Microsoft WDF and to (hopefully) allow them to function as boot drivers when doing install and system recovery. It was a fairly major rewrite of xenpci, and xenvbd, which are now working (booting and running without crashes so far). I
2023 May 03
1
manual flushing thresholds for deletes?
On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > This will also effectively ignore boolean terms, assuming you're giving > > them wdf of 0 (because $3 here is the collection frequency, which is > > sum(wdf(term)) over all documents). > > Should boolean terms be ignored when estimating flushing >
2016 Jul 28
2
Weighting Schemes: Evaluation results
Ah. If FIRE doesn't have something that can show this suitably, then > maybe Parth can advise on access to TREC, as I know he's used some of > them in the past. > I can say FIRE is also a reliable source but INEX/TREC are better. INEX can give you free access and TREC is not freely available. I had used INEX for Xapian in the past and some details are here:
2013 Mar 26
1
Merging of the TfIdf patch
Hello Guys. I have updated the code, tests, documentation, makefile entries and the registry entry of the TfIdf patch as per the feedback. Please do let me know if any additional changes are required before the patch can be merged. -Regards -Aarsh On Sun, Mar 3, 2013 at 2:50 PM, aarsh shah <aarshkshah1992 at gmail.com> wrote: > Hello guys. I have sent a pull request for the code and