search for: tfidfweight

Displaying 10 results from an estimated 10 matches for "tfidfweight".

2016 Jul 27
2
Weighting Schemes: Implementing Piv+ Normalization
Hi, I have added support for Piv normalization in the Tf-Idf weighting scheme as an intermediate step towards implementing support for Piv+ normalization. All tests pass. But I'm running into some issues with Piv+ normalization. In the Piv+ formula, there are two parameters (s and delta) that control the weight assigned. I think the way I'm serialising and unserialising these parameters has
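For readers unfamiliar with the two parameters mentioned in this post, here is a minimal sketch of a per-term Piv+ weight in the form it is usually given in the lower-bounding literature: s is the slope of the pivoted length normalization and delta is a constant added so that a matching term always contributes something, however long the document. The function name and argument list are hypothetical and are not Xapian's API; the query-term-frequency factor is left out, and the patch under discussion may differ in detail.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of Xapian.
// tf      : occurrences of the term in the document
// doc_len : length of the document; avg_len : average document length
// n_docs  : number of documents in the collection; df : documents containing the term
// s       : slope of the pivoted length normalization
// delta   : lower bound added to the normalized tf component
double piv_plus_term_weight(double tf, double doc_len, double avg_len,
                            double n_docs, double df,
                            double s, double delta) {
    assert(avg_len > 0 && df > 0);
    if (tf <= 0) return 0.0;  // a term that does not occur contributes nothing

    double tf_part = 1.0 + std::log(1.0 + std::log(tf));  // doubly damped tf
    double len_norm = 1.0 - s + s * doc_len / avg_len;    // pivoted length normalization
    double idf = std::log((n_docs + 1.0) / df);

    return (tf_part / len_norm + delta) * idf;
}
```

With delta = 0 this reduces to plain pivoted (Piv) normalization, which is consistent with the post treating Piv as an intermediate step.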
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote: > > For now, using weighting schemes which don't use document length is > > probably the simplest answer. > > There's tf-idf weighting scheme on svn master, is it suitable for lucene > backend? Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently implemented). You could also use BM25 with parameter b=0. Cheers, Olly
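The suggestion to use BM25 with b=0 can be illustrated with the textbook BM25 term weight. This is only a sketch: the function below is hypothetical and is not Xapian's BM25Weight implementation, which has additional parameters. It shows that the document length enters only through a factor multiplied by b, so b=0 removes the dependence entirely.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper showing the textbook BM25 term weight, not Xapian's code.
// wdf     : within-document frequency of the term
// doc_len : length of the document; avg_len : average document length
// idf     : inverse document frequency factor, computed elsewhere
// k1, b   : the usual BM25 parameters
double bm25_term_weight(double wdf, double doc_len, double avg_len,
                        double idf, double k1, double b) {
    assert(avg_len > 0);
    // Document length appears only here, scaled by b.
    double len_norm = (1.0 - b) + b * doc_len / avg_len;
    return idf * (k1 + 1.0) * wdf / (k1 * len_norm + wdf);
}
```

With b=0 the length normalization factor is exactly 1, so a backend that cannot supply document lengths could pass any placeholder value for doc_len without changing the ranking.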
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote: > TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in > Lucene backends. If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the lucene files, the lack of wdf_upper_bound...
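Why the collection frequency works as a fallback can be seen from a small sketch: the collection frequency is the sum of a term's wdf over all documents, so no single document's wdf can exceed it. The helpers below are hypothetical and only illustrate that inequality; they are not the backend interface discussed in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helpers, not Xapian's backend API.
// wdfs holds the within-document frequency of one term in each document.

// Collection frequency: total occurrences of the term across all documents.
std::uint64_t collection_frequency(const std::vector<std::uint64_t>& wdfs) {
    return std::accumulate(wdfs.begin(), wdfs.end(), std::uint64_t{0});
}

// Tightest possible bound: the largest wdf actually present.
std::uint64_t max_wdf(const std::vector<std::uint64_t>& wdfs) {
    return wdfs.empty() ? 0 : *std::max_element(wdfs.begin(), wdfs.end());
}

// Fallback bound: looser than max_wdf, but needs no per-term maximum to be stored.
std::uint64_t fallback_wdf_upper_bound(const std::vector<std::uint64_t>& wdfs) {
    std::uint64_t cf = collection_frequency(wdfs);
    assert(max_wdf(wdfs) <= cf);  // a sum of non-negative counts is at least its largest element
    return cf;
}
```

The fallback is always valid but can be loose, which matters only where a tighter bound would let the matcher skip more work.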
2016 Jul 28
2
Weighting Schemes: Implementing Piv+ Normalization
...ew double parameters (s and delta) but it isn't turning out to be smooth because there's no method for unserialising strings in serialise-double.h. Doing just const string normals = ptr++; or const string normals = static_cast<const string>ptr++; fixes the compile errors, but the tfidfweight3 test case is failing with remote backends: $ ./runtest gdb ./apitest -v tfidfweight3 Running test: tfidfweight3... SerialisationError: REMOTE:Bad encoded double: short mantissa (context: remote:prog(../bin/xapian-progsrv -t300000 .glass/db=apitest_simpledata) I'm wondering if I need to in...
2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
...st may be the bottleneck in this backend, http://trac.xapian.org/ticket/326 2. Changing the Lucene file format too much makes it hard to compare performance between Xapian and Lucene. Some ideas: 1. Use a ranking algorithm that does not need doclength, such as BM25Weight or TradWeight without doclength, or tfidfWeight. Will the ranking results be poor without doclength? 2. Store doclength in the .prx payload when doing Lucene indexing. https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/Payload.html http://searchhub.org/2009/08/05/getting-started-with-payloads/ But this method has o...
2013 Mar 26
1
Merging of the TfIdf patch
Hello Guys. I have updated the code, tests, documentation, makefile entries and the registry entry of the TfIdf patch as per the feedback. Please do let me know if any additional changes are required before the patch can be merged. -Regards -Aarsh On Sun, Mar 3, 2013 at 2:50 PM, aarsh shah <aarshkshah1992 at gmail.com> wrote: > Hello guys. I have sent a pull request for the code and
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all: I have written a demo patch for a backend for Lucene format indexes; the Lucene version is 3.6.2. http://lucene.apache.org/core/3_6_2/fileformats.html For now, this demo patch supports only the basic features of Lucene. Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and deleted documents (.del) are not supported, and the skip list in .fdx is not supported either. example/quest.cc is used to test this demo.
2014 Feb 11
2
Next Steps.
Hey guys, I had introduced myself earlier on IRC. I talked to Parth and had a brief chat with Olly, but just to reiterate, I'm Tejas Nikumbh, and I'm interested in contributing to Xapian for GSoC this year. I'm specifically interested in the letor and weighting schemes projects. I've been able to build xapian on my machine without any errors via the Guidelines on the Hacking
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote: > I think norm(t, d) in Lucene can be used to calculate a number which is > similar to doc length (see norm(t,d) in > http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm). It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
2017 Mar 15
2
xapian core missing link to math on MSYS2
...weight/.libs/bm25weight.o weight/.libs/boolweight.o weight/.libs/coordweight.o weight/.libs/dlhweight.o weight/.libs/dphweight.o weight/.libs/ifb2weight.o weight/.libs/ineb2weight.o weight/.libs/inl2weight.o weight/.libs/lmweight.o weight/.libs/pl2plusweight.o weight/.libs/pl2weight.o weight/.libs/tfidfweight.o weight/.libs/tradweight.o weight/.libs/weight.o weight/.libs/weightinternal.o -lrpcrt4 -lz -lws2_32 -LD:/bda-ci/msys2/unstable/mingw64/lib/gcc/x86_64-w64-mingw32/6.3.0 -LD:/bda-ci/msys2/unstable/mingw64/lib/gcc/x86_64-w64-mingw32/6.3.0/../../../../x86_64-w64-mingw32/lib/../lib -LD:/bda-ci/msys2...