Displaying 10 results from an estimated 10 matches for "tfidfweight".
2016 Jul 27
2
Weighting Schemes: Implementing Piv+ Normalization
Hi,
I have added support for Piv normalization in the Tf-Idf weighting scheme as an
intermediate step to implementing the support for Piv+ normalization. All
tests pass.
But I'm running into some issues with Piv+ normalization. In the Piv+
formula, there are two parameters (s and delta) that control the weight
assigned. I think the way I'm serialising and unserialising these
parameters has
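
For readers unfamiliar with the two parameters mentioned in this post, here is a minimal sketch of how s and delta enter a Piv+ style term-frequency normalisation. It follows the commonly cited form of the formula from the literature on lower-bounding term frequency normalization. The function name piv_plus_wdfn and its signature are hypothetical and are not part of Xapian's API, and the idf factor is left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not Xapian's API): Piv+ style normalised wdf.
//   wdf        within-document frequency of the term
//   doclen     length of the document
//   avg_doclen average document length in the collection
//   s          slope of the pivoted length normalisation
//   delta      lower-bound constant added for any matching term
double piv_plus_wdfn(double wdf, double doclen, double avg_doclen,
                     double s, double delta) {
    assert(avg_doclen > 0.0);
    if (wdf <= 0.0) return 0.0;  // term absent: no contribution
    // Pivoted length normalisation: equals 1 when doclen == avg_doclen.
    double pivot = 1.0 - s + s * (doclen / avg_doclen);
    // Double-log damping of wdf, divided by the pivot, then shifted by delta.
    return (1.0 + std::log(1.0 + std::log(wdf))) / pivot + delta;
}
```

With delta > 0, a matching term in a very long document still contributes more than delta, which is the point of the "+" variant. Both s and delta are plain doubles, so both have to survive the serialise/unserialise round trip discussed in the thread.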
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote:
> > For now, using weighting schemes which don't use document length is
> > probably the simplest answer.
>
> There's tf-idf weighting scheme on svn master, is it suitable for lucene
> backend?
Yes - TfIdfWeight doesn't ever use the document length (at least with
the normalisations currently implemented).
You could also use BM25 with parameter b=0.
Cheers,
Olly
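
The remark about b=0 can be illustrated with a small sketch of the textbook BM25 term-frequency component. This is a simplified form, not Xapian's exact BM25Weight code, and the function name bm25_tf_component is hypothetical. It shows that when b is 0 the length-normalisation factor collapses to 1, so the document length no longer affects the result.

```cpp
#include <cassert>

// Hypothetical helper (not Xapian's API): textbook BM25 term-frequency part.
//   wdf        within-document frequency of the term
//   doclen     length of the document
//   avg_doclen average document length in the collection
//   k1, b      the usual BM25 parameters
double bm25_tf_component(double wdf, double doclen, double avg_doclen,
                         double k1, double b) {
    assert(avg_doclen > 0.0);
    // Length normalisation; with b == 0 this is exactly 1, whatever doclen is.
    double norm = (1.0 - b) + b * (doclen / avg_doclen);
    return wdf * (k1 + 1.0) / (wdf + k1 * norm);
}
```

For example, with k1 = 1 and b = 0, a term with wdf = 3 gets the same value in a 10-term document and a 1000-term document, whereas with b = 0.5 the shorter document scores higher.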
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote:
> TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in
> Lucene backends.
If you don't provide an implementation of wdf_upper_bound(), the default
is to use the collection frequency of the term, so provided that
information is available in the lucene files, the lack of
wdf_upper_bound...
2016 Jul 28
2
Weighting Schemes: Implementing Piv+ Normalization
...ew double parameters (s and delta) but
it isn't turning out to be smooth because there's no method for
unserialising strings in serialise-double.h
Although, doing just
const string normals = ptr++; or, const string normals = static_cast<const
string>ptr++; fixes compile errors.
But tfidfweight3 test case is failing with remote backends :-
$ ./runtest gdb ./apitest -v tfidfweight3
Running test: tfidfweight3... SerialisationError: REMOTE:Bad encoded
double: short mantissa (context: remote:prog(../bin/xapian-progsrv -t300000
.glass/db=apitest_simpledata)
I'm wondering if I need to in...
2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
...st may be the bottleneck in this backend,
http://trac.xapian.org/ticket/326
2. Change too much above Lucene file format, then it's hard to compare
performance between Xapian and Lucene
Some ideas:
1. Use a ranking algorithm without doclength, such as BM25Weight or TradWeight
without doclength, or TfIdfWeight.
Will the ranking results be poor without doclength?
2. Stores doclength in .prx payload when doing Lucene indexing.
https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/Payload.html
http://searchhub.org/2009/08/05/getting-started-with-payloads/
But this method has o...
2013 Mar 26
1
Merging of the TfIdf patch
Hello Guys. I have updated the code, tests, documentation, makefile entries
and the registry entry of the TfIdf patch as per the feedback. Please do
let me know if any additional changes are required before the patch can be
merged,
-Regards
-Aarsh
On Sun, Mar 3, 2013 at 2:50 PM, aarsh shah <aarshkshah1992 at gmail.com> wrote:
> Hello guys. I have sent a pull request for the code and
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all:
I have written a demo patch for a backend for Lucene format indexes; the Lucene
version is 3.6.2.
http://lucene.apache.org/core/3_6_2/fileformats.html
Now, this demo patch just supports the basic features in Lucene. Compound
File(.cfs/.cfe), term vector(.tvx/.tvd/.tvf), and
delete document(.del) are not supported; skip list in .fdx is not supported
either.
example/quest.cc is used to test this demo.
2014 Feb 11
2
Next Steps.
Hey guys,
I had introduced myself earlier on IRC. I talked to Parth and had a
brief chat with Olly, but just to re-iterate I'm Tejas Nikumbh, and I'm
interested in contributing to Xapian for GSoC this year. I'm specifically
interested in letor and weighting schemes projects.
I've been able to build xapian on my machine without any errors via the
Guidelines on the Hacking
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote:
> I think norm(t, d) in Lucene can be used to calculate a number which is
> similar to doc length (see norm(t,d) in
> http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm).
It sounds similar (especially if document and field boosts aren't in use),
though some places may rely on
2017 Mar 15
2
xapian core missing link to math on MSYS2
...weight/.libs/bm25weight.o weight/.libs/boolweight.o weight/.libs/coordweight.o weight/.libs/dlhweight.o weight/.libs/dphweight.o weight/.libs/ifb2weight.o weight/.libs/ineb2weight.o weight/.libs/inl2weight.o weight/.libs/lmweight.o weight/.libs/pl2plusweight.o weight/.libs/pl2weight.o weight/.libs/tfidfweight.o weight/.libs/tradweight.o weight/.libs/weight.o weight/.libs/weightinternal.o -lrpcrt4 -lz -lws2_32 -LD:/bda-ci/msys2/unstable/mingw64/lib/gcc/x86_64-w64-mingw32/6.3.0 -LD:/bda-ci/msys2/unstable/mingw64/lib/gcc/x86_64-w64-mingw32/6.3.0/../../../../x86_64-w64-mingw32/lib/../lib -LD:/bda-ci/msys2...