Displaying 9 results from an estimated 9 matches for "get_maxpart".
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
...is the weight given by the
term to the document.
The basic formula is W(t,d) = wdf * log(N / termfreq).
However, various normalizations can be applied to both wdf and idf.
The extra per-document component will be 0 here, so get_maxextra() will
return 0.
Moreover, an upper bound on W(t,d) for get_maxpart() can be found
easily for a particular normalization (if I have all the required metrics
available).
For example, if I am using logarithmic normalization for the wdf (within-document
frequency), then an upper bound on W(t,d) will be
(log(wdf_upperbound_) + 1) * log(N / termfreq), as N (collection size)...
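The bound described in this snippet can be checked numerically. Below is a minimal sketch in plain Python (no Xapian dependency), assuming natural logarithms. The helper names `tfidf_weight` and `tfidf_maxpart` are hypothetical, and `wdf_upper_bound` stands in for the statistic the post calls wdf_upperbound_.

```python
import math

def tfidf_weight(wdf, termfreq, collection_size):
    """W(t,d) with logarithmic wdf normalization: (log(wdf) + 1) * log(N / termfreq)."""
    if wdf == 0:
        return 0.0  # the term does not occur in the document
    return (math.log(wdf) + 1.0) * math.log(collection_size / termfreq)

def tfidf_maxpart(wdf_upper_bound, termfreq, collection_size):
    """Upper bound on W(t,d): the weight is non-decreasing in wdf, so evaluate
    it at the largest wdf any document can have for this term."""
    return tfidf_weight(wdf_upper_bound, termfreq, collection_size)

# Illustrative numbers only (not taken from the thread).
N, termfreq, wdf_upper_bound = 1000, 50, 12
bound = tfidf_maxpart(wdf_upper_bound, termfreq, N)

# Every achievable wdf must give a weight no larger than the bound.
assert all(tfidf_weight(w, termfreq, N) <= bound
           for w in range(0, wdf_upper_bound + 1))
```

The bound holds because log(wdf) + 1 grows with wdf and log(N / termfreq) is non-negative whenever termfreq is at most N.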
2013 Mar 11
1
Implementation of the PL2 weighting scheme of the DFR Framework
...oisson distribution = Collection frequency of
the term / Size of the database
and the base of all logarithms is 2.
c is a constant parameter.
The code is almost complete, but I am stuck at a few places, which are as
follows:
1.) Calculating the upper bound of the weight for the get_maxpart()
function
This one calculation has been giving me sleepless nights for
a couple of days now. The problem is that L is
a decreasing function of wdfn and P, as per my calculations,
is an increasing function. I arrived at this conclusion
because the derivative...
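When one factor decreases and the other increases, a brute-force scan over the feasible range of wdfn gives a reference value to check any analytic bound against. The sketch below assumes the commonly cited PL2 form, which the truncated snippet does not reproduce in full, so treat the formula itself as an assumption. The names `pl2_weight` and `scan_max` are hypothetical, and the numbers are illustrative.

```python
import math

def pl2_weight(wdfn, lam):
    """Assumed PL2 form: L * P, with L = 1 / (wdfn + 1) and
    P = wdfn*log2(wdfn/lam) + (lam - wdfn)*log2(e) + 0.5*log2(2*pi*wdfn)."""
    if wdfn <= 0:
        return 0.0
    L = 1.0 / (wdfn + 1.0)
    P = (wdfn * math.log2(wdfn / lam)
         + (lam - wdfn) * math.log2(math.e)
         + 0.5 * math.log2(2.0 * math.pi * wdfn))
    return L * P

def scan_max(lam, wdfn_lo, wdfn_hi, steps=10000):
    """Return (wdfn, weight) at the largest weight found on an even grid."""
    best_x, best_w = wdfn_lo, pl2_weight(wdfn_lo, lam)
    for i in range(1, steps + 1):
        x = wdfn_lo + (wdfn_hi - wdfn_lo) * i / steps
        w = pl2_weight(x, lam)
        if w > best_w:
            best_x, best_w = x, w
    return best_x, best_w

# lam = collection frequency of the term / size of the database (illustrative).
lam = 30 / 1000
best_x, best_w = scan_max(lam, 0.5, 20.0)
print(f"largest weight on the grid: {best_w:.4f} at wdfn = {best_x:.4f}")
```

A grid scan only approximates the maximum, so it is a sanity check for a derived bound rather than a replacement for one.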
2011 Mar 08
1
MSet order
Hello
I defined a weighting scheme to simulate a kind of "Euclidean" distance.
To test it, I used a database with 1000 documents.
If I run:
enquire.set_weighting_scheme(MyWeight());
Xapian::MSet matches = enquire.get_mset(0, 1000);
I get a correct list of results.
But if I run Xapian::MSet matches = enquire.get_mset(0, 10);
I don't get the top-10 results.
If I run Xapian::MSet
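One thing worth checking in a situation like this is whether get_maxpart() really returns an upper bound on get_sumpart(), since the matcher relies on that bound to skip work when fewer results are requested. The snippet does not say this is the cause here, so the following is only a toy model of bound-based pruning, not Xapian's matcher. The function `top_k` and the weights are hypothetical.

```python
import heapq

def top_k(weights, k, max_weight):
    """Toy top-k scan that stops early once the claimed upper bound
    max_weight cannot beat the current k-th best weight."""
    heap = []  # min-heap of (weight, docid); heap[0] is the k-th best so far
    for docid, w in enumerate(weights):
        if len(heap) == k and max_weight <= heap[0][0]:
            break  # pruning trusts the claimed bound
        if len(heap) < k:
            heapq.heappush(heap, (w, docid))
        elif w > heap[0][0]:
            heapq.heapreplace(heap, (w, docid))
    return sorted(heap, reverse=True)

weights = [0.2, 0.5, 0.4, 0.9, 0.8, 0.1]

# With a correct bound (the true maximum is 0.9) the real top 2 are found.
print(top_k(weights, 2, max_weight=0.9))  # [(0.9, 3), (0.8, 4)]

# With a bound that is too low, the scan stops early and misses them.
print(top_k(weights, 2, max_weight=0.3))  # [(0.5, 1), (0.4, 2)]
```

Asking for every document never triggers the early exit in this model, which matches the pattern of a full result list looking right while a short one does not.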
2013 Aug 27
2
What does collection_freq means?
Hi, all:
I am confused by the concept of collection_freq.
There's no information about it at http://xapian.org/docs/glossary.html
What does it mean?
Thanks
Regards!
2012 Jul 17
1
Can not use custom weight scheme with python binding
...custom weight with python binding.
My test code is like this.
class TinkerWeight(xapian.Weight):
    def __init__(self):
        pass
    def name(self):
        return "Tinker"
    def serialize(self):
        return ""
    def get_sumpart(*args):
        return 1
    def get_maxpart(*args):
        return 1
    def get_sumextra(*args):
        return 0
    def get_maxextra(*args):
        return 0
... ...
enquire.set_weighting_scheme(TinkerWeight())
But it throws this error:
*in method 'Enquire_set_weighting_scheme', argument 2 of type
'Xapian::Weight const &a...
2012 Apr 15
1
Patch for Initial Prototype implementation of Unigram Langauage Modelling in xapian-core.
...f
document. Hence a random linear weight has been added. This needs to be
addressed by using different log bases and some other techniques.
Discussion about the log trick that needs to be used is here for reference:
http://comments.gmane.org/gmane.comp.search.xapian.devel/1857
2. Setting a tighter bound for get_maxpart() to make the matching process
more efficient.
3. Adding other smoothing factors to the UnigramLMWeight implementation.
Please find attached 5 patches for the initial prototype implementation of the Unigram Language
Model in Xapian.
Thanks,
--
with regards
Gaurav A.
2009 Jan 27
1
Segmentation fault in MSetIterator get_weight
Hi,
I'm using Xapian with C# and Mono, and I'm getting a segfault in get_weight.
When I print the index variable, the value is clearly too high.
I think something is writing over it. Do you have any idea how I could
trace the origin of the segmentation fault?
Thanks,
--
Yann
2020 Aug 23
2
MultiDatabase shard count limitations
...o.30.8.0 [.] GlassPostList::move_forward_in_chunk_to_at_least
1.76% script/public-i libxapian.so.30.8.0 [.] GlassPostListTable::get_freqs
1.71% script/public-i libxapian.so.30.8.0 [.] GlassTable::find_in_leaf
1.62% script/public-i libxapian.so.30.8.0 [.] Xapian::BM25Weight::get_maxpart
1.55% script/public-i libxapian.so.30.8.0 [.] Glass::compare<Glass::LeafItem, Glass::LeafItem>
1.44% script/public-i libc-2.28.so [.] malloc
1.32% script/public-i libxapian.so.30.8.0 [.] io_read_block
1.24% script/public-i libxapian.so.30.8.0 [.] GlassTa...
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from
February 2020, I've got 390 Xapian shards for 130 public inboxes
I want to search against(*). There's more on the horizon (we're
expecting tens of thousands of public inboxes).
After bumping RLIMIT_NOFILE and running ->add_database a bunch,
the actual queries seem to be taking ~30s (not good :x).
Now I'm