search for: get_maxpart

Displaying 9 results from an estimated 9 matches for "get_maxpart".

2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
...is the weight given by the term to the document. The basic formula is W(t,d) = wdf * log(N/termfreq). However, various normalizations can be applied to both wdf and idf. The extra per-document component will be 0 here, so get_maxextra() will return 0. Moreover, an upper bound on W(t,d) for get_maxpart() can easily be found for a particular normalization (if I have all the required metrics available). For example, if I am using logarithmic normalization for the wdf (within-document frequency), then an upper bound on W(t,d) will be (log(wdf_upperbound_)+1)*log(N/termfreq) as N (collection size)...
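The bound quoted in the snippet above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not Xapian's API: the function names `tfidf_weight` and `tfidf_maxpart` are hypothetical, natural logarithms are assumed, and the sketch assumes termfreq never exceeds the collection size so that the idf factor is non-negative.

```python
import math


def tfidf_weight(wdf, collection_size, termfreq):
    """Log-normalized wdf times idf: (log(wdf) + 1) * log(N / termfreq).

    Hypothetical helper for illustration; a wdf of 0 contributes no weight.
    """
    if wdf <= 0:
        return 0.0
    return (math.log(wdf) + 1.0) * math.log(collection_size / termfreq)


def tfidf_maxpart(wdf_upper_bound, collection_size, termfreq):
    """Upper bound on tfidf_weight for any document.

    The weight is non-decreasing in wdf (given termfreq <= collection_size),
    so evaluating it at the largest possible wdf bounds it from above.
    """
    return tfidf_weight(wdf_upper_bound, collection_size, termfreq)


if __name__ == "__main__":
    # Assumed example values: 1000 documents, term occurs in 10 of them,
    # and no document contains the term more than 50 times.
    print(tfidf_maxpart(50, 1000, 10))
```

Because the weight grows with wdf, the bound is tight only for documents that actually reach the wdf upper bound; for all others it is an overestimate, which is acceptable for a maximum-weight estimate.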
2013 Mar 11
1
Implementation of the PL2 weighting scheme of the DFR Framework
...oisson distribution = Collection frequency of the term / Size of the database, and the base of all logarithms is 2. c is a constant parameter. The code is almost complete, but I am stuck at a few places, which are as follows: 1.) Calculating the upper bound of the weight for the get_maxpart() function. This one calculation has been giving me sleepless nights for a couple of days now. The problem is that L is a decreasing function of wdfn and P, as per my calculations, is an increasing function. I arrived at this conclusion because the derivative...
2011 Mar 08
1
MSet order
Hello, I defined a weighting scheme to simulate a kind of "euclidean" distance. To test it, I used a database with 1000 documents. If I run: enquire.set_weighting_scheme(MyWeight()); Xapian::MSet matches = enquire.get_mset(0, 1000); I have a correct list of results. But if I run Xapian::MSet matches = enquire.get_mset(0, 10); I don't have the top-10 results. If I run Xapian::MSet
2013 Aug 27
2
What does collection_freq means?
Hi, all: I am confused by the concept of collection_freq. There is no information about it on http://xapian.org/docs/glossary.html. What does it mean? Thanks. Regards!
2012 Jul 17
1
Can not use custom weight scheme with python binding
...custom weight with python binding. My test code is like this. class TinkerWeight(xapian.Weight): def __init__(self): pass def name(self): return "Tinker" def serialize(self): return "" def get_sumpart(*args): return 1 def get_maxpart(*args): return 1 def get_sumextra(*args): return 0 def get_maxextra(*args): return 0 ... ... enquire.set_weighting_scheme(TinkerWeight()) But it throws this error: *in method 'Enquire_set_weighting_scheme', argument 2 of type 'Xapian::Weight const &a...
2012 Apr 15
1
Patch for Initial Prototype implementation of Unigram Langauage Modelling in xapian-core.
...f document. Hence a random linear weight has been added. It needs to be addressed by using different log bases and some other techniques. The discussion about the log trick that needs to be used is here for reference: http://comments.gmane.org/gmane.comp.search.xapian.devel/1857 2. Setting a tighter bound for get_maxpart() to make the matching process more efficient. 3. Adding other smoothing factors to the UnigramLMWeight implementation. PFA 5 patches for the initial prototype implementation of Unigram Language Model in Xapian. Thanks, -- with regards Gaurav A.
2009 Jan 27
1
Segmentation fault in MSetIterator get_weight
Hi, I'm using Xapian with C# and Mono and I'm having a segfault in get_weight. When I print the index variable, the value is clearly too high. I think something is writing over it. Do you have any idea how I could trace the origin of the segmentation fault? Thanks, -- Yann
2020 Aug 23
2
MultiDatabase shard count limitations
...o.30.8.0 [.] GlassPostList::move_forward_in_chunk_to_at_least 1.76% script/public-i libxapian.so.30.8.0 [.] GlassPostListTable::get_freqs 1.71% script/public-i libxapian.so.30.8.0 [.] GlassTable::find_in_leaf 1.62% script/public-i libxapian.so.30.8.0 [.] Xapian::BM25Weight::get_maxpart 1.55% script/public-i libxapian.so.30.8.0 [.] Glass::compare<Glass::LeafItem, Glass::LeafItem> 1.44% script/public-i libc-2.28.so [.] malloc 1.32% script/public-i libxapian.so.30.8.0 [.] io_read_block 1.24% script/public-i libxapian.so.30.8.0 [.] GlassTa...
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from February 2020, I've got 390 Xapian shards for 130 public inboxes I want to search against(*). There's more on the horizon (we're expecting tens of thousands of public inboxes). After bumping RLIMIT_NOFILE and running ->add_database a bunch, the actual queries seem to be taking ~30s (not good :x). Now I'm