search for: doclength

Displaying 6 results from an estimated 6 matches for "doclength".

2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
*Or do you mean that it's one number per document whereas the other stats are per database, so it's harder to store it?* Yes, I mean this. It's a huge amount of data. If I add a new doclength list myself (containing all the doclengths in one list, like chert), I am concerned that: 1. this doclength list may become the bottleneck in this backend, http://trac.xapian.org/ticket/326 2. changing the Lucene file format too much makes it hard to compare performance between Xapian and Lu...
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
...vx/.tvd/.tvf), delete document (.del) are not supported, and skip lists in .fdx are not supported either. example/quest.cc is used to test this demo, with queries like this: field_name:term, or file_name:term1 AND field_name:term2. So far, I have found that some data Xapian needs for BM25 does not exist in Lucene: 1. doclength_lower_bound, doclength_upper_bound 2. wdf_lower_bound, wdf_upper_bound 3. total_length 4. doclength (for each document). 1-3 are statistics that can be calculated when running copydatabase and stored somewhere, but doclength is hard to handle this way. 1. Some other data instead of doclength? 2. Xap...
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote: > > For now, using weighting schemes which don't use document length is > > probably the simplest answer. > > There's a tf-idf weighting scheme on svn master; is it suitable for the lucene > backend? Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote: > TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in > Lucene backends. If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the lucene files, the lack of wdf_upper_bound information
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote: > I think norm(t, d) in Lucene can be used to calculate a number which is > similar to doc length (see norm(t,d) in > http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm). It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
2005 Aug 12
1
error building xapian
...'InMemoryPostList::InMemoryPostList(Xapian::Internal::RefCntPtr<const InMemoryDatabase>, const InMemoryTerm&)': inmemory_database.cc:84: error: class 'InMemoryPostList' does not have any field named 'db' inmemory_database.cc: In member function 'virtual Xapian::doclength InMemoryPostList::get_doclength() const': inmemory_database.cc:153: error: 'db' was not declared in this scope inmemory_database.cc: At global scope: inmemory_database.cc:182: error: prototype for 'InMemoryTermList::InMemoryTermList(Xapian::Internal::RefCntPtr<const InMemoryDatab...