similar to: Backend for Lucene format indexes-How to get doclength

Displaying 14 results from an estimated 14 matches similar to: "Backend for Lucene format indexes-How to get doclength"

2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
*Or do you mean that it's one number per document whereas the other stats are per database, so it's harder to store it?* Yes, that is what I mean. It's a huge amount of data. If I add a new doclength list myself (one holding all the document lengths in a list, as chert does), I am concerned about: 1. This doclength list may be the bottleneck in this backend, http://trac.xapian.org/ticket/326 2. Change too much
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote: > I think norm(t, d) in Lucene can be used to calculate a number which is > similar to doc length (see norm(t,d) in > http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm). It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
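A rough sketch of the idea in this snippet, assuming Lucene's default length norm of 1/sqrt(number of terms in the field) and no document or field boosts. The function name approx_doc_length is hypothetical, and the rounding introduced by Lucene's compact norm encoding is ignored here, so a real backend would only recover an approximation.

```python
import math


def approx_doc_length(norm: float) -> float:
    """Estimate a field length by inverting norm = 1 / sqrt(length).

    Assumption: no document or field boosts were applied, so the norm
    carries only the length normalisation factor.
    """
    if norm <= 0:
        raise ValueError("norm must be positive")
    return 1.0 / (norm * norm)


# A field with 400 terms would have norm 1 / sqrt(400) = 0.05.
example_norm = 1.0 / math.sqrt(400)
estimated_length = round(approx_doc_length(example_norm))
print(estimated_length)
```

If boosts are in use, the norm is multiplied by them and this inversion no longer yields a length.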
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote: > > For now, using weighting schemes which don't use document length is > > probably the simplest answer. > > There's a tf-idf weighting scheme on svn master; is it suitable for the Lucene > backend? Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote: > TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in the > Lucene backend. If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the Lucene files, the lack of wdf_upper_bound information
2005 Aug 12
1
error building xapian
I'm getting the following error when trying to build xapian. I've tried versions 0.9.1 and 0.9.2, same error. It's an x86 debian box, gcc 4.0.1. It builds fine on my gentoo amd64 box (gcc 3.4.3). Any ideas? Thanks, Alex make[3]: Leaving directory `/home/mcam/xapian-core-0.9.1/backends/flint' Making all in inmemory make[3]: Entering directory
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement them in Xapian (with some frequently used normalizations), as it will also give me a good grasp of implementing a weighting scheme before I start working on implementing DFR schemes. I read the following as references, and I think I've understood them well and can write the hack: 1.)
2013 Mar 04
2
Need Beginner Guide for Matcher Optimisations Project
Hi, While searching for a project which matches my interest and skill level, I found this project named Matcher Optimization. This project is really challenging and exciting from my viewpoint and I would like to be a part of it. The optimization techniques mentioned in the reference links provided will take some time for me to understand well. But I am trying to get my
2012 Jun 26
6
[PATCH] Add a page cache-backed balloon device driver.
This implementation of a virtio balloon driver uses the page cache to "store" pages that have been released to the host. The communication (outside of target counts) is one way--the guest notifies the host when it adds a page to the page cache, allowing the host to madvise(2) with MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit (via the regular page reclaim).
2012 Jun 26
6
[PATCH] Add a page cache-backed balloon device driver.
This implementation of a virtio balloon driver uses the page cache to "store" pages that have been released to the host. The communication (outside of target counts) is one way--the guest notifies the host when it adds a page to the page cache, allowing the host to madvise(2) with MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit (via the regular page reclaim).
2012 Mar 21
1
Wish To Join Xapian:-)
Dear Friends, This is Shao from the National University of Singapore (NUS). I'm currently doing my exchange study at the Royal Institute of Technology (KTH), Sweden. IR is really interesting to me. I've taken an Information Retrieval course during the exchange study here at KTH http://www.csc.kth.se/utbildning/kth/kurser/DD2476/ir12/labblydelser/assignment2. The Weighting Schemes and Learn to Rank
2005 Nov 16
1
query time stemming and term weights
I am developing a personal/desktop search tool for which I am experimenting with doing no stemming during indexing, but instead having a stem database (or several for different languages), used for expanding the query terms at search time (i.e. user query: flooring -> stem: floor -> final query for: [floored flooring floorings floors]). I have thought of a possible problem with
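The expansion step described in this snippet could be sketched as below. This is only an illustration under stated assumptions: the stem database is modelled as a plain dict from stem to known word forms, toy_stem is a hypothetical stand-in for a real stemmer, and expand_query is a made-up name rather than anything from Xapian.

```python
def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ings", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def expand_query(user_terms, stemmer, stem_db):
    """Map each query term to every indexed form sharing its stem.

    Returns one sorted list of alternatives per query term; a search
    layer would typically OR the alternatives within each list.
    Terms whose stem is unknown are passed through unchanged.
    """
    expanded = []
    for term in user_terms:
        forms = stem_db.get(stemmer(term), [term])
        expanded.append(sorted(forms))
    return expanded


# Assumed contents of the stem database, built from the unstemmed index.
stem_db = {"floor": {"floored", "flooring", "floorings", "floors"}}

alternatives = expand_query(["flooring", "wood"], toy_stem, stem_db)
print(alternatives)
```

Keeping one list of alternatives per original term leaves the search layer free to decide how to weight the expanded forms.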
2013 Jan 09
2
Explanation of how Eset works
Hey guys, hi. I am trying to understand how Xapian works. I read the Theoretical Background to Xapian doc and the report by Salton and Jones. I still can't seem to understand how Eset works. How exactly does Xapian add terms to expand a query? Assuming we have a list of the k most important terms, how do we decide which term to add to the query and will be in context with the query? And to decide r
2012 Jul 25
0
No subject
pagecache for pages above lower limit but that is a separate question about driver design, I would like to make sure I understand the high level design first. > > > > Note that users could not care less about how a driver > > is implemented internally. > > > > Is there some workload where you see VM working better with > > this than regular balloon? Any
2012 Jul 25
0
No subject
pagecache for pages above lower limit but that is a separate question about driver design, I would like to make sure I understand the high level design first. > > > > Note that users could not care less about how a driver > > is implemented internally. > > > > Is there some workload where you see VM working better with > > this than regular balloon? Any