thr3ads.net - similar to: "Backend for Lucene format indexes-How to get doclength"

Backend for Lucene format indexes-How to get doclength

2013 Jun 17

2

*Or do you mean that it's one number per document whereas the other stats are per database, so it's harder to store it?* Yes, I mean this. It's a huge amount of data. If I add a new doclength list myself (containing every document's length in one list, like chert), I am concerned about: 1. This doclength list may be the bottleneck in this backend, http://trac.xapian.org/ticket/326 2. Change too much

Backend for Lucene format indexes-How to get doclength

2013 Aug 25

2

On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote: > I think norm(t, d) in Lucene can be used to calculate a number which is > similar to doc length (see norm(t,d) in > http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm). It sounds similar (especially if document and field boosts aren't in use), though some places may rely on
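As a rough illustration of the norm-to-length idea raised in this message: in Lucene 3.x's DefaultSimilarity the length norm is 1/sqrt(numTerms) multiplied by the boost, so with boosts left at 1.0 an approximate length can be recovered as 1/norm^2. The sketch below is a toy under those assumptions, not Xapian or Lucene code. The function names are hypothetical, and it ignores the lossy single-byte encoding Lucene applies to stored norms, so lengths recovered from a real index would only be approximate.

```python
import math


def length_norm(num_terms, boost=1.0):
    """Length norm in the style of Lucene's default similarity (assumed form)."""
    return boost * (1.0 / math.sqrt(num_terms))


def estimate_doc_length(norm, boost=1.0):
    """Invert the norm to estimate the number of terms in the field.

    Only meaningful when the boost is known (1.0 if boosts are not in use).
    """
    return round((boost / norm) ** 2)


if __name__ == "__main__":
    for n in (1, 4, 100, 250):
        print(n, estimate_doc_length(length_norm(n)))
```

If document or field boosts are in use, the boost is folded into the stored norm and cannot be separated from the length without knowing it, which matches the caveat in the reply above.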

Backend for Lucene format indexes-How to get doclength

2013 Aug 26

2

On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote: > > For now, using weighting schemes which don't use document length is > > probably the simplest answer. > > There's a tf-idf weighting scheme on svn master, is it suitable for the Lucene > backend? Yes - TfIdfWeight doesn't ever use the document length (at least with the normalisations currently

Backend for Lucene format indexes-How to get doclength

2013 Sep 02

2

On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote: > TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in > Lucene backends. If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the Lucene files, the lack of wdf_upper_bound information
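The fallback described in this reply works because a term's within-document frequency (wdf) in any single document can never exceed its total count across the collection. A minimal sketch of that reasoning follows. The posting data and function names are hypothetical and this is not the Xapian backend API, only an illustration of why collection frequency is a valid (if loose) upper bound.

```python
def collection_frequency(postings):
    """Total occurrences of a term across all documents (docid -> wdf map)."""
    return sum(postings.values())


def wdf_upper_bound(postings, stored_bound=None):
    """Use a stored bound when the backend has one, else fall back to
    the collection frequency, which is always >= the largest wdf."""
    if stored_bound is not None:
        return stored_bound
    return collection_frequency(postings)


# Hypothetical postings for one term: docid -> wdf.
postings = {1: 3, 2: 1, 7: 5}

if __name__ == "__main__":
    print(wdf_upper_bound(postings))
    print(max(postings.values()) <= wdf_upper_bound(postings))
```

The bound is safe but not tight: a tighter stored value would let the matcher skip more work, while the fallback only affects efficiency, not correctness.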

error building xapian

2005 Aug 12

1

I'm getting the following error when trying to build xapian. I've tried versions 0.9.1 and 0.9.2, same error. It's an x86 Debian box, gcc 4.0.1. It builds fine on my Gentoo amd64 box (gcc 3.4.3). Any ideas? Thanks, Alex make[3]: Leaving directory `/home/mcam/xapian-core-0.9.1/backends/flint' Making all in inmemory make[3]: Entering directory

Implementing tf-idf weighting scheme in Xapian

2013 Feb 19

2

Hello guys. I just read up about tf-idf schemes and want to implement them in Xapian (with some frequently used normalizations), as it will also give me a good grasp of implementing a weighting scheme before I start working on implementing DFR schemes. I read the following as references and I think I've understood them well and can write the hack :- 1.)

Need Beginner Guide for Matcher Optimisations Project

2013 Mar 04

2

Hi, While searching for a project which matches my interest and skill level, I found this project named Matcher Optimization. This project is really challenging and exciting from my viewpoint and I would like to be a part of it. The optimization techniques mentioned in the reference links provided will take some time for me to understand well. But I am trying to get my

[PATCH] Add a page cache-backed balloon device driver.

2012 Jun 26

6

This implementation of a virtio balloon driver uses the page cache to "store" pages that have been released to the host. The communication (outside of target counts) is one way--the guest notifies the host when it adds a page to the page cache, allowing the host to madvise(2) with MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit (via the regular page reclaim).

Wish To Join Xapian:-)

2012 Mar 21

1

Dear Friends, This is Shao from the National University of Singapore (NUS). I'm currently doing my exchange study at the Royal Institute of Technology (KTH), Sweden. IR is really interesting to me. I've taken an Information Retrieval course during the exchange study here at KTH http://www.csc.kth.se/utbildning/kth/kurser/DD2476/ir12/labblydelser/assignment2. The Weighting Schemes and Learn to Rank

query time stemming and term weights

2005 Nov 16

1

I am developing a personal/desktop search tool for which I am experimenting with doing no stemming during indexing, but instead having a stem database (or several for different languages), used for expanding the query terms at search time. (i.e. user query: flooring -> stem: floor -> final query for: [floored flooring floorings floors]) I have thought of a possible problem with
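The query-time expansion described in this message can be sketched as a lookup from each query word to its stem, then from the stem to every indexed word form sharing it. The sketch below is a toy under stated assumptions: the stem database is a hand-written dictionary (in practice it would be built by running a real stemmer over the index vocabulary), and all names are hypothetical rather than part of Xapian.

```python
from collections import defaultdict

# Hypothetical stem database: indexed word form -> stem.
WORD_TO_STEM = {
    "floored": "floor",
    "flooring": "floor",
    "floorings": "floor",
    "floors": "floor",
    "search": "search",
    "searching": "search",
}


def build_stem_index(word_to_stem):
    """Invert the word -> stem map into stem -> set of word forms."""
    stem_to_words = defaultdict(set)
    for word, stem in word_to_stem.items():
        stem_to_words[stem].add(word)
    return stem_to_words


def expand_query(query, word_to_stem=WORD_TO_STEM):
    """Return, for each query word, the sorted list of forms to OR together.

    Words missing from the stem database are kept unchanged.
    """
    stem_to_words = build_stem_index(word_to_stem)
    expanded = []
    for term in query.lower().split():
        stem = word_to_stem.get(term)
        if stem is None:
            expanded.append([term])
        else:
            expanded.append(sorted(stem_to_words[stem]))
    return expanded


if __name__ == "__main__":
    print(expand_query("flooring"))
```

Each inner list would typically be combined with OR in the final query, and the separate lists combined with the user's original operator.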

Explanation of how Eset works

2013 Jan 09

2

Hey guys, hi. I am trying to understand how Xapian works. I read the Theoretical Background to Xapian doc and the report by Salton and Jones. I still can't seem to understand how Eset works. How exactly does Xapian add terms to expand a query? Assuming we have a list of the k most important terms, how do we decide which term to add to the query so that it will be in context with the query? And to decide r

No subject

2012 Jul 25

0

pagecache for pages above lower limit but that is a separate question about driver design, I would like to make sure I understand the high level design first. > > > > Note that users could not care less about how a driver > > is implemented internally. > > > > Is there some workload where you see VM working better with > > this than regular balloon? Any
