Displaying 14 results from an estimated 14 matches similar to: "Backend for Lucene format indexes-How to get doclength"
2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
*Or do you mean that it's one number per document whereas the other stats
are per database, so it's harder to store it?*
Yes, that is what I mean. It's a huge amount of data. If I add a new doclength
list myself (one holding all the document lengths in a list, like chert),
I am concerned about:
1. This doclength list may become the bottleneck in this backend, see
http://trac.xapian.org/ticket/326
2. It would change too much
2013 Aug 25
2
Backend for Lucene format indexes-How to get doclength
On Tue, Aug 20, 2013 at 07:28:42PM +0800, jiangwen jiang wrote:
> I think norm(t, d) in Lucene can be used to calculate a number which is
> similar to doc length (see norm(t,d) in
> http://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/search/Similarity.html#formula_norm).
It sounds similar (especially if document and field boosts aren't in use),
though some places may rely on
2013 Aug 26
2
Backend for Lucene format indexes-How to get doclength
On Mon, Aug 26, 2013 at 09:41:07AM +0800, jiangwen jiang wrote:
> > For now, using weighting schemes which don't use document length is
> > probably the simplest answer.
>
> There's a tf-idf weighting scheme on svn master. Is it suitable for the
> lucene backend?
Yes - TfIdfWeight doesn't ever use the document length (at least with
the normalisations currently
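
To illustrate the point made in the reply above, here is a minimal sketch of a plain tf-idf weight that needs only the within-document frequency, the term frequency and the collection size, and never touches the document length. The function name and parameters are hypothetical, and this is the textbook formula, not Xapian's TfIdfWeight implementation or any of its normalisations.

```python
import math


def tfidf_weight(wdf, termfreq, collection_size):
    """Textbook tf-idf: raw wdf multiplied by log(N / termfreq).

    wdf             -- occurrences of the term in this document
    termfreq        -- number of documents containing the term
    collection_size -- total number of documents (N)

    Note that no document length is needed anywhere.
    """
    if wdf <= 0 or termfreq <= 0:
        return 0.0
    idf = math.log(collection_size / termfreq)
    return wdf * idf
```

A backend that cannot supply per-document lengths could still feed a scheme of this shape, since every input is either a per-posting or a per-term statistic.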
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote:
> TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in
> Lucene backends.
If you don't provide an implementation of wdf_upper_bound(), the default
is to use the collection frequency of the term, so provided that
information is available in the lucene files, the lack of
wdf_upper_bound information
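
The fallback described in the reply above can be sketched as follows. This is only an illustration of why collection frequency works as a bound, with a hypothetical postings dictionary standing in for the backend; it is not Xapian's code.

```python
def collection_frequency(postings):
    """Total occurrences of a term across all documents.

    postings maps document id -> wdf for a single term (hypothetical layout).
    """
    return sum(postings.values())


def wdf_upper_bound(postings, stored_bound=None):
    """Return a stored bound if the backend has one, else fall back to the
    collection frequency. Since wdf values are non-negative, no single
    document's wdf can exceed their sum, so the fallback is always a valid
    (if loose) upper bound.
    """
    if stored_bound is not None:
        return stored_bound
    return collection_frequency(postings)
```

The fallback bound is looser than a true maximum wdf, so it remains correct but may allow less pruning in the matcher.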
2005 Aug 12
1
error building xapian
I'm getting the following error when trying to build xapian. I've
tried versions 0.9.1 and 0.9.2, same error. It's a x86 debian box,
gcc 4.0.1. It builds fine on my gentoo amd64 box (gcc 3.4.3).
Any ideas?
Thanks,
Alex
make[3]: Leaving directory `/home/mcam/xapian-core-0.9.1/backends/flint'
Making all in inmemory
make[3]: Entering directory
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement one in
Xapian (with some frequently used normalizations), as it will also give me a
good grasp of implementing a weighting scheme before I start working on
implementing DFR schemes.
I read the following as references and I think I've understood them well and
can write the hack :-
1.)
2013 Mar 04
2
Need Beginner Guide for Matcher Optimisations Project
Hi,
While searching for a project which matches my interest and skill level, I
found this project named Matcher Optimization. This project is really
challenging and exciting from my viewpoint and I would like to be a part of
this project.
The optimization techniques mentioned in the reference links provided will take
some time for me to understand well. But I am trying
to get my
2012 Jun 26
6
[PATCH] Add a page cache-backed balloon device driver.
This implementation of a virtio balloon driver uses the page cache to
"store" pages that have been released to the host. The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim).
2012 Jun 26
6
[PATCH] Add a page cache-backed balloon device driver.
This implementation of a virtio balloon driver uses the page cache to
"store" pages that have been released to the host. The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED. Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim).
2012 Mar 21
1
Wish To Join Xapian:-)
Dear Friends,
This is Shao from the National University of Singapore (NUS). I'm currently
doing my exchange study at the Royal Institute of Technology (KTH), Sweden. IR
is really interesting to me. I've taken an Information Retrieval course
during the exchange study here at KTH
http://www.csc.kth.se/utbildning/kth/kurser/DD2476/ir12/labblydelser/assignment2.
The
Weighting Schemes and Learn to Rank
2005 Nov 16
1
query time stemming and term weights
I am developing a personal/desktop search tool for which I am
experimenting with doing no stemming during the indexing, but instead
having a stem database (or several for different languages), used for
expanding the query terms at search time.
(ie: user query: flooring -> stem: floor
-> final query for: [floored flooring floorings floors])
I have thought of a possible problem with
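
The query-time expansion described in this post can be sketched roughly as below. Everything here is an assumption for illustration: the stem database is modelled as an in-memory dictionary, and `toy_stem` is a crude suffix stripper standing in for a real stemmer.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude suffix stripping, for illustration only (not a real stemmer)."""
    for suffix in ("ings", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_index(words, stem):
    """Map each stem to the set of unstemmed forms seen at indexing time."""
    index = defaultdict(set)
    for word in words:
        index[stem(word)].add(word)
    return index


def expand_query_term(term, index, stem):
    """Replace a query term by all indexed forms sharing its stem."""
    forms = index.get(stem(term))
    return sorted(forms) if forms else [term]


# Hypothetical vocabulary collected while indexing without stemming.
STEM_INDEX = build_stem_index(
    ["floored", "flooring", "floorings", "floors", "roof"], toy_stem
)
```

With this vocabulary, `expand_query_term("flooring", STEM_INDEX, toy_stem)` yields the four "floor" forms from the example in the post, which would then be combined into one query (for instance with an OR-style operator).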
2013 Jan 09
2
Explanation of how Eset works
Hey guys, hi. I am trying to understand how Xapian works. I read the
Theoretical Background to Xapian doc
and the report by Salton and Jones. I still can't seem to understand how Eset
works. How exactly does Xapian add terms to expand a query? Assuming we
have a list of the k most important terms, how do we decide which term to
add to the query so that it will be in context with the query?
And to decide r
2012 Jul 25
0
No subject
pagecache for pages above lower limit but that
is a separate question about driver design,
I would like to make sure I understand the high
level design first.
> >
> > Note that users could not care less about how a driver
> > is implemented internally.
> >
> > Is there some workload where you see VM working better with
> > this than regular balloon? Any
2012 Jul 25
0
No subject
pagecache for pages above lower limit but that
is a separate question about driver design,
I would like to make sure I understand the high
level design first.
> >
> > Note that users could not care less about how a driver
> > is implemented internally.
> >
> > Is there some workload where you see VM working better with
> > this than regular balloon? Any