Displaying 20 results from an estimated 26 matches for "bm25weight".
2017 Apr 09
3
Omega: Missing support for newer weighting schemes
...530, Vivek Pal wrote:
> > Each scheme already has a human-readable name, and Xapian::Registry
> > can map that to an "exemplar" object of the right type, so we
> > could take a string like "bm25 1 0.8", see the first word is "bm25"
> > and get a BM25Weight object, then call parse_params("1 0.8") on it to
> > create the correct Weight object (broadly similar to how unserialise()
> > is handled).
>
> If I followed correctly, since the set_weighting_scheme method in
> omega/weight.cc already does exactly that, do you sugg...
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
...int that this functionality might belong
in the API instead.
Each scheme already has a human-readable name, and Xapian::Registry
can map that to an "exemplar" object of the right type, so we
could take a string like "bm25 1 0.8", see the first word is "bm25"
and get a BM25Weight object, then call parse_params("1 0.8") on it to
create the correct Weight object (broadly similar to how unserialise()
is handled).
Then we can document the available schemes and the parameters they
take in one place and refer to that from omega, quest and the evaluation
module.
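
The lookup-then-parse idea described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not Xapian code: the `BM25Weight` and `Registry` classes here are hypothetical stand-ins, `weight_from_string` is an invented helper name, and the parameter order and defaults (k1, k2, k3, b, min_normlen) are assumptions made for the example.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class BM25Weight:
    """Stand-in weighting scheme (hypothetical, NOT the real Xapian class)."""
    # Parameter order and defaults are assumptions for this sketch.
    k1: float = 1.0
    k2: float = 0.0
    k3: float = 1.0
    b: float = 0.5
    min_normlen: float = 0.5

    def name(self) -> str:
        return "bm25"

    def parse_params(self, params: str) -> "BM25Weight":
        """Return a new object with leading parameters taken from `params`."""
        values = [float(tok) for tok in params.split()]
        names = [f.name for f in fields(self)]
        if len(values) > len(names):
            raise ValueError("too many parameters for " + self.name())
        # Parameters not given in the string keep the exemplar's values.
        return replace(self, **dict(zip(names, values)))


class Registry:
    """Stand-in registry mapping a scheme name to an exemplar object."""

    def __init__(self):
        self._schemes = {}
        self.register(BM25Weight())

    def register(self, exemplar):
        self._schemes[exemplar.name()] = exemplar

    def get_weighting_scheme(self, name):
        # Instance method: a registry object is needed to do the lookup.
        return self._schemes.get(name)


def weight_from_string(registry, spec):
    """Split 'bm25 1 0.8' into name and parameters, then build the weight."""
    name, _, params = spec.strip().partition(" ")
    exemplar = registry.get_weighting_scheme(name)
    if exemplar is None:
        raise ValueError("unknown weighting scheme: %r" % name)
    return exemplar.parse_params(params)


weight = weight_from_string(Registry(), "bm25 1 0.8")
print(weight)
```

With this shape, each caller (omega, quest, an evaluation tool) would only need the one helper, and the list of scheme names and parameters lives with the schemes themselves.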
Cheers...
2017 Apr 13
2
Omega: Missing support for newer weighting schemes
...::Weight::parse_params(scheme));
>
> I wonder if we could do something more like:
>
> enq.set_weighting_scheme(Xapian::Registry::get_weighting_scheme(name).parse_params(params));
>
> where, Xapian::Registry::get_weighting_scheme(name) returns a weighting scheme
> object e.g. BM25Weight object and then calling parse_params method on that
> to return a BM25Weight object now with parameters values as found in params
> string.
That doesn't work as written because you need a registry object to call
get_weighting_scheme() on (it's not a static method).
But the idea is t...
2013 Jan 17
1
FASTER Search
...g
2. parse the query to get a Xapian::Query
3. construct an Enquire for searching by calling get_mset method
here is the function-time-cost for searching:
samples % symbol name
75649 28.0401 ChertPostList::move_forward_in_chunk_to_at_least(unsigned int)
30118 11.1635 Xapian::BM25Weight::get_sumpart(unsigned int, unsigned int) const
21291 7.8917 AndMaybePostList::process_next_or_skip_to(double, Xapian::PostingIterator::Internal*)
17803 6.5989 OrPostList::next(double)
12481 4.6262 AndMaybePostList::get_weight() const
10729 3.9768 OrPostList::get_weight() const
1...
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all:
I have written a demo patch for a backend for Lucene format indexes; the Lucene
version is 3.6.2.
http://lucene.apache.org/core/3_6_2/fileformats.html
For now, this demo patch supports only the basic features of Lucene. Compound
files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and
deleted documents (.del) are not supported, and the skip list in .fdx is not
supported either.
example/quest.cc is used to test this demo.
2017 Apr 12
4
Omega: Missing support for newer weighting schemes
> Each scheme already has a human-readable name, and Xapian::Registry
> can map that to an "exemplar" object of the right type, so we
> could take a string like "bm25 1 0.8", see the first word is "bm25"
> and get a BM25Weight object, then call parse_params("1 0.8") on it to
> create the correct Weight object (broadly similar to how unserialise()
> is handled).
Hi Olly -- the following piece of tested code in omega/weight.cc hopefully
achieves what we intend to do. It works fine for all tests. Please let...
2013 Oct 23
2
performance on document.get_data()
...ata: json message which contains: author, url, message(30 words)
Do you have any ideas for improving search performance, especially
doc.get_data?
My code snippet:
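import json
import xapian

# DATA_PATH, parse(), keywords and start are defined elsewhere in the poster's code.
database = xapian.Database("%s/athena" % DATA_PATH)
enquire = xapian.Enquire(database)
enquire.set_weighting_scheme(xapian.BM25Weight())
query = parse(keywords)
enquire.set_query(query)
matches = enquire.get_mset(start, 200)
matches.fetch()  # hint that the documents for the whole MSet will be wanted
result = [json.loads(match.document.get_data()) for match in matches]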
2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
...yself, I am concerned about:
1. This doclength list may be the bottleneck in this backend,
http://trac.xapian.org/ticket/326
2. Changing the Lucene file format too much makes it hard to compare
performance between Xapian and Lucene
Some ideas:
1. Use a ranking algorithm that doesn't need doclength, such as BM25Weight or
TradWeight without doclength, or tfidfWeight.
Will the ranking results be poor without doclength?
2. Stores doclength in .prx payload when doing Lucene indexing.
https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/Payload.html
http://searchhub.org/2009/08/05/getting-...
2010 Nov 01
1
floating-point issues with set_sort_by_relevance_then_value? (1.2.3, BM25 k1=0)
...s, core-1.2.3 patched
to r15140 and using chert. This also happens with complex queries where groups
of results are expected to have identical weights.
FIX: I found a simple fix for this issue, at least for my test cases:
I added
if (param_k1 == 0) RETURN(termweight);
to the beginning of BM25Weight::get_sumpart in
trunk/xapian-core/weight/bm25weight.cc:166
This apparently prevents floating point precision issues in the last line of
get_sumpart() [which calculates termweight * wdf_double * 1 / wdf_double]. It
also speeds up my case slightly. ;-)
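
The effect described above can be reproduced outside Xapian. The sketch below is a simplified model, not the code in bm25weight.cc: the function names and the general formula are assumptions for illustration, and only the k1 = 0 shape (termweight * wdf_double * 1 / wdf_double) is taken from the description in this post.

```python
def sumpart_last_line(termweight, wdf, k1, normlen=1.0):
    """Simplified model of the final expression (an assumption, see above).

    With k1 == 0 this evaluates termweight * wdf_double * 1 / wdf_double,
    left to right, so the multiplication is rounded before the division.
    """
    wdf_double = float(wdf)
    denom = k1 * normlen + wdf_double
    return termweight * wdf_double * (k1 + 1) / denom


def sumpart_patched(termweight, wdf, k1, normlen=1.0):
    """Same model with the early return proposed in the post."""
    if k1 == 0:
        return termweight
    return sumpart_last_line(termweight, wdf, k1, normlen)


termweight = 0.1
for wdf in (1, 3):
    unpatched = sumpart_last_line(termweight, wdf, 0)
    patched = sumpart_patched(termweight, wdf, 0)
    # Documents differing only in wdf should tie when k1 == 0; the
    # unpatched value for wdf == 3 is not bit-identical to termweight.
    print(wdf, unpatched == termweight, patched == termweight)
```

Two documents whose weights differ only in the last bit no longer count as equally relevant, so the secondary sort by value never gets a chance to order them.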
In order to prevent more such issues, it mi...
2016 Jun 10
2
Weighting Schemes -- Project Progress
Hello everyone,
I have been working on adding support for the BM25+ weighting function for the
last couple of weeks. Initially, I considered modifying bm25weight.cc to
add support for the BM25+ function without disturbing the functionality of BM25.
But that didn't work out very well. A day or two was spent trying to
refactor and debug the same code.
Later, I took another approach following the suggestions from James and
implemented a new subclass (BM25PlusW...
2010 Aug 23
1
Sort ordering
Using MultiValueSorter, I can sort by key1, key2, relevance; or relevance, key1, key2.
But AFAIK, I can't sort by key1, relevance, key2. Unless I spool out the entire result set or write some C++.
I wonder if we need a new 'sort by' function that accepts any combination of keys and relevance in any order? The function would make its own optimisations (ie is relevance first or
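
For reference, the "spool out the entire result set" workaround mentioned above might look like the sketch below. It is plain Python over an in-memory list; the `Hit` record and its field names are hypothetical stand-ins for whatever is read back from each match, and it assumes key1 and key2 sort ascending while relevance sorts descending.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record built from one spooled-out match."""
    docid: int
    key1: str
    key2: str
    weight: float


def sort_key1_relevance_key2(hits):
    # Tuples compare element by element: key1 first, then higher weight
    # first (hence the negation), then key2 as the final tie-breaker.
    return sorted(hits, key=lambda h: (h.key1, -h.weight, h.key2))


hits = [
    Hit(1, "a", "x", 0.5),
    Hit(2, "a", "y", 0.9),
    Hit(3, "b", "x", 0.9),
    Hit(4, "a", "w", 0.5),
]
print([h.docid for h in sort_key1_relevance_key2(hits)])
```

The cost is that every match has to be fetched before sorting, which is exactly what a built-in mixed sort order would avoid.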
2008 Dec 17
1
using ValueWeightPostingSource
Hi,
I'm currently using PostingSource to add some weight to the results
based on a value.
I didn't find any documentation on how to use it with the query, so I
combine a query constructed using the posting source and a query made
using the query parser with an AND operator:
Xapian.Query queryText = parser.ParseQuery("test:" + textBox1.Text + "
DS:1 DS:2");
Xapian.Query
2012 Apr 20
1
Implementing the tf-idf weighting scheme
..._idfWeight and add a new
file tf_idf.cc in ../weight in the repo, to implement Tf_idfWeight.
Here is the git diff patch:
https://gist.github.com/2422049
I think the next thing to do is register this scheme with Xapian and write
some tests to see whether or not it works?
I've grepped the current BM25Weight, TradWeight and BoolWeight, and found
clues about Enquire::set_weighting_scheme( ). But more needs to be
done to understand it.
Best,
Jiuding
2013 Aug 27
2
What does collection_freq means?
Hi, all:
I am confused about the concept of collection_freq
There's no information about it on http://xapian.org/docs/glossary.html
What does it mean?
Thanks
Regards!
2005 Dec 22
1
Xapian Binding compile error in Windows XP using CygWin
...k_repr':
/cygdrive/d/AutomaticTextAnalysis-1.3-rc1/xapian-bindings/python/modern/xapian_w
rap.cc:20297: undefined reference to `_PyString_FromString'
Info: resolving vtable for Xapian::TradWeightby linking to __imp___ZTVN6Xapian10
TradWeightE (auto-import)
Info: resolving vtable for Xapian::BM25Weightby linking to __imp___ZTVN6Xapian10
BM25WeightE (auto-import)
collect2: ld returned 1 exit status
make[4]: *** [_xapian.la] Error 1
make[4]: Leaving directory `/cygdrive/d/AutomaticTextAnalysis-1.3-rc1/xapian-bin
dings/python'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/cyg...
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
> It may be worth splitting that part of the $set documentation out into its
> own section somehow, because it's getting a bit long -
Undoubtedly; $set command has the longest section on the documentation page :)
But it would be hard splitting that up because the documentation is organised
in a way that each command is really contained in its own specific section.
> and the details
2007 May 04
1
Last minute feature for 1.0.0
I'd like to draw people's attention to bug report #143 that I've just
submitted. This is a proposal (and patch) to add the ability to store
arbitrary metadata associated with a database (rather than with an
individual document in the database). The rationale for this feature is
explained more fully in the bug report, but briefly I've come across
several situations where I
2020 Aug 23
2
MultiDatabase shard count limitations
...libxapian.so.30.8.0 [.] GlassPostList::move_forward_in_chunk_to_at_least
1.76% script/public-i libxapian.so.30.8.0 [.] GlassPostListTable::get_freqs
1.71% script/public-i libxapian.so.30.8.0 [.] GlassTable::find_in_leaf
1.62% script/public-i libxapian.so.30.8.0 [.] Xapian::BM25Weight::get_maxpart
1.55% script/public-i libxapian.so.30.8.0 [.] Glass::compare<Glass::LeafItem, Glass::LeafItem>
1.44% script/public-i libc-2.28.so [.] malloc
1.32% script/public-i libxapian.so.30.8.0 [.] io_read_block
1.24% script/public-i libxapian.so.30.8.0...
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from
February 2020, I've got 390 Xapian shards for 130 public inboxes
I want to search against(*). There's more on the horizon (we're
expecting tens of thousands of public inboxes).
After bumping RLIMIT_NOFILE and running ->add_database a bunch,
the actual queries seem to be taking ~30s (not good :x).
Now I'm
2017 Sep 28
1
Weighting the author of a doc when that term can also appear as a frequent term in other docs
We have a corpus of academic papers. Sometimes it happens that there is
an academic controversy and one paper is a response or rebuttal to
another paper. The name of the author of the first paper may appear many
times in the second paper. So in light of this, how should we set our
weight on the author field?
Here is an example:
http://www.nber.org/papers/w11215
in which the term