search for: bm25weight

Displaying 20 results from an estimated 26 matches for "bm25weight".

2017 Apr 09
3
Omega: Missing support for newer weighting schemes
...530, Vivek Pal wrote: > > Each scheme already has a human-readable name, and Xapian::Registry > > can map that to an "exemplar" object of the right type, so we > > could take a string like "bm25 1 0.8", see the first word is "bm25" > > and get a BM25Weight object, then call parse_params("1 0.8") on it to > > create the correct Weight object (broadly similar to how unserialise() > > is handled). > > If I followed correctly, since the set_weighting_scheme method in > omega/weight.cc already does exactly that, do you sugg...
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
...int that this functionality might belong in the API instead. Each scheme already has a human-readable name, and Xapian::Registry can map that to an "exemplar" object of the right type, so we could take a string like "bm25 1 0.8", see the first word is "bm25" and get a BM25Weight object, then call parse_params("1 0.8") on it to create the correct Weight object (broadly similar to how unserialise() is handled). Then we can document the available schemes and the parameters they take in one place and refer to that from omega, quest and the evaluation module. Cheers...
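The lookup-then-parse idea can be modelled in plain Python. The registry maps the first word of a string such as "bm25 1 0.8" to an exemplar object, and `parse_params()` on that exemplar builds a new configured object. This is a hypothetical sketch, not Xapian's API. The `Registry`, `BM25Weight` and `TradWeight` classes, their `name` attribute and their parameter order are stand-ins assumed for illustration. The registry is an instance rather than a static lookup.

```python
class BM25Weight:
    """Hypothetical stand-in for an exemplar weighting-scheme object."""

    name = "bm25"

    def __init__(self, k1=1.0, b=0.5):
        # Assumed (k1, b) parameter order, for illustration only.
        self.k1 = k1
        self.b = b

    def parse_params(self, params):
        # Build a NEW object of the same type from a whitespace-separated
        # parameter string; omitted trailing parameters keep their defaults.
        return BM25Weight(*(float(v) for v in params.split()))


class TradWeight:
    """Hypothetical stand-in with a single parameter k."""

    name = "trad"

    def __init__(self, k=1.0):
        self.k = k

    def parse_params(self, params):
        return TradWeight(*(float(v) for v in params.split()))


class Registry:
    """Maps each scheme's human-readable name to an exemplar object."""

    def __init__(self):
        self._schemes = {w.name: w for w in (BM25Weight(), TradWeight())}

    def get_weighting_scheme(self, name):
        # An instance method, so a Registry object is needed to call it.
        return self._schemes[name]


def scheme_from_string(registry, spec):
    # "bm25 1 0.8" -> name "bm25", params "1 0.8"
    parts = spec.split(None, 1)
    params = parts[1] if len(parts) > 1 else ""
    return registry.get_weighting_scheme(parts[0]).parse_params(params)
```

In this sketch, `scheme_from_string(Registry(), "bm25 1 0.8")` returns a fresh `BM25Weight` with k1 = 1.0 and b = 0.8. The registry's exemplar keeps its defaults because `parse_params()` constructs a new object instead of mutating the exemplar.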
2017 Apr 13
2
Omega: Missing support for newer weighting schemes
...::Weight::parse_params(scheme)); > > I wonder if we could do something more like: > > enq.set_weighting_scheme(Xapian::Registry::get_weighting_scheme(name).parse_params(params)); > > where Xapian::Registry::get_weighting_scheme(name) returns a weighting scheme > object, e.g. a BM25Weight object, and then calling the parse_params method on that > returns a BM25Weight object with the parameter values found in the params > string. That doesn't work as written because you need a registry object to call get_weighting_scheme() on (it's not a static method). But the idea is t...
2013 Jan 17
1
FASTER Search
...g
2. parse the query to get a Xapian::Query
3. construct an Enquire and search by calling its get_mset method

Here is the time cost per function for searching:

    samples  %        symbol name
    75649    28.0401  ChertPostList::move_forward_in_chunk_to_at_least(unsigned int)
    30118    11.1635  Xapian::BM25Weight::get_sumpart(unsigned int, unsigned int) const
    21291    7.8917   AndMaybePostList::process_next_or_skip_to(double, Xapian::PostingIterator::Internal*)
    17803    6.5989   OrPostList::next(double)
    12481    4.6262   AndMaybePostList::get_weight() const
    10729    3.9768   OrPostList::get_weight() const
    1...
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all: I have written a demo patch for a backend for Lucene format indexes; the Lucene version is 3.6.2. http://lucene.apache.org/core/3_6_2/fileformats.html For now, this demo patch supports only the basic Lucene features. Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and deleted documents (.del) are not supported, and neither is the skip list in .fdx. example/quest.cc is used to test this demo.
2017 Apr 12
4
Omega: Missing support for newer weighting schemes
> Each scheme already has a human-readable name, and Xapian::Registry > can map that to an "exemplar" object of the right type, so we > could take a string like "bm25 1 0.8", see the first word is "bm25" > and get a BM25Weight object, then call parse_params("1 0.8") on it to > create the correct Weight object (broadly similar to how unserialise() > is handled). Hi Olly -- the following piece of tested code in omega/weight.cc hopefully achieves what we intend to do. It works fine for all tests. Please let...
2013 Oct 23
2
performance on document.get_data()
...ata: a JSON message which contains: author, url, message (30 words). Do you have any ideas for improving the search performance, especially doc.get_data()? My code snippet:

    import json
    import xapian

    # DATA_PATH, keywords, start and parse() are defined elsewhere in my code.
    database = xapian.Database("%s/athena" % DATA_PATH)
    enquire = xapian.Enquire(database)
    enquire.set_weighting_scheme(xapian.BM25Weight())
    query = parse(keywords)
    enquire.set_query(query)
    matches = enquire.get_mset(start, 200)
    matches.fetch()  # hint to prefetch the documents for the whole MSet
    result = [json.loads(match.document.get_data()) for match in matches]
2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
...yself, I am concerned about: 1. This doclength list may be the bottleneck in this backend, http://trac.xapian.org/ticket/326 2. Changing the Lucene file format too much would make it hard to compare performance between Xapian and Lucene. Some ideas: 1. Use a ranking algorithm that does not need doclength, such as BM25Weight or TradWeight without doclength, or tfidfWeight. Would the ranking results be poor without doclength? 2. Store doclength in the .prx payload during Lucene indexing. https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/Payload.html http://searchhub.org/2009/08/05/getting-...
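Why can BM25 run without document lengths? In the textbook formula, doclength appears only in the normalisation factor K = k1 * ((1 - b) + b * doclen / avgdoclen). Setting b = 0 removes it entirely. Below is a minimal textbook BM25 term score in plain Python. The idf is taken as a given input. This is not Xapian's exact implementation, and the function and parameter names are hypothetical.

```python
def bm25_term_score(wdf, doclen, avg_doclen, idf, k1=1.0, b=0.5):
    """Textbook BM25 contribution of one term to one document."""
    # Length normalisation: this is the only place doclen is used.
    K = k1 * ((1.0 - b) + b * doclen / avg_doclen)
    return idf * (k1 + 1.0) * wdf / (K + wdf)


# With b = 0, K == k1 whatever the document length, so the score is the
# same for a short and a long document with the same wdf.
short_doc = bm25_term_score(wdf=3, doclen=10, avg_doclen=100, idf=2.0, b=0.0)
long_doc = bm25_term_score(wdf=3, doclen=1000, avg_doclen=100, idf=2.0, b=0.0)
```

With a non-zero b (say 0.75), the shorter document scores higher for the same wdf. That length normalisation is what a doclength-free backend gives up.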
2010 Nov 01
1
floating-point issues with set_sort_by_relevance_then_value? (1.2.3, BM25 k1=0)
...s, core-1.2.3 patched to r15140 and using chert. This also happens with complex queries where groups of results are expected to have identical weights. FIX: I found a simple fix for this issue, at least for my test cases: I added if (param_k1 == 0) RETURN(termweight); to the beginning of BM25Weight::get_sumpart in trunk/xapian-core/weight/bm25weight.cc:166 This apparently prevents floating point precision issues in the last line of get_sumpart() [which calculates termweight * wdf_double * 1 / wdf_double]. It also speeds up my case slightly. ;-) In order to prevent more such issues, it mi...
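The failure mode can be modelled as follows. Assume the per-term weight is evaluated as termweight * wdf / denom. When k1 = 0, denom is just wdf, but the intermediate product can round. Two documents whose weights should be identical then differ in the last bit, which splits ties under set_sort_by_relevance_then_value(). The sketch below is a simplified model under that assumption about evaluation order, not the actual bm25weight.cc code. The function names are hypothetical.

```python
def sumpart_naive(termweight, wdf, k1, normlen=1.0, b=0.5):
    # Assumed evaluation order: multiply by wdf first, then divide.
    denom = k1 * (normlen * b + (1.0 - b)) + wdf
    return termweight * wdf / denom


def sumpart_guarded(termweight, wdf, k1, normlen=1.0, b=0.5):
    # The guard from the report: with k1 == 0 the wdf factor is exactly 1,
    # so return termweight directly and skip the rounding-prone arithmetic.
    if k1 == 0:
        return termweight
    return sumpart_naive(termweight, wdf, k1, normlen, b)
```

For example, with termweight = 0.1 and k1 = 0, `sumpart_naive` gives different results for wdf = 1 and wdf = 3, because 0.1 * 3 rounds before the division. `sumpart_guarded` returns exactly 0.1 for both.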
2016 Jun 10
2
Weighting Schemes -- Project Progress
Hello everyone, I have been working on adding support for the BM25+ weighting function for the last couple of weeks. Initially, I considered modifying bm25weight.cc to add BM25+ support without disturbing the existing BM25 functionality, but that didn't work out very well: a day or two went into trying to refactor and debug that code. Later, I took another approach, following the suggestions from James, and implemented a new subclass (BM25PlusW...
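BM25+ differs from BM25 by adding a constant delta to the term-frequency component of every matching term. This keeps a very long document from pushing a matching term's contribution towards zero. The sketch below mirrors the subclassing approach described above, but in plain Python. The class names and the default delta = 1.0 are assumptions for illustration, not Xapian's BM25PlusWeight.

```python
class BM25:
    def __init__(self, k1=1.0, b=0.5):
        self.k1 = k1
        self.b = b

    def tf_part(self, wdf, doclen, avg_doclen):
        # Standard BM25 saturating, length-normalised tf component.
        K = self.k1 * ((1.0 - self.b) + self.b * doclen / avg_doclen)
        return (self.k1 + 1.0) * wdf / (K + wdf)

    def score(self, idf, wdf, doclen, avg_doclen):
        return idf * self.tf_part(wdf, doclen, avg_doclen)


class BM25Plus(BM25):
    def __init__(self, k1=1.0, b=0.5, delta=1.0):
        super().__init__(k1, b)
        self.delta = delta

    def tf_part(self, wdf, doclen, avg_doclen):
        # BM25+ adds delta only for terms that actually occur (wdf > 0).
        if wdf == 0:
            return 0.0
        return super().tf_part(wdf, doclen, avg_doclen) + self.delta
```

Overriding only `tf_part()` leaves BM25's scoring untouched. That is roughly the benefit of a subclass over modifying the existing class in place.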
2010 Aug 23
1
Sort ordering
Using MultiValueSorter, I can sort by key1, key2, relevance; or relevance, key1, key2. But AFAIK, I can't sort by key1, relevance, key2 unless I spool out the entire result set or write some C++. I wonder if we need a new 'sort by' function that accepts any combination of keys and relevance in any order? The function would make its own optimisations (ie is relevance first or
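The key1, relevance, key2 ordering is easy to express once the whole candidate set is in memory, which is exactly the "spool out the entire result set" cost mentioned above. The following plain-Python sketch shows the ordering only. It is not an in-matcher implementation, and the `Hit` fields are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    docid: int
    relevance: float
    key1: str
    key2: str


def sort_key1_relevance_key2(hits):
    # Ascending key1, then descending relevance, then ascending key2.
    return sorted(hits, key=lambda h: (h.key1, -h.relevance, h.key2))
```

Negating the relevance gives a descending sort on a numeric field inside an otherwise ascending tuple key. String keys would need a different trick to be sorted descending.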
2008 Dec 17
1
using ValueWeightPostingSource
Hi, I'm currently using PostingSource to add some weight to the results based on a value. I didn't find any documentation on how to use it with a query, so I combine a query constructed from the posting source and a query made with the query parser using an AND operator: Xapian.Query queryText = parser.ParseQuery("test:" + textBox1.Text + " DS:1 DS:2"); Xapian.Query
2012 Apr 20
1
Implementing the tf-idf weighting scheme
..._idfWeight and add a new file tf_idf.cc in ../weight in the repo, to implement Tf_idfWeight. Here is the git diff patch: https://gist.github.com/2422049 I think the next step is to register this scheme with Xapian and write some tests to see whether or not it works. I grepped the current BM25Weight, TradWeight and BoolWeight, and found clues about Enquire::set_weighting_scheme(), but more work is needed to understand it fully. Best, Jiuding -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/2012...
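A common tf-idf variant weighs each query term by its within-document frequency times log(N / termfreq). Here N is the number of documents and termfreq is the number of documents containing the term. The sketch below shows that variant in plain Python with hypothetical names. It is not the Tf_idfWeight from the linked patch, whose normalisations are not shown here.

```python
import math


def tf_idf(wdf, termfreq, doccount):
    """Raw term frequency times log inverse document frequency."""
    if wdf == 0 or termfreq == 0:
        return 0.0
    return wdf * math.log(doccount / termfreq)


def score_document(query_terms, doc_wdfs, termfreqs, doccount):
    # Sum the tf-idf contribution of each query term found in the document.
    return sum(
        tf_idf(doc_wdfs.get(t, 0), termfreqs.get(t, 0), doccount)
        for t in query_terms
    )
```

A term that occurs in every document gets log(1) = 0 and contributes nothing, which is the intended effect of the idf factor.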
2013 Aug 27
2
What does collection_freq means?
Hi, all: I am confused by the concept of collection_freq. There's no information about it on http://xapian.org/docs/glossary.html. What does it mean? Thanks Regards!
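In Xapian, a term's collection frequency is the total number of times it occurs across the whole collection, i.e. the sum of its wdf over all documents. Its term frequency is the number of documents that contain it. The toy corpus and function name below are made up for illustration.

```python
from collections import Counter


def term_stats(docs, term):
    """Return (termfreq, collection_freq) for term over tokenised docs."""
    wdfs = [Counter(doc)[term] for doc in docs]
    termfreq = sum(1 for wdf in wdfs if wdf > 0)  # documents containing it
    collection_freq = sum(wdfs)                   # total occurrences
    return termfreq, collection_freq


docs = [
    ["xapian", "search", "xapian"],
    ["search", "engine"],
    ["xapian"],
]
```

Here "xapian" appears in 2 documents but 3 times in total, so its term frequency is 2 and its collection frequency is 3.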
2005 Dec 22
1
Xapian Binding compile error in Windows XP using CygWin
...k_repr':
    /cygdrive/d/AutomaticTextAnalysis-1.3-rc1/xapian-bindings/python/modern/xapian_wrap.cc:20297: undefined reference to `_PyString_FromString'
    Info: resolving vtable for Xapian::TradWeight by linking to __imp___ZTVN6Xapian10TradWeightE (auto-import)
    Info: resolving vtable for Xapian::BM25Weight by linking to __imp___ZTVN6Xapian10BM25WeightE (auto-import)
    collect2: ld returned 1 exit status
    make[4]: *** [_xapian.la] Error 1
    make[4]: Leaving directory `/cygdrive/d/AutomaticTextAnalysis-1.3-rc1/xapian-bindings/python'
    make[3]: *** [all-recursive] Error 1
    make[3]: Leaving directory `/cyg...
2017 Apr 08
2
Omega: Missing support for newer weighting schemes
> It may be worth splitting that part of the $set documentation out into its > own section somehow, because it's getting a bit long - Undoubtedly; the $set command has the longest section on the documentation page :) But it would be hard to split that up, because the documentation is organised so that each command is contained in its own specific section. > and the details
2007 May 04
1
Last minute feature for 1.0.0
I'd like to draw people's attention to bug report #143 that I've just submitted. This is a proposal (and patch) to add the ability to store arbitrary metadata associated with a database (rather than with an individual document in the database). The rationale for this feature is explained more fully in the bug report, but briefly I've come across several situations where I
2020 Aug 23
2
MultiDatabase shard count limitations
    ...libxapian.so.30.8.0  [.] GlassPostList::move_forward_in_chunk_to_at_least
    1.76%  script/public-i  libxapian.so.30.8.0  [.] GlassPostListTable::get_freqs
    1.71%  script/public-i  libxapian.so.30.8.0  [.] GlassTable::find_in_leaf
    1.62%  script/public-i  libxapian.so.30.8.0  [.] Xapian::BM25Weight::get_maxpart
    1.55%  script/public-i  libxapian.so.30.8.0  [.] Glass::compare<Glass::LeafItem, Glass::LeafItem>
    1.44%  script/public-i  libc-2.28.so         [.] malloc
    1.32%  script/public-i  libxapian.so.30.8.0  [.] io_read_block
    1.24%  script/public-i  libxapian.so.30.8.0...
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from February 2020, I've got 390 Xapian shards for 130 public inboxes I want to search against(*). There's more on the horizon (we're expecting tens of thousands of public inboxes). After bumping RLIMIT_NOFILE and running ->add_database a bunch, the actual queries seem to be taking ~30s (not good :x). Now I'm
2017 Sep 28
1
Weighting the author of a doc when that term can also appear as a frequent term in other docs
We have a corpus of academic papers. Sometimes it happens that there is an academic controversy and one paper is a response or rebuttal to another paper. The name of the author of the first paper may appear many times in the second paper. So in light of this, how should we set our weight on the author field? Here is an example: http://www.nber.org/papers/w11215  in which the term