similar to: Quickest way to retrieve data for a large match set?

Displaying 20 results from an estimated 2000 matches similar to: "Quickest way to retrieve data for a large match set?"

2013 Jun 19
2
Compact databases and removing stale records at the same time
I'm trying to compact (or at least merge) multiple databases, while stripping search records which are no longer required. Backstory: I've inherited the Cyrus IMAPd xapian-based search code from Greg Banks when he left Opera. One of the unfinished parts was removing expunged emails from the search database. We moved from having a single search database to supporting multiple
2015 Mar 11
2
stub-file and get_doccount
Hello, I switched from one big index to a stub file with many indexes and am running into a problem. I have a tool to fetch a random document via: get_doccount; pick a random id up to get_doccount; get_document with that id. After changing to a stub file this fails. Is there a nice way to get a random document from a stub file? MfG, Felix Ostmann
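One likely reason the approach breaks: a stub file listing several indexes opens as a multi-database, and Xapian interleaves the docids of the sub-databases, so the combined docids need not be contiguous and get_doccount() is no longer a safe upper bound. A minimal sketch of that numbering in plain Python, with no Xapian calls; the function names and the shard sizes are made up for illustration:

```python
def combined_docid(local_docid, shard_index, n_shards):
    """Docid seen through the combined database for a document that has
    local_docid (1-based) in sub-database shard_index (0-based)."""
    return (local_docid - 1) * n_shards + shard_index + 1


def split_docid(docid, n_shards):
    """Inverse mapping: (shard_index, local_docid) for a combined docid."""
    return (docid - 1) % n_shards, (docid - 1) // n_shards + 1


# Hypothetical example: two sub-databases holding 3 and 1 documents,
# each numbered contiguously from 1.
shard_sizes = [3, 1]
docids = sorted(
    combined_docid(local, shard, len(shard_sizes))
    for shard, size in enumerate(shard_sizes)
    for local in range(1, size + 1)
)
doccount = len(docids)   # what get_doccount() would report
lastdocid = max(docids)  # what get_lastdocid() would report

print(docids, doccount, lastdocid)
```

In this example docid 4 does not exist and docid 5 lies beyond the document count, so a random number up to get_doccount() can both hit a gap and miss documents. Drawing from 1 to get_lastdocid() and retrying when the id is missing is one possible way around it.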
2014 Apr 13
2
Adding an external library to Xapian
My code is not on GitHub. I am using the tarball as of now. The following is the error that occurred: http://pastebin.com/cVJrjUZX On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org> wrote: > On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com> > wrote: > > > A linker error is encountered even after following the above
2013 Aug 26
2
Perl interface isn't working in 1.2.x
On 08/25/2013 05:02 PM, Olly Betts wrote: > So the simple fix is > probably just to install the perl-Search-Xapian RPM instead. Thanks, the Centos 6 repos don't have that rpm and the http://xapian.org/download page seems to only cover the XS bindings, if I am reading this correctly: But I was able to remove the rpm packages and compile and install the core and swig from source.
2007 Feb 09
1
Fetching document content by Q term in Python
Hello, I'd like to be able to retrieve the index's stored copy of the document text and tried the following: terms = self.db.allterms() terms.skip_to('Q' + uri.encode('utf-8')) term = terms.next() doc = self.db.get_document(term[1]) print doc.get_data() I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I
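The docid for a unique Q term can be read from that term's posting list rather than from the term list. A minimal sketch, assuming db behaves like a xapian.Database from the Python bindings (postlist(term) yields items with a docid attribute, and get_document(docid) returns a document with get_data()); the helper name get_data_by_uri is hypothetical:

```python
def get_data_by_uri(db, uri):
    """Return the stored data of the document indexed under 'Q' + uri,
    or None if no document carries that term."""
    term = 'Q' + uri
    # The posting list of a term enumerates the documents containing it;
    # for a unique-ID term there should be at most one entry.
    for posting in db.postlist(term):
        return db.get_document(posting.docid).get_data()
    return None
```

Returning None for a missing term is a choice made for this sketch; raising an error instead may suit callers that expect the document to exist.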
2012 Jan 20
3
get_docid???
my $mset = $enq->get_mset($nstart,$nrecords); for(my $mit=$mset->begin(); $mit != $mset->end();$mit++) { my $doc = $mit->get_document(); my $dat = $doc->get_data(); my $id = $doc->get_docid(); } [Fri Jan 20 10:35:06 2012] newmail.cgi: Can't locate auto/Search/Xapian/Document/get_docid.al in @INC (@INC contains: /etc/perl
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2011 Apr 21
1
How to Retrieve content of the document?
Hi, I have just started using Xapian and I may sound like a noob. I want to know how I can access the content of the document retrieved while searching. I have used the code found on this mailing list itself to index my database. #!/usr/bin/perl -w use strict; use Search::Xapian; use File::Find; my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB'; my $db =
2013 Oct 23
2
performance on document.get_data()
I have a performance issue with document.get_data() and enquire.get_mset(). It takes 35 seconds for matches = enquire.get_mset(0,200), and 3 seconds to iterate over all docs in matches to get_data. Is this normal? My index contains 30 million documents. I use the Python bindings to operate Xapian. Below is my index structure: # value: 0:date, 1:site # data: json message which contains: author,
2007 Nov 08
1
Perl make test fails on threads in rhel5
Hi all, I've tried building RPMs for RHEL5 and hit this problem in Search::Xapian: make test fails on test 37: ok 34 - check PositionIterator ok 35 - create TermIterator ok 36 - check TermIterator dubious Test returned status 0 (wstat 11, 0xb) DIED. FAILED tests 37-65 Failed 29/65 tests, 55.38% okay $ xapian-config --version xapian-config - xapian-core 1.0.4 $ cat
2008 Sep 27
3
Query::MatchAll
Why is there still a rank when using Query::MatchAll()?
2013 Aug 21
2
Perl interface isn't working in 1.2.x
At least it isn't working the way it used to. Code: $db = Search::Xapian::Database->new( $dx ); my $qp = Search::Xapian::QueryParser->new(); my $dbSize=$db->get_doccount(); # $qp->set_stemmer(new Search::Xapian::Stem("english")); # $qp->set_stemming_strategy(STEM_SOME); # $qp->set_default_op($defaultop); my $par =
2007 Feb 09
1
PHP Binding and dbi2omega questions
Hi All, I've installed Xapian and the php module. I've set up a script for use with scriptindex and dbi2omega for getting data from the db into the index easily, the script file is as follows: =============================== id : field=id title : index title: field=title description : index description : truncate=50 field=content ============================= However, when querying
2018 Nov 30
1
Xapian Benchmark results
Hi, I am currently trying to benchmark a multithreaded Xapian implementation, written in C++, on a Chameleon bare-metal instance. My workload is a 3 GB Wikipedia XML dump consisting of ~286 files of different sizes. My results are showing me that indexing with Xapian is an order of magnitude faster than my lucene and lucene plusplus implementations. This is a result that I did not expect. Just want to
2006 Nov 30
1
PHP / XapianQueryParser
Hi everyone, I tried sending a message as a reply a while back on my previous topic, but it didn't go through. (Tried Gmane), not even when I 'authorized' the reply. So I'll just paste it here for reference, below this message. It might help some people. But now I have one other small problem, and I'm not sure if it is actually my mistake (although I'm pretty sure it is
2013 Sep 22
2
How to filter search result with query with has white space.
Hello, #include <iostream> #include <string> #include <xapian.h> struct document { std::string title; std::string content; std::string url; }; void indexData(document d) { try { Xapian::WritableDatabase db("/Users/ramesh/Desktop/xapian", Xapian::DB_CREATE_OR_OPEN); Xapian::TermGenerator indexer; Xapian::Stem
2013 Sep 22
2
How to filter search result with query with has white space.
Hello, #include <iostream> #include <string> #include <xapian.h> struct document { std::string title; std::string content; std::string url; }; void indexData(document d) { try { Xapian::WritableDatabase db("/Users/ramesh/Desktop/xapian", Xapian::DB_CREATE_OR_OPEN); Xapian::TermGenerator indexer; Xapian::Stem
2006 Dec 06
1
Bug and patch for +terms with wildcards
In current Xapian SVN HEAD, there is a bug in the query parser concerned with the handling of wildcard terms with a "+" prefix. Specifically, a query such as "+foo* bar" will be parsed by the query parser into Xapian::Query("bar") if there are no terms in the database which start "foo". Instead, since the "+" term cannot be matched, I believe
2023 Aug 18
1
does Xapian::Enquire hold an MVCC revision?
On Thu, Aug 17, 2023 at 09:28:26PM +0000, Eric Wong wrote: > In other words, is it possible to avoid duplicates if new > documents are inserted into the DB by another process in-between > ->get_mset calls when reusing Xapian::Enquire objects? The Database object itself effectively does (it works in a snapshot of the state of the database when you open it, or last called reopen() which
2012 Feb 17
2
DatabaseModifiedError on get_data - best practice?
Hi, I have previously had a problem with getting this error on a get_mset call, and solved it by subclassing XapianEnquire with a backoff-and-retry algorithm (as suggested by this list, many thanks!). However, I now get it intermittently when calling get_data on a XapianDocument. The same solution doesn't seem to be quite as easy in this case, because: 1. The document is not instantiated
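The backoff-and-retry idea mentioned above can be factored into a helper that wraps any read, not only get_mset. A minimal sketch, assuming db has a reopen() method as xapian.Database does; the helper name, its parameters, and passing the exception class in as retry_on (for example xapian.DatabaseModifiedError) are choices made for this illustration, not part of the bindings:

```python
import time


def retry_with_reopen(db, operation, retry_on, attempts=3, delay=0.05):
    """Call operation(); if it raises retry_on, reopen db and try again.

    operation should redo the whole read (including fetching the
    document) so that a retry sees the reopened database.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * (2 ** attempt))  # exponential backoff
            db.reopen()
```

A call might look like retry_with_reopen(db, lambda: db.get_document(docid).get_data(), xapian.DatabaseModifiedError). Fetching the document inside the lambda matters here, because a document object obtained before the reopen may still refer to the old revision.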