Displaying 20 results from an estimated 2000 matches similar to: "Quickest way to retrieve data for a large match set?"
2013 Jun 19
2
Compact databases and removing stale records at the same time
I'm trying to compact (or at least merge) multiple databases, while stripping search records which are no longer required.
Backstory:
I've inherited the Cyrus IMAPd xapian-based search code from Greg Banks when he left Opera.
One of the unfinished parts was removing expunged emails from the search database.
We moved from having a single search database to supporting multiple
2015 Mar 11
2
stub-file and get_doccount
Hello,
I switched from one big index to a stub file with many indexes and am running
into a problem.
I have a tool to fetch a random document via:
get_doccount
random id up to get_doccount
get_document with that id
After changing to a stub file this fails. Is there a nice way to get a
random document from a stub file?
Kind regards,
Felix Ostmann
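A likely reason the approach above breaks: when several databases are combined (as a stub file does), Xapian interleaves the sub-database docids, so the combined docids are not guaranteed to be the dense range 1..get_doccount(). The sketch below shows that mapping and a retry-based random pick. It is only an illustration: the function names and the `exists` callback are hypothetical. With the real bindings, `last_docid` would come from `get_lastdocid()` and `exists` would attempt `get_document()` and catch `DocNotFoundError`.

```python
import random


def split_combined_docid(combined_docid, n_shards):
    """Map a docid in the combined database to (shard_index, local_docid).

    Uses the interleaving scheme for combined databases:
    combined = (local - 1) * n_shards + shard_index + 1
    """
    if combined_docid < 1 or n_shards < 1:
        raise ValueError("docids and shard counts start at 1")
    shard_index = (combined_docid - 1) % n_shards
    local_docid = (combined_docid - 1) // n_shards + 1
    return shard_index, local_docid


def combine_docid(shard_index, local_docid, n_shards):
    """Inverse of split_combined_docid."""
    return (local_docid - 1) * n_shards + shard_index + 1


def random_existing_docid(last_docid, exists, rng=random, max_tries=1000):
    """Pick random docids in 1..last_docid until one exists.

    `exists` is a hypothetical callback returning True when the docid
    is present. Returns None if no hit is found within max_tries.
    """
    for _ in range(max_tries):
        candidate = rng.randint(1, last_docid)
        if exists(candidate):
            return candidate
    return None
```

For example, with two sub-databases where the first holds five documents and the second holds one, the combined docids are 1, 2, 3, 5, 7 and 9. The document count is 6 but the last docid is 9, so a random id in 1..6 can land on 4 or 6, which do not exist.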
2014 Apr 13
2
Adding an external library to Xapian
My code is not on GitHub. I am using the tarball as of now. The following
is the error that occurred:
http://pastebin.com/cVJrjUZX
On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org> wrote:
> On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com>
> wrote:
>
> > A linker error is encountered even after following the above
2013 Aug 26
2
Perl interface isn't working in 1.2.x
On 08/25/2013 05:02 PM, Olly Betts wrote:
> So the simple fix is
> probably just to install the perl-Search-Xapian RPM instead.
Thanks, the CentOS 6 repos don't have that RPM and the
http://xapian.org/download page seems to only cover the XS bindings, if
I am reading this correctly:
But I was able to remove the rpm packages and compile and install the
core and swig from source.
2007 Feb 09
1
Fetching document content by Q term in Python
Hello,
I'd like to be able to retrieve the index's stored copy of the document
text and tried the following:
terms = self.db.allterms()
terms.skip_to('Q' + uri.encode('utf-8'))
term = terms.next()
doc = self.db.get_document(term[1])
print doc.get_data()
I just wildly guessed that [1] was the docid, but of course it isn't. So the
question is, how do I
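
For the question in this snippet (getting from a unique Q term to a docid), one common route is to read the term's posting list instead of the term list. The sketch below assumes the Python bindings' `postlist(term)` iterator, whose items expose a `docid` attribute; the helper names are hypothetical and `db` is whatever database object the caller already has.

```python
def find_docid_by_unique_term(db, term):
    """Return the docid of the first document indexed by `term`, or None."""
    for posting in db.postlist(term):
        # A unique-ID term should have exactly one posting.
        return posting.docid
    return None


def fetch_data_by_uri(db, uri):
    """Look up a document by its 'Q' + uri term and return its stored data."""
    docid = find_docid_by_unique_term(db, 'Q' + uri)
    if docid is None:
        return None
    return db.get_document(docid).get_data()
```

This avoids `skip_to` on the term list, which only positions the term iterator and does not yield a docid.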
2012 Jan 20
3
get_docid???
my $mset = $enq->get_mset($nstart,$nrecords);
for(my $mit=$mset->begin(); $mit != $mset->end();$mit++) {
my $doc = $mit->get_document();
my $dat = $doc->get_data();
my $id = $doc->get_docid();
}
[Fri Jan 20 10:35:06 2012] newmail.cgi: Can't locate
auto/Search/Xapian/Document/get_docid.al in @INC (@INC contains:
/etc/perl
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote:
> On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote:
> > The advantage of compact - it runs approximately 8 times as fast (we
> > are CPU limited in each case - writing to tmpfs first, then rsyncing
> > to the destination) and it takes approximately 75% of the space of a
> > fresh database with maximum
2011 Apr 21
1
How to Retrieve content of the document?
Hi,
I have just started using Xapian and I may sound like a noob. I want to know
how I can access the content of the document retrieved while searching. I
have used the code found on this mailing list itself to index my database.
#!/usr/bin/perl -w
use strict;
use Search::Xapian;
use File::Find;
my $DATABASE_DIR = '/home/rohit/Desktop/SET/DB';
my $db =
2013 Oct 23
2
performance on document.get_data()
I have a performance issue with document.get_data() and
enquire.get_mset(). It takes 35 seconds for matches =
enquire.get_mset(0,200), and 3 seconds to iterate over all docs in matches and
call get_data. Is that normal? My index contains 30 million documents. I use the Python
bindings to operate Xapian. Below is my index structure:
# value: 0:date, 1:site
# data: json message which contains: author,
2007 Nov 08
1
Perl make test fails on threads in rhel5
Hi all,
I've tried building RPMs for RHEL5 and hit this problem in Search::Xapian:
make test fails on test 37:
ok 34 - check PositionIterator
ok 35 - create TermIterator
ok 36 - check TermIterator
dubious
Test returned status 0 (wstat 11, 0xb)
DIED. FAILED tests 37-65
Failed 29/65 tests, 55.38% okay
$ xapian-config --version
xapian-config - xapian-core 1.0.4
$ cat
2008 Sep 27
3
Query::MatchAll
Why is there still a rank when using Query::MatchAll()?
2013 Aug 21
2
Perl interface isn't working in 1.2.x
At least it isn't working the way it used to.
Code:
$db = Search::Xapian::Database->new( $dx );
my $qp = Search::Xapian::QueryParser->new();
my $dbSize=$db->get_doccount();
# $qp->set_stemmer(new Search::Xapian::Stem("english"));
# $qp->set_stemming_strategy(STEM_SOME);
# $qp->set_default_op($defaultop);
my $par =
2007 Feb 09
1
PHP Binding and dbi2omega questions
Hi All,
I've installed Xapian and the php module. I've set up a script for use with
scriptindex and dbi2omega for getting data from the db into the index
easily, the script file is as follows:
===============================
id : field=id
title : index
title: field=title
description : index
description : truncate=50 field=content
=============================
However, when querying
2018 Nov 30
1
Xapian Benchmark results
Hi,
I am currently trying to benchmark a multithreaded Xapian implementation,
written in C++, on a Chameleon bare-metal instance. My workload is a 3 GB
Wikipedia XML dump consisting of ~286 files of different sizes. My results
show that indexing with Xapian is an order of magnitude faster than
my Lucene and Lucene plusplus implementations. This is a result that I did
not expect. Just want to
2006 Nov 30
1
PHP / XapianQueryParser
Hi everyone,
I tried sending a message as a reply a while back on my previous topic, but it didn't go through. (Tried Gmane), not even when I 'authorized' the reply. So I'll just paste it here for reference, below this message. It might help some people. But now I have one other small problem, and I'm not sure if it is actually my mistake (although I'm pretty sure it is
2013 Sep 22
2
How to filter search result with query with has white space.
Hello,
#include <iostream>
#include <string>
#include <xapian.h>

struct document {
    std::string title;
    std::string content;
    std::string url;
};
void indexData(document d) {
try {
Xapian::WritableDatabase db("/Users/ramesh/Desktop/xapian",
Xapian::DB_CREATE_OR_OPEN);
Xapian::TermGenerator indexer;
Xapian::Stem
2006 Dec 06
1
Bug and patch for +terms with wildcards
In current Xapian SVN HEAD, there is a bug in the query parser concerned
with the handling of wildcard terms with a "+" prefix. Specifically,
a query such as "+foo* bar" will be parsed by the query parser into
Xapian::Query("bar") if there are no terms in the database which start
"foo". Instead, since the "+" term cannot be matched, I believe
2023 Aug 18
1
does Xapian::Enquire hold an MVCC revision?
On Thu, Aug 17, 2023 at 09:28:26PM +0000, Eric Wong wrote:
> In other words, is it possible to avoid duplicates if new
> documents are inserted into the DB by another process in-between
> ->get_mset calls when reusing Xapian::Enquire objects?
The Database object itself effectively does (it works in a snapshot of
the state of the database when you open it, or last called reopen()
which
2012 Feb 17
2
DatabaseModifiedError on get_data - best practice?
Hi,
I have previously had a problem with getting this error on a get_mset
call, and solved it by subclassing XapianEnquire with a
backoff-and-retry algorithm (as suggested by this list, many thanks!).
However, I now get it intermittently when calling get_data on a
XapianDocument. The same solution doesn't seem to be quite as easy in
this case, because:
1. The document is not instantiated
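
The backoff-and-retry idea mentioned in this snippet can be sketched generically. This is a minimal illustration, not the poster's code: the `DatabaseModifiedError` class here is a local stand-in so the example runs without Xapian, and `retry_on_modified`, `operation` and `reopen` are hypothetical names. With the real bindings one would presumably pass the library's own exception class and the database's `reopen` method, and fetch the document again inside `operation`, since a document obtained before the reopen may still refer to the old revision.

```python
import time


class DatabaseModifiedError(Exception):
    """Stand-in for the library's exception so this sketch is self-contained."""


def retry_on_modified(operation, reopen, retries=3, base_delay=0.05,
                      modified_error=DatabaseModifiedError, sleep=time.sleep):
    """Run operation(); on modified_error, back off, reopen and try again.

    Re-raises the error once `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except modified_error:
            if attempt == retries:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
            reopen()
```

Wrapping both the document fetch and the `get_data` call in a single `operation` keeps the retry logic in one place instead of subclassing each object.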