Displaying 20 results from an estimated 400 matches similar to: "Given a document, how do you get its ID? (perl bindings)"
2010 Jan 16
1
PHP XapianTermIterator/XapianPositionIterator usage
Hello again,
Thanks to Peter for the previous response.
I've been digging around trying to find sample usage of
XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a
test case in PHP to perform snippet extraction (with a possible view to
coding a pecl extension in C). I found a C++ sample, but that wasn't much
help.
I must be dense this morning though, since I
2012 Jan 20
3
get_docid???
my $mset = $enq->get_mset($nstart, $nrecords);
for (my $mit = $mset->begin(); $mit != $mset->end(); $mit++) {
    my $doc = $mit->get_document();
    my $dat = $doc->get_data();
    my $id = $doc->get_docid();
}
[Fri Jan 20 10:35:06 2012] newmail.cgi: Can't locate
auto/Search/Xapian/Document/get_docid.al in @INC (@INC contains:
/etc/perl
2017 Jun 06
1
Test for the end of PostingIterator in perl?
Hi all. I want to iterate over all the documents in my database.
my $pi = $db->postlist_begin("");
while ("$pi" =~ qr/END/) {
    my $oldid = $pi->get_docid;
    $pi++;
    #...
}
That used to work with Search::Xapian in perl version 1.2, but now with
xapian-bindings-1.4.4 it does not seem to. How are you supposed to tell
when you have reached the
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote:
> On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote:
> > The advantage of compact - it runs approximately 8 times as fast (we
> > are CPU limited in each case - writing to tmpfs first, then rsyncing
> > to the destination) and it takes approximately 75% of the space of a
> > fresh database with maximum
2010 Oct 21
2
In-memory databases vs PHP Bindings
I can't quite connect the dots on this; perhaps someone can help. I'm
simply trying to create an in-memory database comprising a single document,
so that I can run a load of queries against it and see if any of them match
the new document (this is to enable users to have 'subscriptions' to saved
searches and be alerted every time a new item is published that matches
their
2014 May 02
3
[LLVMdev] Question about implementing exceptions, especially to the VMKit team
Hi Kevin,
To elaborate on Philip's point, depending on the state Pyston's
runtime already is in, you may have the choice of using a hybrid of a
"pending exception" word in your runtime thread structure, and an
implicit alternate ("exceptional") return address for calls into
functions that may throw. This lets you elide the check on the
pending exception word after
2016 May 03
2
Weighting recent results
On 5/2/2016 9:03 PM, Olly Betts wrote:
> On Fri, Apr 22, 2016 at 12:23:15PM -0400, Alex Aminoff wrote:
>> I did some digging and found a thread from 2011 talking about how to
>> subclass Xapian::PostingSource in order to incorporate the date or
>> recency of a document in its weighting:
>>
>> http://thread.gmane.org/gmane.comp.search.xapian.general/8849/focus=8856
2010 Apr 16
2
best practices - combining sql database and xapian, size of database?
Newbie-alert: I'm just getting started on a new project involving a
full text search requirement, and my initial investigation points to
xapian being the way to go.
Two questions:
- eventually I'll most likely be indexing towards 50 million
documents - is this reasonable to expect or attempt with xapian?
- each of my documents comes with a set of attributes. These are easily
stored
2016 May 16
2
Weighting recent results
I was thinking about this some more: Is there a reason I can't just
weight by some function of recency at indexing time?
$weight = get_weight_based_on_recency(...);
$tg->index_text($txt,$weight);
If I wanted to allow the user the option of searching either in
recency-weighted mode or not, I could index each document into 2
different databases, one with and one without.
This avoids
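The post does not show `get_weight_based_on_recency`, so the following is only a hypothetical sketch of what it might look like, written in Python for brevity. It assumes exponential decay with a 30-day half-life and a maximum boost of 10; the function body, the decay shape and both parameter values are illustrative assumptions, not anything taken from the thread. It returns an integer because the second argument of Xapian's `TermGenerator::index_text` is a wdf increment (an unsigned term count), not a floating-point weight.

```python
def get_weight_based_on_recency(age_days: float,
                                half_life_days: float = 30.0,
                                max_boost: int = 10) -> int:
    """Hypothetical recency boost, usable as a wdf increment.

    Returns an integer in [1, max_boost]: max_boost for a brand-new
    document, halving every half_life_days, and never below 1 so that
    old documents are still indexed with a normal wdf increment.
    """
    if age_days < 0:
        age_days = 0.0  # assumption: treat future-dated documents as new
    decay = 0.5 ** (age_days / half_life_days)  # 1.0 at age 0, 0.5 at one half-life
    return max(1, round(max_boost * decay))


if __name__ == "__main__":
    for age in (0, 30, 90, 365):
        print(age, get_weight_based_on_recency(age))
```

Because the value is fixed when the document is indexed, the boost reflects the document's age at indexing time; it does not shrink as time passes unless the document is reindexed.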
2020 Apr 07
2
crash after running notmuch new
Matt <mattator at gmail.com> writes:
> Thanks, I didn't know about xapian-check!
> the output
> ===
> docdata:
> blocksize=8K items=70 firstunused=3 revision=421 levels=0 root=2
> B-tree checked okay
> docdata table structure checked OK
>
> termlist:
> blocksize=8K items=186136 firstunused=62058 revision=421 levels=2 root=12260
> B-tree checked okay
>
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes:
> On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote:
> > The question which remains for me is whether I should run xapian-compact
> > after an initial indexing operation. I guess that this depends on the
> > amount of expected updates and that there is no easy answer?
>
> I think it's not obvious whether it's a good plan
2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr:
http://www.anur.ag/blog/2009/03/xapian-and-solr/
According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB.
I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is
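Expressed as ratios, the benchmark figures quoted in that post amount to roughly a 12x difference in indexing time and a 3.6x difference in index size:

$$
\frac{7\ \text{h}}{34\ \text{min}} = \frac{420\ \text{min}}{34\ \text{min}} \approx 12.4,
\qquad
\frac{8.9\ \text{GB}}{2.5\ \text{GB}} = 3.56
$$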
2018 Jul 12
1
Error while compacting: Bad position key
Mike Hommey <mh at glandium.org> writes:
> Hi,
>
> When running `notmuch compact` today, it stopped with the following
> output:
>
> Compacting database...
> compacting table postlist
> Reduced by 25% 648656K (2498904K -> 1850248K)
> compacting table docdata
> Reduced by 15% 24K (152K -> 128K)
> compacting table termlist
> Reduced by
2018 Mar 19
2
bug: "no top level messages" crash on Zen email loops
Antoine Beaupré <anarcat at orangeseeds.org> writes:
> On 2018-03-19 13:36:49, David Bremner wrote:
>>
>> I can't duplicate that part.
>
> That's very strange. I can reproduce this on my workstation here, but
> taking the tarball I sent in the original message, I can't reproduce
> anymore. So something changed! I suspect it's the
2010 Jan 30
2
Failure trying to update document.
Hi list.
I have a specific document sitting in the index that fails when I try to
update it. What can I do about that?
2010-01-30T13:58:07 Eval failure: Exception: No termlist for
document 287376 at /usr/lib/perl5/Search/Xapian/Enquire.pm line 56.
2010-01-30T13:58:07 job failed. considering retry. is max_retries
of 1000 >= failures of 1?
2010-01-30T13:58:07 job failed: Exception: No
2016 Apr 22
2
Weighting recent results
I did some digging and found a thread from 2011 talking about how to
subclass Xapian::PostingSource in order to incorporate the date or
recency of a document in its weighting:
http://thread.gmane.org/gmane.comp.search.xapian.general/8849/focus=8856
As in that thread, I want to be clear that I don't want to sort by date,
but rather incorporate date information into the score by which I
2017 Sep 12
2
perl bindings to Xapian::Query
QueryParser is great, but I would like to make a query myself, so I can
filter results by a specified value (in this case restricting by epoch
time after a certain value).
My code looks like this; it compiles and appears as though it should work
according to the perl source:
my $query = $qp->parse_query($querystr);
if ($datefilter) {
my $filterepoch = time() - ($datefilter
2010 Dec 18
1
Xapian index size 475GB = 170 million documents (URLs)
Xapians,
I am maintaining two indexes for my search engines, each of approximately
the same size. I would like to share this knowledge with you, since many
of you have never seen a Xapian index of this size. And of course you can
search the index yourself at
- http://myhealthcare.com/
- http://find1friend.com/
I need to get 2 x 100 million more documents into each index, and I hope it
will
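As a rough per-document figure, assuming GB here means 10^9 bytes and that the 475GB in the subject line covers the 170 million documents it is equated with:

$$
\frac{475 \times 10^{9}\ \text{bytes}}{170 \times 10^{6}\ \text{documents}} \approx 2.8 \times 10^{3}\ \text{bytes per document}
$$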
2018 Mar 29
2
bug: "no top level messages" crash on Zen email loops
On 2018-03-29 04:17:21, Olly Betts wrote:
> On Mon, Mar 19, 2018 at 05:03:21PM -0300, David Bremner wrote:
>> I can confirm this reproduces both the xapian-check and the notmuch-show
>> error. Olly agrees that whatever notmuch is doing wrong, it shouldn't
>> lead to a corrupted database
>
> There was a Xapian bug here, which I fixed on master last week and will
>
2010 Nov 16
2
Debugging segfault in foreach
Hi,
I'm using R-2.12 on a 64-bit Linux machine.
When I run a chunk of code inside a foreach() %do% { ...} or %dopar%
{...} (with doMC backend) I keep getting a segfault. Running the
*same* code within lapply(something, function(x) ... ) doesn't result
in any segfaults. I'll paste the output below, but I'm not sure it
would be helpful.
I'm more curious how to go about smoking