Sander Pilon
2004-May-02 17:00 UTC
[Xapian-discuss] Perl binding: crash & missing functions?
Hi List,

I've been playing with Xapian the last few days, and I've run into a few problems with the Perl bindings.

First of all, when I add +/- 6000 documents (small ones, avg. less than 200 words) it crashes. (It just quits with "Aborted".)

When I do this in batches of 500, it doesn't. (Add 500, quit process, add another 500, etc.) Adding a flush() every few hundred documents or even closing and reopening the database doesn't help. Help?

Anyway, the problem above isn't such a big deal. I can do it in batches of 500 docs. However, I want to use a boolean, unweighted query sorted on date with the most recently added documents on top. (Sorted on a value key.) Because the indexer sometimes crashes (see problem above), the same document can be present more than once in the database, so I'll want to use the "set_collapse_key" feature.

The problem is that the Perl binding doesn't seem to support set_collapse_key and set_sort_key. I can call them (without errors) but they don't seem to do anything.

Could it be that I'm doing something wrong somewhere, or are these functions really not supported? (And if so, are they going to be added?)

My code looks like this:

    my @terms = split( ' ', lc( $query ) );
    my $enq = Search::Xapian::Enquire->new( $xdb );
    my $xq = Search::Xapian::Query->new( OP_AND, @terms );
    $enq->set_sorting( 1, 1 ); # Sort by document id
    $enq->set_collapse_key( 1 ); # collapse on document id
    $enq->set_query( $xq );

Regards,
Sander
Olly Betts
2004-May-04 13:31 UTC
[Xapian-discuss] Perl binding: crash & missing functions?
On Sun, May 02, 2004 at 07:00:37PM +0200, Sander Pilon wrote:
> I've been playing with Xapian the last few days, and I got a few problems
> with Perl.
>
> First of all, when I add +/- 6000 documents (small ones, avg. less than 200
> words) it crashes.

This should work - I've added millions of documents in a single run from C++ and never had a crash.

> (It just quits with "Aborted".)

There are a couple of abort()s in the code, for "this should never happen" cases such as buffer overflows. You might be seeing an exception which the Perl bindings aren't catching, though it's odd that the problem goes away with smaller batches.

I think we need to see a full example indexing script (and any sample data) to be able to track this down.

> The problem is that the Perl binding doesn't seem to support
> set_collapse_key and set_sort_key. I can call them (without errors) but they
> don't seem to do anything.

These methods aren't currently wrapped. They aren't hard to add though, and Alex is working on the Perl bindings this week, so this should be fixed soon.

I'm surprised you don't get an error - it's bad if someone can misspell a method name and not be told. Do you still get no warning or error with "perl -w" and "use strict"?

Cheers,
Olly
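Independently of "perl -w", plain Perl can report whether a method is actually defined on a class via can(). The sketch below is generic Perl, not Search::Xapian code: Demo::Enquire is a hypothetical stand-in class, and its catch-all AUTOLOAD is only an assumption about one way a call to an unwrapped method could pass without an error. The helper missing_methods is likewise made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a binding class: one real method plus an
# AUTOLOAD that silently swallows calls to unknown methods.
package Demo::Enquire;
sub new       { return bless {}, shift }
sub set_query { my ($self, $q) = @_; $self->{query} = $q; return }
our $AUTOLOAD;
sub AUTOLOAD  { return }    # unknown methods "succeed" and do nothing

package main;

# Return the names from @names that the class does not really define.
# can() does not consult AUTOLOAD, so it reflects what is implemented.
sub missing_methods {
    my ($class_or_obj, @names) = @_;
    return grep { !$class_or_obj->can($_) } @names;
}

my $enq = Demo::Enquire->new;
$enq->set_collapse_key(1);    # no error raised, but nothing happens either
my @missing = missing_methods($enq, qw(set_query set_collapse_key));
print "not implemented: @missing\n";
```

Running the same check against a real Enquire object would show whether a method such as set_collapse_key is wrapped, without relying on the call itself failing.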
Sander Pilon
2004-May-04 19:58 UTC
[Xapian-discuss] Perl binding: crash & missing functions?
> -----Original Message-----
> From: Alex Bowley [mailto:alex@ixion.tartarus.org] On Behalf Of Alex Bowley
> Sent: Tuesday, May 04, 2004 14:01
> To: Sander Pilon
> Subject: Re: [Xapian-discuss] Perl binding: crash & missing functions?
>
> On Sun, May 02, 2004 at 07:00PM, Sander Pilon wrote:
> > First of all, when I add +/- 6000 documents (small ones, avg. less
> > than 200 words) it crashes.
> > (It just quits with "Aborted".)
> >
> > When I do this in batches of 500, it doesn't. (add 500, quit process,
> > add another 500, etc) Adding a flush() every few hundred documents or
> > even closing and opening the database doesn't help. Help?
>
> Hmmm. Which version of xapian are you using? 0.8.0?
> Search::Xapian is 0.0.5, I assume?

Correct.

> Any chance you could mail me some sample code / input data?
> (I'll understand if this is confidential)

Neither the code nor the data is confidential. It's just that the data is, well, large. (Too much to mail.) I could give you access to the MySQL database (this *WOULD* be confidential :), as it's on a fast server.

But before I do, let me explain a bit more.

First, the machine I used to test on: a Celeron 350 with 256Mb RAM, Linux 2.4.20 (Debian).

I can (repeatedly) make it crash after X documents. Meaning that I can reset the database, and if I repeat the steps that made it crash last time, it will crash again.

Now, my first thought would've been that something in a specific document makes it crash. It doesn't seem that way, though. If I do a run of 6000 documents, it crashes at document 5999. If I do 6 runs of 1000 documents, it crashes in run 6, document 999. (Same document.) If I do 12 runs of 500, it completes just fine.

And now for the weird part. Just to make sure it wasn't my rather old hardware, I installed a brand new Debian testing (sarge) installation in a VMware session on my rather new Athlon 2600+ with 1G RAM, etc. The VMware session has 384Mb RAM.

The first thing I noticed is that the runs that crash on the Celeron don't crash in VMware. But before you go "ooh, his hardware is flaky!" ...... Other runs *DO* make it crash. o_O'

Could it be Unicode-related? (The documents I'm trying to index could contain Unicode (UTF-8).) Are there certain terms Xapian doesn't like? (Still, no excuse for "Aborted" ... )

> ...... (snip)
>
> I'm just about to start hacking on a new version of
> Search::Xapian. I'll make sure these methods get wrapped
> correctly. I'll let you know when it's been uploaded.

Thanks.

Below is my rather primitive (don't laugh, it's my first one and I haven't written Perl in well over two years) indexer that makes it go boom...

http://www.shacknews.com/sander/indexer.txt

It's not much more complicated than this: split the articles on whitespace, remove the stopwords, strip punctuation and add terms with increasing termpos, then add the document to Xapian, repeat.
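For readers who cannot fetch the script, the term-extraction steps described above can be sketched in plain Perl roughly as follows. This is not the linked indexer: the stopword list, the lowercasing, the name terms_with_positions, and the choice to strip punctuation before the stopword check are all assumptions made for illustration, and no Xapian calls are made.

```perl
use strict;
use warnings;

# Illustrative stopword list; the real indexer's list is not shown in the thread.
my %stopword = map { $_ => 1 } qw(a an and the of to in is it);

# Turn article text into [term, position] pairs:
# split on whitespace, lowercase, strip punctuation, drop stopwords,
# and number the surviving terms with an increasing position.
sub terms_with_positions {
    my ($text) = @_;
    my @postings;
    my $pos = 0;
    for my $word (split ' ', $text) {
        my $term = lc $word;
        $term =~ s/^[[:punct:]]+|[[:punct:]]+$//g;   # strip leading/trailing punctuation
        next if $term eq '' or $stopword{$term};      # skip empty terms and stopwords
        push @postings, [ $term, ++$pos ];
    }
    return @postings;
}

# Print position and term for a small sample sentence.
for my $p (terms_with_positions("The indexer crashes, and it is odd!")) {
    printf "%d\t%s\n", $p->[1], $p->[0];
}
```

In an indexer, each pair would then be handed to the document object (for example via add_posting) before the document is added to the database. Keeping this step separate from the Xapian calls also makes it possible to run the term extraction over the problem documents on its own when narrowing down where the abort comes from.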