I had a hard time finding sample Perl code so here is mine. Instead of storing the real position it just stores the line number.

One thing that puzzled me is I'm not sure about the error handling on set_data() and add_posting(). Wouldn't they normally return 0 on failure so that you could say set_data() or warn "set_data() failed"?

I'm really impressed with Xapian. I'm using the flint backend and it's really fast.

regards,
dan carpenter

#========================
#!/usr/bin/perl -w
use strict;
use Search::Xapian;
use File::Find;

my $DATABASE_DIR = '/home/dcarpenter/tmp/firm';

my $db = Search::Xapian::WritableDatabase->new($DATABASE_DIR,
                                               Search::Xapian::DB_CREATE_OR_OPEN)
    or die "can't create write-able db object: $!\n";

my $dir = shift;
if (!$dir) {
    print "usage: index_data.pl <dir>\n";
    exit(1);
}

my $file;
my $doc;
my $line;
my @words;
my $tmp;
my $count = 0;

find(\&index, $dir);

sub index {
    # only index regular text files
    return unless -T $_;
    $file = $_;

    $doc = Search::Xapian::Document->new()
        or die "can't create doc object for $file: $!\n";

    # store the full path as the document data
    if ($doc->set_data("$File::Find::name")){
        warn "can't set_data in doc object for $file: $!\n";
    }

    $line = 1;
    # three-argument open, and skip the file if it can't be read
    open(FILE, '<', $file) or do {
        warn "can't open $file: $!\n";
        return;
    };
    while (<FILE>){
        # trim leading and trailing non-word characters, then split into words
        s/^\W+//;
        s/\W+$//;
        @words = split(/\W+/, $_);
        foreach $tmp (@words){
            # the line number is used as the term position
            if ($doc->add_posting($tmp, $line)){
                warn "can't add word $tmp $line: $!\n";
            }
        }
        $line++;
    }
    close(FILE);

    $db->add_document($doc)
        or warn "failed to add document: $file\n";

    $count++;
    if ($count%500 == 0){
        print "$count files indexed\n";
    }
}

print "Total: $count files indexed\n";

On Sat, Jan 28, 2006 at 05:53:01PM -0800, Dan Carpenter wrote:
> I had a hard time finding sample Perl code so here is mine. Instead
> of storing the real position it just stores the line number.

Hmm, there's some sample search code, but no sample indexing code it seems. I'll add a version of simpleindex to match C++ and the other bindings. Presumably you're happy for me to include your example too?

> One thing that puzzled me is I'm not sure about the error handling on
> set_data() and add_posting(). Wouldn't they normally return 0 on
> failure so that you could say set_data() or warn "set_data() failed"?

I think there's no way set_data can fail, while add_posting can only fail if you try to give an empty termname, in which case an exception is thrown. Looking at the XS glue, that case doesn't seem to be handled currently.

Cheers,
Olly
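
Since the only failure case mentioned above is an empty termname, one way to sidestep it is to make sure no empty term ever reaches add_posting(). What follows is only a rough sketch in plain Perl: postings_for_lines() is a made-up helper name, not part of Search::Xapian, and it just builds the (term, line number) pairs that an indexing loop like the one in this thread would hand to add_posting().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Xapian): turn a list of text
# lines into [term, line_number] pairs, dropping any empty terms.
sub postings_for_lines {
    my @lines = @_;
    my @postings;
    my $line_no = 1;
    for my $text (@lines) {
        # split on runs of non-word characters; a leading separator
        # produces an empty first field, which grep { length } removes
        for my $term (grep { length } split /\W+/, $text) {
            push @postings, [$term, $line_no];
        }
        $line_no++;
    }
    return @postings;
}

# Small demonstration: print each term with the line it came from.
my @postings = postings_for_lines("  hello, world\n", "\n", "--flint backend\n");
printf "%s %d\n", @$_ for @postings;
```

In the indexing script, each pair would then be passed on as $doc->add_posting($term, $line). Filtering with grep { length } does the same job as the two s/// trims in the original loop, but it does not depend on the trims running before the split.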