Jeff Anderson
2007-Apr-09 20:32 UTC
[Xapian-discuss] Seek help with simple writer/search using Perl
Hello all. I just started using Xapian and am already having troubles
storing and searching very simple data.

Here is the relevant code i use to store the simple values: foo, bar,
baz and qux:

-------------------------------------------------------------------------------
my $db = Search::Xapian::WritableDatabase->new(
    '/tmp/xapian/test',
    Search::Xapian::DB_CREATE
) or die "can't create: $!\n";

print "Opened indexer for write with ", $db->get_doccount, " docs\n";

for my $data (qw(foo bar baz qux)) {

    my $doc = Search::Xapian::Document->new();

    $doc->set_data( $data );

    $doc->add_posting( 'ONE', 1 );
    $doc->add_posting( 'TWO', 1 );

    $doc->add_value( 1, $data );
    $doc->add_value( 2, $data );

    $db->add_document( $doc );
}

$db->flush;

print "There are now ", $db->get_doccount, " docs in index\n";
-------------------------------------------------------------------------------

As you can see, i am storing each value via set_data(), add_posting(),
and add_value() (i really have no idea what the differences are or why
i need to use one instead of the other). There seems to be no problems
running this code, and after running this code it produces the
following output for me:

-------------------------------------------------------------------------------
Opened indexer for write with 0 docs
There are now 4 docs in index
-------------------------------------------------------------------------------

Moving on ... searching my database. Here is the relevant code i am using:

-------------------------------------------------------------------------------
my $db = Search::Xapian::Database->new( '/tmp/xapian/test' ) or die $!;

for my $term (qw(foo bar baz qux)) {

    print "found ", $db->get_termfreq( $term ), " docs for term $term";
}

for my $id (1..4) {

    my $doc = $db->get_document( $id );

    print "doc $id has data: ", $doc->get_data;
}

my $enq = $db->enquire( OP_OR, 'bar' );
warn "here's what we got: ", $enq->get_query;

my @match = $enq->matches( 0, 10 );
print scalar( @match ) . " results found";
-------------------------------------------------------------------------------

In the first for loop i check to see if there are any docs with the
same simple values (foo, bar, baz and qux) that i used when i created
the database and populated it with documents. The second for loop
assumes that there are 4 documents in the database 1-4 (as there
should be after running my first code snippet) and simply fetches them
and prints the value stored as 'data.'

Finally, i search for a hard coded value 'bar'. Here is the result of
me running this code (immediately after running my first snippet i
should add)

-------------------------------------------------------------------------------
found 0 docs for term foo
found 0 docs for term bar
found 0 docs for term baz
found 0 docs for term qux
doc 1 has data: foo
doc 2 has data: bar
doc 3 has data: baz
doc 4 has data: qux
here's what we got: Xapian::Query(bar) at bin/xapian_search.pl line 48.

0 results found
-------------------------------------------------------------------------------

As you can see, there are no docs with any of the terms i thought i
assigned, but there are 4 documents in the database and the data was
stored. However, my search results always turn up empty. :(

Can anyone here at this list offer any help, insight or general
suggestions? I've been reading the site docs all day long and have
some pretty good success, but this problem is proving to be a real
show stopper for me.

Thanks in advance,

--
jeffa
Olly Betts
2007-Apr-09 20:48 UTC
[Xapian-discuss] Seek help with simple writer/search using Perl
On Mon, Apr 09, 2007 at 03:32:38PM -0400, Jeff Anderson wrote:
> As you can see, i am storing each value via set_data(), add_posting(),
> and add_value() (i really have no idea what the differences are or why
> i need to use one instead of the other).

This is the nub of your problem!

The document data is opaque as far as Xapian is concerned. Store
whatever you want there (typically you'd put things you need to display
a match to the user, like the title, a sample of text, etc).

A document "value" is a small piece of data which is stored such that
it can be accessed rapidly during the match, for things like sorting by
date, collapsing similar matches (same MD5 sum, same website, etc).

Calling add_posting() adds an index entry for the current document with
positional information (which allows phrase searching, etc).

You can also use add_term() to add an index entry without positional
information - this is commonly used for terms intended for boolean
filtering (e.g. you might add a term for "document language"). You can
also use add_term() for all index entries if you don't need to support
phrase searching.

So in your example, you'll only be able to search for terms "ONE" and
"TWO". If you want "foo", "bar", etc to match, you need to add them as
terms (using add_posting() or add_term()) instead of (or as well as) as
values!

Cheers,
    Olly
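The distinction drawn in this reply can be sketched as a variant of the original indexing script. This is only an illustration, not code posted in the thread: it assumes the Search::Xapian bindings are installed, writes to a temporary directory rather than /tmp/xapian/test, and the helper names (build_index, search_term) and the filter term 'Len' are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Search::Xapian;

# Index each word three ways so the roles are visible side by side.
sub build_index {
    my ($path) = @_;
    my $db = Search::Xapian::WritableDatabase->new(
        $path, Search::Xapian::DB_CREATE_OR_OVERWRITE );

    for my $word (qw(foo bar baz qux)) {
        my $doc = Search::Xapian::Document->new();
        $doc->set_data($word);           # opaque payload for display, not searchable
        $doc->add_value( 1, $word );     # read during the match (sort, collapse), not searchable
        $doc->add_posting( $word, 1 );   # index entry with a position: searchable
        $doc->add_term('Len');           # hypothetical filter term, no position
        $db->add_document($doc);
    }

    $db->flush;
    return $db->get_doccount;
}

# Return the document data of up to ten documents matching a single term.
sub search_term {
    my ( $path, $term ) = @_;
    my $db  = Search::Xapian::Database->new($path);
    my $enq = $db->enquire($term);
    return map { $_->get_document->get_data } $enq->matches( 0, 10 );
}

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/test";

print "Indexed ", build_index($path), " docs\n";
print "Docs containing term bar: ",
    Search::Xapian::Database->new($path)->get_termfreq('bar'), "\n";
print "Matches for bar: ", join( ', ', search_term( $path, 'bar' ) ), "\n";
```

The only change that matters for searching is that each word is now passed to add_posting() (add_term() would do if phrase searching is not needed); the set_data() and add_value() calls are kept only to show that they play a different role.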
Ralf Mattes
2007-Apr-09 20:48 UTC
[Xapian-discuss] Seek help with simple writer/search using Perl
On Mon, 2007-04-09 at 15:32 -0400, Jeff Anderson wrote:
> Hello all. I just started using Xapian and am already having troubles
> storing and searching very simple data.
>
> Here is the relevant code i use to store the simple values: foo, bar,
> baz and qux:
>
> -------------------------------------------------------------------------------
> my $db = Search::Xapian::WritableDatabase->new(
>     '/tmp/xapian/test',
>     Search::Xapian::DB_CREATE
> ) or die "can't create: $!\n";
>
> print "Opened indexer for write with ", $db->get_doccount, " docs\n";
>
> for my $data (qw(foo bar baz qux)) {
>
>     my $doc = Search::Xapian::Document->new();
>
>     $doc->set_data( $data );

I think you want to add your terms as postings and not as data - at
least iff you want to search for them later on ...

      $doc->add_posting( $data, 1);

 HTH Ralf Mattes

>
>     $doc->add_posting( 'ONE', 1 );
>     $doc->add_posting( 'TWO', 1 );
>
>     $doc->add_value( 1, $data );
>     $doc->add_value( 2, $data );
>
>     $db->add_document( $doc );
> }
>
> $db->flush;
>
> print "There are now ", $db->get_doccount, " docs in index\n";
> -------------------------------------------------------------------------------
Jeff Anderson
2007-Apr-11 16:23 UTC
[Xapian-discuss] Seek help with simple writer/search using Perl
On 4/11/07, xapian-discuss-request@lists.xapian.org
<xapian-discuss-request@lists.xapian.org> wrote:
> Message: 1
> From: James Aylett <james-xapian@tartarus.org>
>
> The best we have at the moment is the quickstart guide. The main
> problem is that it assumes you want to build things in C++, and so
> spends quite a lot of time doing things that are unnecessary or easier
> in other languages. Nonetheless I would strongly recommend reading it
> to get an understanding of documents, postings, document data and
> queries. (Values are less commonly used, and so I believe are not
> covered.)

> Message: 2
>
> I'd generally recommend JSON in most cases, particularly for getting
> started. It has great support across a very wide range of languages,
> and can transport more conveniently than XML (particularly, NULs).

Thanks James. I understand the need to keep things flexible, but not
providing a mechanism to store key/value data introduces its own kind
of inflexibility by adding a dependency to some 3rd party storage
device. I find that a very strange requirement of a storage device --
to require further storage devices to store your stuff. But to each his
own i suppose. (But is that Edgar F. Codd i hear rolling in his grave?).

As for the documentation (which i did read before asking my first
question) ... documentation is as documentation does ...

Thanks for the help again, we decided to switch to Kinosearch so i
apologize for wasting everyone's time here.

--
jeffa
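For readers following the JSON suggestion quoted above, one way to keep key/value fields in the document data might look like the sketch below. It is an assumption-laden illustration rather than anything posted in the thread: it uses JSON::PP (bundled with recent Perls) as the encoder, and the helper names and field names are invented for the example.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);
use Search::Xapian;

# Serialise a hash of fields into the opaque document data.
sub pack_fields {
    my ( $doc, $fields ) = @_;
    $doc->set_data( encode_json($fields) );
    return $doc;
}

# Recover the fields from a document, e.g. one returned by a match.
sub unpack_fields {
    my ($doc) = @_;
    return decode_json( $doc->get_data );
}

# Hypothetical fields, purely for illustration.
my $doc = pack_fields(
    Search::Xapian::Document->new(),
    { title => 'Seek help with simple writer/search', author => 'jeffa' },
);

my $fields = unpack_fields($doc);
print "title:  $fields->{title}\n";
print "author: $fields->{author}\n";
```

Anything stored this way is still only retrievable, not searchable; fields that should match a query would also have to be added as terms.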