I've got a small test database setup with one record.

    $ delve -r 1 -V /tmp/1/
    Values for record #1: 0:DD4F2162FFFF0E43741A4A1C2B8EC0E7 1:./Text_page_scan_2.jpg 2:jpg 3:.jpg
    Term List for record #1: E:.jpg P:./Text_page_scan_2.jpg Q:DD4F2162FFFF0E43741A4A1C2B8EC0E7 T:jpg

The terms were added with lines like this:

    doc.add_term(string("P:") + path);

Problem is, I can't seem to run a query that returns the document using
any of the terms. Here is the outline of the code that runs the queries
I'm trying to run:

    Database db(db_path.string());
    QueryParser queryparser;
    Stem stemmer("english");
    //queryparser.set_stemmer(stemmer);
    queryparser.set_database(db);
    queryparser.add_prefix("type", "T");
    queryparser.add_prefix("md5sum", "Q");
    queryparser.add_prefix("path", "P");
    queryparser.add_prefix("extension", "E");
    //maybe set stemming strategy here (in query parser)?
    queryparser.set_stemming_strategy(QueryParser::STEM_NONE);
    Query query(queryparser.parse_query(full_string));
    cout<<"Query is '"<<full_string<<"'"<<endl;
    Enquire enquire(db);
    enquire.set_query(query);
    MSet match_set(enquire.get_mset(0, 10));
    for_each(match_set.begin(), match_set.end(),
             [&db](docid id) {
                 print_doc_info(db.get_document(id));
             });

I expected the following query to work,

    md5sum:DD4F2162FFFF0E43741A4A1C2B8EC0E7

but it returns nothing. Same for all the other terms and prefixes.
Terms without prefixes seem to be working normally. I set stemming to
NONE on everything.

All I want is a way to ask xapian to return a list of all documents with
specific paths and/or md5sums.

thanks for any tips,
Chris

On Sun, Sep 01, 2013 at 10:37:59PM -0400, Christopher Harvey wrote:
> I've got a small test database setup with one record.
> $ delve -r 1 -V /tmp/1/
> Values for record #1: 0:DD4F2162FFFF0E43741A4A1C2B8EC0E7 1:./Text_page_scan_2.jpg 2:jpg 3:.jpg
> Term List for record #1: E:.jpg P:./Text_page_scan_2.jpg Q:DD4F2162FFFF0E43741A4A1C2B8EC0E7 T:jpg
>
> The terms were added with lines like this:
> doc.add_term(string("P:") + path);

Just add the prefix "P" here.

> Problem is, I can't seem to run a query that returns the document using
> any of the terms. Here is the outline of the code that runs the queries
> I'm trying to run:
>
> Database db(db_path.string());
> QueryParser queryparser;
> Stem stemmer("english");
> //queryparser.set_stemmer(stemmer);
> queryparser.set_database(db);
> queryparser.add_prefix("type", "T");
> queryparser.add_prefix("md5sum", "Q");
> queryparser.add_prefix("path", "P");

Or if you really want that colon in there, add the prefix as "P:" here.

> queryparser.add_prefix("extension", "E");
> //maybe set stemming strategy here (in query parser)?
> queryparser.set_stemming_strategy(QueryParser::STEM_NONE);
> Query query(queryparser.parse_query(full_string));
> cout<<"Query is '"<<full_string<<"'"<<endl;

If you print out query.get_description() it should be clearer what's
going on.

Cheers,
Olly
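
The mismatch Olly describes can be sketched as a toy model in plain C++. It makes no Xapian calls: `indexed_term` and `query_term` are hypothetical helpers introduced only for illustration, and the sketch ignores everything the real QueryParser does beyond prepending the registered prefix (case handling, stemming, phrases). It assumes only that the term stored at index time is the literal string passed to `add_term()`, and that the parser looks up the registered term prefix followed by the value.

```cpp
#include <map>
#include <string>

// Toy model only: these helpers are hypothetical and are not Xapian functions.

// Term as stored at index time: the literal string handed to add_term().
std::string indexed_term(const std::string& term_prefix, const std::string& value) {
    return term_prefix + value;
}

// Term a parser would look up for "field:value", given a map from field
// name to term prefix. Returns the input unchanged when there is no colon
// or the field name is not registered.
std::string query_term(const std::map<std::string, std::string>& prefixes,
                       const std::string& field_query) {
    const std::string::size_type colon = field_query.find(':');
    if (colon == std::string::npos) return field_query;
    const auto it = prefixes.find(field_query.substr(0, colon));
    if (it == prefixes.end()) return field_query;
    return it->second + field_query.substr(colon + 1);
}
```

In this model, indexing with "T:" gives the term `T:jpg`, while a parser configured with `{"type", "T"}` looks up `Tjpg` for the query `type:jpg`, so nothing matches. Registering the prefix as "T:" (or indexing with plain "T") makes the two strings agree, which is the pair of fixes suggested above.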

Olly Betts <olly at survex.com> writes:
> On Sun, Sep 01, 2013 at 10:37:59PM -0400, Christopher Harvey wrote:
>> I've got a small test database setup with one record.
>> $ delve -r 1 -V /tmp/1/
>> Values for record #1: 0:DD4F2162FFFF0E43741A4A1C2B8EC0E7 1:./Text_page_scan_2.jpg 2:jpg 3:.jpg
>> Term List for record #1: E:.jpg P:./Text_page_scan_2.jpg Q:DD4F2162FFFF0E43741A4A1C2B8EC0E7 T:jpg
>>
>> The terms were added with lines like this:
>> doc.add_term(string("P:") + path);
>
> Just add the prefix "P" here.
>
>> Problem is, I can't seem to run a query that returns the document using
>> any of the terms. Here is the outline of the code that runs the queries
>> I'm trying to run:
>>
>> Database db(db_path.string());
>> QueryParser queryparser;
>> Stem stemmer("english");
>> //queryparser.set_stemmer(stemmer);
>> queryparser.set_database(db);
>> queryparser.add_prefix("type", "T");
>> queryparser.add_prefix("md5sum", "Q");
>> queryparser.add_prefix("path", "P");
>
> Or if you really want that colon in there, add the prefix as "P:" here.

works! Thanks!

well, more precisely it works for everything except path names.
I was reading the docs here:
http://xapian.org/docs/queryparser.html
and saw the paragraph on punctuation:

---
A phrase surrounded with double quotes ("") matches documents containing
that exact phrase. Hyphenated words are also treated as phrases, as are
cases such as filenames and email addresses (e.g. /etc/passwd or
president@whitehouse.gov).
---

using almost exactly the same code as last time, I fed it the following
query:

    path:"foo.bar"

printing query.get_description() produces the following:

    Query is 'Xapian::Query((P:foo:(pos=1) PHRASE 2 P:bar:(pos=2)))'

I was expecting a query with one term, since the "foo.bar" was quoted.
Consequently I can't match terms stored in the database that have "."
characters.

thanks again!
Chris
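
The description Chris reports suggests the quoted text is split into separate prefixed terms rather than kept as one. A toy model in plain C++ shows why that cannot match a single stored term. It makes no Xapian calls: `phrase_terms` is a hypothetical helper, and splitting at every non-alphanumeric character is a simplifying assumption, not Xapian's actual word-splitting rules.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy model only: hypothetical helper, not a Xapian function.
// Splits text at every non-alphanumeric character and prepends the
// term prefix to each resulting word.
std::vector<std::string> phrase_terms(const std::string& term_prefix,
                                      const std::string& text) {
    std::vector<std::string> terms;
    std::string word;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            word += static_cast<char>(ch);
        } else if (!word.empty()) {
            terms.push_back(term_prefix + word);
            word.clear();
        }
    }
    if (!word.empty()) terms.push_back(term_prefix + word);
    return terms;
}
```

Under this model, `phrase_terms("P:", "foo.bar")` yields the two terms `P:foo` and `P:bar`, the same two terms that appear in the PHRASE query above. Neither equals a stored term such as `P:foo.bar`, so a document indexed with the whole path as one term is not found.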