I've got a small test database set up with one record.
$ delve -r 1 -V /tmp/1/
Values for record #1: 0:DD4F2162FFFF0E43741A4A1C2B8EC0E7 1:./Text_page_scan_2.jpg 2:jpg 3:.jpg
Term List for record #1: E:.jpg P:./Text_page_scan_2.jpg Q:DD4F2162FFFF0E43741A4A1C2B8EC0E7 T:jpg
The terms were added with lines like this:
doc.add_term(string("P:") + path);
Problem is, I can't seem to run a query that returns the document using
any of the terms. Here is an outline of the code I'm using to run the
queries:
Database db(db_path.string());
QueryParser queryparser;
Stem stemmer("english");
//queryparser.set_stemmer(stemmer);
queryparser.set_database(db);
queryparser.add_prefix("type", "T");
queryparser.add_prefix("md5sum", "Q");
queryparser.add_prefix("path", "P");
queryparser.add_prefix("extension", "E");
//maybe set stemming strategy here (in query parser)?
queryparser.set_stemming_strategy(QueryParser::STEM_NONE);
Query query(queryparser.parse_query(full_string));
cout << "Query is '" << full_string << "'" << endl;
Enquire enquire(db);
enquire.set_query(query);
MSet match_set(enquire.get_mset(0, 10));
for_each(match_set.begin(), match_set.end(),
[&db](docid id) {
print_doc_info(db.get_document(id));
});
I expected the following query to work,
md5sum:DD4F2162FFFF0E43741A4A1C2B8EC0E7
but it returns nothing. The same happens with all the other terms and
prefixes. Terms without prefixes seem to work normally. I set stemming to
NONE everywhere.
All I want is a way to ask xapian to return a list of all documents with
specific paths and/or md5sums.
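What is being asked for here is an exact-match filter on whole terms rather than a parsed free-text search. The toy model below is plain C++ and deliberately not the Xapian API (`PostingMap`, `index_term`, `lookup` and `match_any` are hypothetical names). It sketches the idea that each prefixed term maps to the documents containing it, so a lookup succeeds only when the query side builds exactly the same string the indexer stored.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model, NOT the Xapian API: each term string maps to the ids of the
// documents that contain it (a tiny in-memory posting list).
using DocId = unsigned;
using PostingMap = std::map<std::string, std::set<DocId>>;

// Hypothetical helper: record that document `doc` contains `term`.
void index_term(PostingMap& index, const std::string& term, DocId doc) {
    index[term].insert(doc);
}

// Exact-match lookup: only a byte-for-byte identical term string matches.
std::set<DocId> lookup(const PostingMap& index, const std::string& term) {
    const auto it = index.find(term);
    return it == index.end() ? std::set<DocId>{} : it->second;
}

// Documents containing ANY of the given terms (an OR of exact filters,
// e.g. several paths and/or md5sums).
std::set<DocId> match_any(const PostingMap& index,
                          const std::vector<std::string>& terms) {
    std::set<DocId> result;
    for (const auto& term : terms) {
        const std::set<DocId> docs = lookup(index, term);
        result.insert(docs.begin(), docs.end());
    }
    return result;
}
```

In this model, a document indexed under `"P:./Text_page_scan_2.jpg"` is not found by `lookup(index, "P./Text_page_scan_2.jpg")`: one missing colon makes it a different term. In the real library, a single-term query can be built directly with `Xapian::Query(term)`, which bypasses QueryParser's tokenization altogether.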
Thanks for any tips,
Chris
On Sun, Sep 01, 2013 at 10:37:59PM -0400, Christopher Harvey wrote:
> I've got a small test database setup with one record.
> $ delve -r 1 -V /tmp/1/
> Values for record #1: 0:DD4F2162FFFF0E43741A4A1C2B8EC0E7 1:./Text_page_scan_2.jpg 2:jpg 3:.jpg
> Term List for record #1: E:.jpg P:./Text_page_scan_2.jpg Q:DD4F2162FFFF0E43741A4A1C2B8EC0E7 T:jpg
>
> The terms were added with lines like this:
> doc.add_term(string("P:") + path);

Just add the prefix "P" here.

> Problem is, I can't seem to run a query that returns the document using
> any of the terms. Here is the outline of the code that runs the queries
> I'm trying to run:
>
> Database db(db_path.string());
> QueryParser queryparser;
> Stem stemmer("english");
> //queryparser.set_stemmer(stemmer);
> queryparser.set_database(db);
> queryparser.add_prefix("type", "T");
> queryparser.add_prefix("md5sum", "Q");
> queryparser.add_prefix("path", "P");

Or if you really want that colon in there, add the prefix as "P:" here.

> queryparser.add_prefix("extension", "E");
> //maybe set stemming strategy here (in query parser)?
> queryparser.set_stemming_strategy(QueryParser::STEM_NONE);
> Query query(queryparser.parse_query(full_string));
> cout<<"Query is '"<<full_string<<"'"<<endl;

If you print out query.get_description() it should be clearer what's going on.

Cheers,
    Olly
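Both of Olly's fixes come down to making the indexer and the query side produce the same term string. The sketch below is a simplified, hypothetical model of the query side only (`PrefixMap` and `parse_field_query` are illustrative names, not Xapian functions): it maps a `field:value` query to the registered prefix plus the value, and ignores the tokenization, case handling and other processing the real QueryParser performs.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified model of what add_prefix(field, prefix) sets up:
// a mapping from the user-visible field name to the term prefix.
using PrefixMap = std::map<std::string, std::string>;

// Turn "field:value" into prefix + value. Queries without a colon and
// unregistered field names are returned unchanged. The real QueryParser
// does considerably more than this.
std::string parse_field_query(const PrefixMap& prefixes, const std::string& query) {
    const auto colon = query.find(':');
    if (colon == std::string::npos) {
        return query;  // no field given
    }
    const auto it = prefixes.find(query.substr(0, colon));
    if (it == prefixes.end()) {
        return query;  // field name was never registered
    }
    return it->second + query.substr(colon + 1);
}
```

With `md5sum` registered as `"Q"`, as in the original code, this model turns `md5sum:DD4F2162FFFF0E43741A4A1C2B8EC0E7` into `QDD4F2162FFFF0E43741A4A1C2B8EC0E7`, which differs from the stored `Q:DD4F2162FFFF0E43741A4A1C2B8EC0E7`; registering `"Q:"` (or indexing with plain `"Q"`) makes the strings identical.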
Olly Betts <olly at survex.com> writes:

> On Sun, Sep 01, 2013 at 10:37:59PM -0400, Christopher Harvey wrote:
>> [...]
>> The terms were added with lines like this:
>> doc.add_term(string("P:") + path);
>
> Just add the prefix "P" here.
>
>> [...]
>> queryparser.add_prefix("path", "P");
>
> Or if you really want that colon in there, add the prefix as "P:" here.

Works! Thanks!

Well, more precisely, it works for everything except path names. I was
reading the docs here:

http://xapian.org/docs/queryparser.html

and saw the paragraph on punctuation:

---
A phrase surrounded with double quotes ("") matches documents containing
that exact phrase. Hyphenated words are also treated as phrases, as are
cases such as filenames and email addresses (e.g. /etc/passwd or
president@whitehouse.gov).
---

Using almost exactly the same code as last time, I fed it the following
query:

path:"foo.bar"

Printing query.get_description() produces the following:

Query is 'Xapian::Query((P:foo:(pos=1) PHRASE 2 P:bar:(pos=2)))'

I was expecting a query with a single term, since "foo.bar" was quoted.
As a result, I can't match terms stored in the database that contain "."
characters.

Thanks again!
Chris
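The `P:foo PHRASE 2 P:bar` output has the shape of a value that was split into words with the prefix applied to each word. The sketch below contrasts that with treating the whole value as one term. It is a deliberately crude, hypothetical model: `free_text_terms` and `boolean_terms` are illustrative names, and splitting on every non-alphanumeric byte is an assumption, not Xapian's actual tokenizer. For exact-value fields such as paths and md5sums, Xapian's QueryParser also offers `add_boolean_prefix()` alongside `add_prefix()`; its precise parsing rules are worth checking in the QueryParser documentation for the version in use.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Crude model of free-text handling (an assumption, far simpler than the
// real QueryParser): split the value on non-alphanumeric characters and
// prefix each piece. Several pieces would then be searched as a phrase.
std::vector<std::string> free_text_terms(const std::string& prefix,
                                         const std::string& value) {
    std::vector<std::string> terms;
    std::string word;
    for (unsigned char c : value) {
        if (std::isalnum(c)) {
            word += static_cast<char>(c);
        } else if (!word.empty()) {
            terms.push_back(prefix + word);  // punctuation ends the current word
            word.clear();
        }
    }
    if (!word.empty()) {
        terms.push_back(prefix + word);
    }
    return terms;
}

// Boolean-filter style: the whole value becomes exactly one term.
std::vector<std::string> boolean_terms(const std::string& prefix,
                                       const std::string& value) {
    return {prefix + value};
}
```

Under this model, `free_text_terms("P:", "foo.bar")` yields the two terms `P:foo` and `P:bar`, neither of which equals a stored `P:foo.bar`, while `boolean_terms` keeps the value intact.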