Jesper Krogh
2008-May-15 11:50 UTC
[Xapian-discuss] STEM_SOME and prefixes.. (even boolean)
Hi.

This seems somewhat strange, and I can't really tell whether it is a bug or a "feature", but:

I have acc listed as a boolean prefix. I use STEM_SOME since that seems to be the most useful way of doing things. But it would be really nice if we either stemmed all prefixes or stemmed none of them.

I have some terms like Q1W2E3 that are listed as boolean prefixes. These are essentially IDs, so I really don't want the stemming algorithm to accidentally stumble over them. But if the ID happens to start with an upper-case letter, it gets fed to the search like this:

Search:
acc:Q1W2E3

Running query 'Xapian::Query(0 * ACC:Q1W2E3)'

As far as I can tell, a query with a ':' will never match anything in the index?

Xapian 1.0.5

Jesper
-- 
Jesper Krogh
Matthew Somerville
2008-May-15 16:35 UTC
[Xapian-discuss] STEM_SOME and prefixes.. (even boolean)
Jesper Krogh wrote:
> I have acc listed as a boolean prefix.

Do you mean you have something like:

  $queryparser->add_boolean_prefix('acc', 'Q');

or something else?

> I have some terms like Q1W2E3 that are listed as boolean prefixes.

Do you mean you have a document in your database that has Q1W2E3 as a term? I'm guessing not, because of what you say below, so what is the term you have entered in the database for the ID "Q1W2E3"?

> These are essentially IDs, so I really don't want the stemming algorithm to
> accidentally stumble over them. But if the ID happens to start with an
> upper-case letter, it gets fed to the search like this:
>
> Search:
> acc:Q1W2E3
>
> Running query 'Xapian::Query(0 * ACC:Q1W2E3)'

This doesn't sound like a stemming issue (though I could be wrong :) ). If I have "acc" as a boolean prefix, using the queryparser line above, a query for acc:Q1W2E3 to QueryParser becomes:

  Xapian::Query(0 * QQ1W2E3)

and if I don't have "acc" as a boolean prefix, it becomes:

  Xapian::Query((acc:(pos=1) PHRASE 2 q1w2e3:(pos=2)))

i.e. it's treated as a phrase search.

Do you have some short example code that exhibits the issue?

ATB,
Matthew
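The two outcomes Matthew describes can be modelled with a small toy function. This is a hypothetical sketch, not Xapian's QueryParser: `toy_parse` and its return values are invented for illustration, and it assumes the capital-letter colon rule for multi-character prefixes that Olly explains later in the thread.

```python
# Toy model of how a "field:value" query string is treated depending on
# whether "field" is registered as a boolean prefix.
# Hypothetical illustration only: this is NOT Xapian's QueryParser.


def toy_parse(query, boolean_prefixes):
    """Return ("filter", term) or ("phrase", words) for a query string.

    boolean_prefixes maps a user-visible field name (e.g. "acc") to the
    term prefix stored in the index (e.g. "Q" or "ACC").
    """
    field, sep, value = query.partition(":")
    if sep and field in boolean_prefixes:
        prefix = boolean_prefixes[field]
        # In this toy, filter values are used verbatim (no lowercasing,
        # no stemming). A multi-character prefix gets a ':' separator when
        # the value starts with a capital letter, keeping the boundary visible.
        if len(prefix) > 1 and value[:1].isupper():
            return ("filter", prefix + ":" + value)
        return ("filter", prefix + value)
    # Unregistered field: the ':' just joins words into a lowercased phrase.
    return ("phrase", [word.lower() for word in query.split(":") if word])


if __name__ == "__main__":
    print(toy_parse("acc:Q1W2E3", {"acc": "Q"}))    # ('filter', 'QQ1W2E3')
    print(toy_parse("acc:Q1W2E3", {}))              # ('phrase', ['acc', 'q1w2e3'])
    print(toy_parse("acc:Q1W2E3", {"acc": "ACC"}))  # ('filter', 'ACC:Q1W2E3')
```

The third call reproduces the ACC:Q1W2E3 term from Jesper's report; with the single-character prefix 'Q' used in Matthew's test, no colon appears.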
On Thu, May 15, 2008 at 01:50:16PM +0200, Jesper Krogh wrote:
> Search:
> acc:Q1W2E3
>
> Running query 'Xapian::Query(0 * ACC:Q1W2E3)'
>
> As far as I can tell, a query with a ':' will never match anything in the
> index?

The issue here is that, given the term ACCQ1W2E3, how do you say what the prefix is? You want it to be ACC, but it could be ACCQ, AC, or just A. So when adding a multi-character term prefix, we insert a ':' if the term starts with a capital letter, so that the prefix/term boundary isn't lost.

Obviously this needs to happen at index time too, or, as you say, the term with the colon will never match.

There's also an assumption in some places that you follow the convention that multi-character prefixes always start with 'X' (I think only in Omega, but I'm not certain).

Cheers,
    Olly
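Olly's point can be checked with a short sketch. `prefixed_term` and `possible_splits` are hypothetical helpers, not Xapian API; they assume that prefixes are runs of capital letters, as in the example, and that the same joining rule is applied both when indexing and when building queries.

```python
# Sketch of the prefix/term joining convention described above.
# Hypothetical helpers for illustration; not part of Xapian's API.


def prefixed_term(prefix, term):
    """Join prefix and term, adding ':' after a multi-character prefix
    when the term itself starts with a capital letter."""
    if len(prefix) > 1 and term[:1].isupper():
        return prefix + ":" + term
    return prefix + term


def possible_splits(raw):
    """Every (prefix, rest) reading of a colon-free term, assuming a
    prefix is a non-empty leading run of capital letters."""
    splits = []
    i = 0
    while i < len(raw) - 1 and raw[i].isupper():
        i += 1
        splits.append((raw[:i], raw[i:]))
    return splits


if __name__ == "__main__":
    # Without a separator the boundary is ambiguous (four readings here).
    print(possible_splits("ACCQ1W2E3"))

    # With the colon, splitting at the first ':' recovers ACC and Q1W2E3.
    prefix, _, rest = prefixed_term("ACC", "Q1W2E3").partition(":")
    print(prefix, rest)

    # The colon only helps if the indexer writes it too.
    query_term = prefixed_term("ACC", "Q1W2E3")          # 'ACC:Q1W2E3'
    naive_index = {"ACC" + "Q1W2E3"}                     # indexed without the rule
    consistent_index = {prefixed_term("ACC", "Q1W2E3")}  # same rule as queries
    print(query_term in naive_index)       # False: the symptom Jesper reports
    print(query_term in consistent_index)  # True
```

Sharing one helper between indexing and query construction is the design point; the Omega convention about 'X'-initial prefixes is not modelled here.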