When I use quest, if I search for "don't have", it appears as if Xapian is parsing for "don" and "t" -- is there a way to modify or change this behavior?
On Thu, Dec 01, 2005 at 05:44:11PM -0600, Tony Lambiris wrote:
> When I use quest, if I search for "don't have", it appears as if Xapian
> is parsing for "don" and "t" -- is there a way to modify or change this
> behavior?

The Xapian::QueryParser class will turn "don't" into a phrase search for "don" followed by "t". Currently that's not configurable, short of modifying the source code, which isn't hard. Line 348 (or thereabouts) of queryparser/queryparser_internal.cc is:

    if (*it != '&') break;

If you change that to also check for a single quote, it'll not split on a single embedded single quote (which is exactly what you want):

    if (*it != '&' && *it != '\'') break;

For English at least, it would make sense to always treat an embedded single quote as a word character, except for the possessive form (e.g. "Tony's"), where you really want to be able to match on "Tony" too. Perhaps we should just special-case that at index time. I've done that in the past for a particular project, but it doesn't really seem the right approach for a general-purpose piece of code.

Cheers,
    Olly
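To make the idea concrete, here is a minimal C++ sketch of a tokenizer that keeps a single embedded '&' or apostrophe inside a term, plus an index-time special case for the English possessive. It is a standalone illustration, not Xapian's QueryParser or indexer code; the names is_joiner, tokenize and index_terms are hypothetical, and treating only ASCII alphanumerics as word characters is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not Xapian's code): characters that may join two
// word characters without ending the term.
static bool is_joiner(char c) {
    return c == '&' || c == '\'';
}

// Assumption: only ASCII letters and digits count as word characters.
static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Split text into lowercase terms. A single '&' or '\'' stays inside a term
// only when a word character follows it, so "don't" is kept whole while a
// doubled or trailing quote still ends the term.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> terms;
    std::string::size_type i = 0;
    const std::string::size_type n = text.size();
    while (i < n) {
        // Skip to the start of the next word.
        while (i < n && !is_word_char(text[i])) ++i;
        if (i == n) break;
        std::string term;
        while (i < n) {
            if (is_word_char(text[i])) {
                term += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(text[i])));
                ++i;
            } else if (is_joiner(text[i]) && i + 1 < n &&
                       is_word_char(text[i + 1])) {
                term += text[i];  // single embedded joiner: keep it
                ++i;
            } else {
                break;  // anything else ends the term
            }
        }
        terms.push_back(term);
    }
    return terms;
}

// Index-time special case for the possessive: a term ending in "'s" is
// indexed both as-is and without the suffix, so "Tony's" also matches "tony".
std::vector<std::string> index_terms(const std::string& text) {
    std::vector<std::string> out;
    for (const std::string& t : tokenize(text)) {
        out.push_back(t);
        if (t.size() > 2 && t.compare(t.size() - 2, 2, "'s") == 0) {
            out.push_back(t.substr(0, t.size() - 2));
        }
    }
    return out;
}
```

With this sketch, "don't have" yields the terms don't and have, and "Tony's book" is indexed as tony's, tony and book, so a search for "Tony" still finds it. Doing the possessive split only at index time keeps the query side unchanged.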
On Mon, Dec 12, 2005 at 11:40:56AM -0600, Tony Lambiris wrote:
> Sorry, was it this instead:
>
>     if (*it != '&' || *it != '\'') break;
>
> I'm not a super C person, but it seems like you would need an OR
> statement if the if function is checking the same pointer?

No, it should be &&. With || the whole expression will always be true: *it can't be both '&' and '\'' at once, so at least one side of the || will always be true. Perhaps it's clearer if you pull out the not:

    (*it != '&' && *it != '\'')  ->  !(*it == '&' || *it == '\'')

> Sorry, one more question -- does this change work right away, or will I
> have to reindex my data source?

I had assumed you'd written your own indexer which was treating an embedded apostrophe as a word character (though rereading your original message, it doesn't say whether you have or not). If you have, then you don't need to reindex. If you're using omindex or scriptindex, you'll need to make the corresponding change to indextext.cc (again, just follow how '&' is handled), rebuild, and reindex.

Cheers,
    Olly
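A quick way to convince yourself is to check every possible char value. The small C++ sketch below (the function names are hypothetical, chosen for illustration) evaluates the || condition from the question, the && condition from the reply, and the form with the not pulled out, and confirms that the first is always true while the last two agree everywhere.

```cpp
#include <cassert>
#include <climits>

// Condition proposed in the question: true for every character.
bool or_version(char c) { return c != '&' || c != '\''; }

// Condition from the reply: true only if c is neither '&' nor '\''.
bool and_version(char c) { return c != '&' && c != '\''; }

// Same test with the negation pulled out (De Morgan's law).
bool negated_version(char c) { return !(c == '&' || c == '\''); }

// Returns true if, for every char value, the || form is true and the
// && form matches its negated rewrite.
bool check_all_chars() {
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        char c = static_cast<char>(i);
        if (!or_version(c)) return false;
        if (and_version(c) != negated_version(c)) return false;
    }
    return true;
}
```

Since or_version never returns false, the loop in the query parser would always break on it, and an embedded '&' would stop being kept as well.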