djcb
2008-Nov-10 19:20 UTC
[Xapian-discuss] writing match deciders / custom handling of terms
Hi all,

I'm using Xapian for my mail indexer/searcher[1]. The current version uses Xapian in tandem with SQLite, but I'm now making it Xapian-only, mainly to simplify the code. I have to compliment the Xapian developers for producing such a nice piece of work! It works really well, and really fast.[2]

Now, my question is about MatchDeciders (I think). Suppose I have a query to find some messages in my Xapian DB, e.g.:

    subject:foo AND flags:A

which should match messages with subject 'foo' that also have flag 'A' (having attachments). In the database, flags are just a number. So I need some custom handling of this 'flags:A' term to match the appropriate documents.

Now, it seems(?) that MatchDeciders are the way to go, but I don't see a way to do the custom handling of the flags parameter. Am I missing something simple?

Thanks in advance!
Dirk.

Footnotes:
[1] http://www.djcbsoftware.nl/code/mu
[2] But some things seem a bit strange. For example, there seems to be no API to pass a prefix to add_term, so I have to prefix the strings manually, which seems a bit hackish. Also, Xapian::Sorter returns a string, which is then sorted; I was expecting something similar to std::less, or GCompareFunc in GLib. Not being able to do the comparison myself forces me to pad numeric values with 0 etc. so that the sorting works.

--
Dirk-Jan C. Binnema <djcb at djcbsoftware.nl>
blog: http://www.djcbsoftware.nl/ChangeLog (NL)
      http://djcbflux.blogspot.com (EN)
chat: djcb at jabber.org
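The zero-padding workaround described in footnote [2] can be illustrated without Xapian at all. The following is only a minimal sketch: the helper name `pad_key` and the fixed width of 10 digits are assumptions made for illustration, not anything taken from mu or from the Xapian API.

```python
def pad_key(n, width=10):
    """Left-pad a non-negative integer with zeros so that plain string
    comparison orders the keys the same way as numeric comparison.

    Hypothetical helper; the width is an arbitrary assumption.
    """
    if n < 0 or len(str(n)) > width:
        raise ValueError("value does not fit the fixed-width key")
    return str(n).zfill(width)


sizes = [9, 100, 25]

# Comparing the unpadded strings puts "100" before "25" before "9".
unpadded_order = sorted(str(n) for n in sizes)

# Comparing the padded keys restores numeric order.
padded_order = sorted(sizes, key=pad_key)
```

The padding only works while every value fits the chosen width, which is one reason a dedicated serialisation is more robust than ASCII digits.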
Oliver Flimm
2008-Nov-11 07:11 UTC
[Xapian-discuss] writing match deciders / custom handling of terms
Hi,

On Mon, Nov 10, 2008 at 09:20:26PM +0200, djcb wrote:
> Now, it seems(?) that MatchDeciders are the way to go -- but I don't see
> a way to do the custom handling of the flags parameter -- am I missing
> something simple?

When using MatchDeciders, I usually implement facets in Perl the following way:

1) For each facet I use a value to store the content of the corresponding category (of our library catalogue). If a document has multiple entries for a given facet (e.g. several subject headings in our library catalogue), all fields get concatenated with \t.

   Another point to be aware of: I couldn't get the query parser to search for 'terms with spaces', although it's no problem to index them. That's why I replace the spaces with underscores (and normalize special characters, transform to lower case etc.), so 'Web 2.0' becomes 'web_2.0'. When actually searching for a facet value you have to apply the same transformations.

2) To implement a match decider I use a hash (decider_map) for the resulting facets and a code ref $decider_ref that holds the code for the decider. This code ref is then passed to the matches method of the Enquire object:

   my @matches = $enq->matches(0,$maxmatch,$decider_ref);

I've written all this up in my project wiki, *but* in German ;-)

http://wiki.openbib.org/index.php?title=Einführung_in_das_Xapian_Perl-API

Regards,
Oliver

--
Universitaet zu Koeln :: Universitaets- und Stadtbibliothek
IT-Dienste :: Abteilung Universitaetsgesamtkatalog
Universitaetsstr. 33 :: D-50931 Koeln
Tel.: +49 221 470-3330 :: Fax: +49 221 470-5166
flimm at ub.uni-koeln.de :: www.ub.uni-koeln.de
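The two steps Oliver describes can be sketched roughly as follows. This is a library-free illustration in Python rather than his Perl code: the function names are hypothetical, the decider receives the packed value string directly instead of a Xapian document object, and the "normalize special characters" step is left out because the message does not say what it involves.

```python
from collections import Counter


def normalize_facet(value):
    """Lower-case a facet value and join its words with underscores,
    so that 'Web 2.0' becomes 'web_2.0'."""
    return "_".join(value.lower().split())


def pack_facets(values):
    """Concatenate all normalized values of one facet with tabs, as they
    would be stored in a single document value slot."""
    return "\t".join(normalize_facet(v) for v in values)


def make_decider(decider_map):
    """Build a decider callback that tallies facet values in decider_map.

    Assumption: the callback is handed the packed value string; a real
    match decider would read it from the document it is given.
    """
    def decider(packed_value):
        for facet in packed_value.split("\t"):
            if facet:
                decider_map[facet] += 1
        return True  # accept every match; this decider only counts

    return decider


decider_map = Counter()
decider = make_decider(decider_map)
for headings in (["Web 2.0", "Digital Libraries"], ["Web 2.0"]):
    decider(pack_facets(headings))
```

After the loop, `decider_map` holds one count per normalized facet value, which is the information a facet display needs. The same `normalize_facet` transformation has to be applied to whatever the user types when filtering by a facet.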
Olly Betts
2008-Nov-11 12:38 UTC
[Xapian-discuss] writing match deciders / custom handling of terms
2008/11/10 djcb <djcb.bulk at gmail.com>:
> Now, my question is about the MatchDeciders (I think). Suppose I have a
> query to find some messages in my Xapian DB, e.g:
>
> subject:foo AND flags:A
>
> which would match message with subject 'foo' and messages with flag 'A'
> (having attachments). In the database, flags are just a number. So, I
> need some custom handling of this 'flags:A' term, and match the
> appropriate documents.
>
> Now, it seems(?) that MatchDeciders are the way to go -- but I don't see
> a way to do the custom handling of the flags parameter -- am I missing
> something simple?

The QueryParser doesn't (at least currently) allow you to generate a MatchDecider; you need to add it separately.

In this case I'd probably just generate a term for each flag at index time and use QueryParser::add_boolean_prefix().

> [2] But: there are some things that seem a bit strange though; e.g. there seems
> to be no API to add the prefix to add_term, requiring me to manually
> prefix the strings, which seems a bit hackish...

Well, TermGenerator can do prefixing for you. But it's mostly just string concatenation anyway.

> and the Xapian::Sorter
> which returns a string, which is then sorted; I was expecting something
> similar to std::less, or GCompareFunc in GLib

The reason for generating the sort key rather than offering a comparator is mostly down to the number of callbacks required: for a comparator it's O(n log n), while for generating a sort key it's O(n). Since n can easily be millions, this can make quite a difference.

> not being able to do
> the comparison myself forces me to pad numeric values with 0 etc., so
> the sorting works

See Xapian::sortable_serialise(). It's also much more compact than storing numbers as ASCII strings, and it can handle floating point numbers.

Cheers,
Olly
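The "one term per flag" suggestion might look roughly like this at index time. It is a sketch under several assumptions: the bit values, the lower-case flag letters, and the "XF" prefix are all invented for illustration (the thread only says that flags are stored as a number and that 'A' means "has attachments"), the "S" prefix for subject terms is likewise assumed, and no Xapian call is made. In a real indexer each returned string would be added to the document as a term, and the same prefix would be registered with the QueryParser as a boolean prefix for the "flags" field.

```python
# Hypothetical flag bits; mu's actual numeric encoding is not given in the thread.
FLAG_BITS = {"a": 0x01, "s": 0x02, "n": 0x04}

# Hypothetical term prefix for flag terms.
FLAG_PREFIX = "XF"


def flag_terms(flags):
    """Return one prefixed term per bit set in the numeric flags field."""
    return [FLAG_PREFIX + letter
            for letter, bit in FLAG_BITS.items()
            if flags & bit]


def has_all_terms(doc_terms, required):
    """Stand-in for a boolean AND filter: every required term must be present."""
    return all(term in doc_terms for term in required)


# A message with subject 'foo' and the 'a' and 'n' flags set.
doc_terms = {"Sfoo"} | set(flag_terms(0x01 | 0x04))

# Terms a query like 'subject:foo AND flags:a' would filter on,
# given the assumed prefixes.
wanted = ["Sfoo", FLAG_PREFIX + "a"]
is_match = has_all_terms(doc_terms, wanted)
```

Because the flags become ordinary terms, filtering happens inside the normal matching process and no per-document callback is needed, which is the advantage over a MatchDecider here.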