I'm still quite new to xapian and omega. Olly was a great help in putting together some docs on boolean search, but I seem to still be having a bit of trouble. Talking about it is the best way to (perhaps by myself) solve my problem.

I am indexing a music collection and want to be able to perform searches in certain fields. I'm a little confused about the difference between "boolean filters" and "Probabilistic Fields". I am using boolean filters, but perhaps I should use the other? I would like to use the B CGI parameter to search in omega.

So, I came up with an index file like this:

-- begin music.index --
id: index boolean=Q unique=Q
artist: lower field=artist boolean=XART index
path: field=url
info: lower index
-- end music.index --

I can search and retrieve these documents correctly, but when I pass a B=XART, for example, I don't seem to be getting the normal results back.

Here is a sample record I feed into scriptindex.

-- sample record feed into scriptindex --
id=39c5236d30e6c24cb737f3d166a7e05a
artist=A Perfect Circle
album=Mer de Noms
title=Sleeping Beauty
path=/storage/mp3/Full Albums/A Perfect Circle/Mer_De_Noms/A Perfect Circle - Sleeping Beauty.mp3
info=album info dumped here
-- end sample record feed into scriptindex --

Thanks a bunch,
Sig Lange
On Sun, Mar 06, 2005 at 09:12:36PM -0500, Sig Lange wrote:

> I'm still quite new to xapian and omega. Olly was a great help in
> putting together some docs on boolean search and I seem to still be
> having a bit of trouble. Talking about it is the best way to (perhaps
> by myself) solve my problem.
>
> I am indexing a music collection and want to be able to perform
> searches in certain fields. I'm a little confused on the difference
> between "boolean filters" and "Probabilistic Fields" .. I am using
> boolean filters but perhaps use the other? I would like to use the B
> CGI parameter to search in omega.

Scriptindex isn't really my thing, but I'll try to deal with that as well as with what you're trying to do. I'm also not convinced that Omega can manage this perfectly unaltered; I may well be wrong, and it's possible that this approach (while it makes sense in terms of Xapian) isn't the right solution when using Omega.

To summarise: you want to, for instance (given the sample data you gave), search for "Circle" as "artist" and get back the "Sleeping Beauty" document. This is /not/ a boolean search; you'd use boolean search if you, say, wanted to search for "Sleeping" in document titles, but wanted to restrict the results to a particular genre (say "pop").

What I think you'll need to do is to index terms in the 'artist' field with a prefix, and then do a probabilistic search.

> So, I came up with a index file like this:
> -- begin music.index --
> id: index boolean=Q unique=Q
> artist: lower field=artist boolean=XART index
> path: field=url
> info: lower index
> -- end music.index --

I'm pretty sure you don't want or need boolean=Q and unique=Q on the same line. unique=Q should be enough.
For artist, you want something like:

----------------------------------------------------------------------
artist: lower field=artist index=<PREFIX>
----------------------------------------------------------------------

You might want to do it again with a non-prefixed index as well (so it can be matched in a 'general' search).

Then you want to do a probabilistic search with a term constructed from the artist, which will start with <PREFIX>.

There's going to be a wrinkle to do with stemming, which is why I haven't specified <PREFIX>. Someone else is going to have to chime in here (or come up with a better method entirely). Omega will do nasty things to your P input, trying to turn it into useful terms to search over. You actually want this (stemming, for instance), but you may need to choose the prefix very carefully to avoid weird effects in stemming. For instance, if your prefix was 'da' and one of the words in the artist was 'the', then gluing them together you get 'dathe', which stems to 'dath'. ('the' won't normally be stemmed.)

It's possible there's an easy solution to this by choosing the prefix correctly; unfortunately I don't understand the query parser used by Omega well enough to be able to advise here.

It may be that there's a different approach that is better, but hopefully this at least describes boolean searching better for you.

J

--
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james@tartarus.org                               uncertaintydivision.org
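
As a rough illustration of the boolean versus probabilistic distinction in this thread, here is a toy Python sketch of the terms each approach would produce for the sample artist field. It is not Xapian's or scriptindex's actual code: the function names are made up for illustration, stemming is skipped entirely, and XART is simply the prefix from the index file in the original post.

```python
def boolean_term(prefix, value):
    """Toy model: a boolean filter keeps the whole field value as ONE term."""
    return prefix + value.lower()


def probabilistic_terms(prefix, value):
    """Toy model: free-text indexing yields one prefixed term per word.

    Assumption: real indexing would also stem each word; that is omitted here.
    """
    return [prefix + word for word in value.lower().split()]


artist = "A Perfect Circle"

# One term covering the entire value: only an exact match on it will filter.
print(boolean_term("XART", artist))

# One term per word: a search for just "circle" in the artist field can match.
print(probabilistic_terms("XART", artist))
```

In this simplified model, a search for "Circle" as artist only has something to match against in the second case, which is why a prefixed probabilistic index is suggested for free-text searching within a field.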
On Sun, Mar 06, 2005 at 09:12:36PM -0500, Sig Lange wrote:

> I am indexing a music collection and want to be able to perform
> searches in certain fields. I'm a little confused on the difference
> between "boolean filters" and "Probabilistic Fields" .. I am using
> boolean filters but perhaps use the other? I would like to use the B
> CGI parameter to search in omega.

To put it simply: If you want to perform a free-text search on a field (which will generally contain multiple words of text), you want "probabilistic". If the field contains a single category or code or similar, and you want to be able to filter search results according to the value of this field, you want to index it as "boolean".

> So, I came up with a index file like this:
> -- begin music.index --
> id: index boolean=Q unique=Q

You don't want index here - that will index the id as a probabilistic term. So unless you want someone to be able to type "39c5236d30e6c24cb737f3d166a7e05a" into your search box, the "index" isn't useful. So you want:

    id: boolean=Q unique=Q

> artist: lower field=artist boolean=XART index

You probably want to preserve the case of the artist in the field, so put lower *after* field=artist. I'm not sure if you really want to lower for probabilistic indexing anyway...

And I suspect artist should be a probabilistic field. In which case you want:

    artist: field=artist index=XART

If you really want a boolean filter, then probably:

    artist: field=artist lower boolean=XART

> path: field=url

OK.

> info: lower index

Probably just:

    info: index

And I'd imagine you'd also want to index album and title (which are in the sample record you gave).

> I can search and retrieve these documents as correctly but when I pass
> a B=XART for example, I don't seem to be getting the normal results
> back.

If XART is boolean, you need to specify a whole term, not just a prefix - for example: 'B=XARTa perfect circle'.

If you're hoping to generate a drop-down list of artists, you'll need to tweak Omega.
There's a commented-out function in query.cc called 'do_picker', which is probably close to what's needed. We wanted to make it possible to generate these lists using omegascript, but that's not been done yet.

Cheers,
Olly
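
To make the whole-term requirement for B concrete, here is a minimal Python sketch that builds an Omega query URL combining a free-text P query with a boolean artist filter. It assumes the boolean variant of the index script (artist: field=artist lower boolean=XART); the endpoint URL, the database name "music", and the helper name omega_query_url are hypothetical and will differ per installation.

```python
from urllib.parse import urlencode

# Hypothetical Omega endpoint; adjust for your own installation.
OMEGA_URL = "http://localhost/cgi-bin/omega"


def omega_query_url(query, artist, db="music"):
    """Build a query URL with a probabilistic query and a boolean artist filter.

    The B value must be the complete term: the XART prefix followed by the
    whole lowercased artist name, not the prefix on its own.
    """
    params = [
        ("DB", db),                      # database name (assumed to be "music")
        ("P", query),                    # free-text (probabilistic) query
        ("B", "XART" + artist.lower()),  # whole boolean term
    ]
    # urlencode escapes the spaces inside the boolean term for us.
    return OMEGA_URL + "?" + urlencode(params)


print(omega_query_url("sleeping", "A Perfect Circle"))
```

Passing only B=XART would name a term that no document was indexed with, which is consistent with the missing results described in the original post.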