Francis Irving
2008-Apr-25 10:51 UTC
[Xapian-discuss] acts_as_xapian, pre-release (Ruby on Rails)
Hi all,

I've been using Ruby on Rails, and finally got fed up with Solr/Lucene. So I've made acts_as_xapian. An early version is available here:

https://secure.mysociety.org/cvstrac/dir?d=mysociety/foi/vendor/plugins/acts_as_xapian

It works, but isn't deployed on a live site yet (it will be soon, on our UK Freedom of Information request filing/archiving site www.whatdotheyknow.com).

I've put the parts of the documentation which compare it to acts_as_solr at the bottom of this email.

Any suggestions as to features it should have that would be easy to add? It's got sort, date range, collapse, spelling, offline indexing, and integration with Rails models. Anything else big/obvious that most people will need? Or anything easy to add and genius-looking (like spelling was!)?

If anyone can try it out (patches welcome!), that would be super awesome. It's really not been used much yet, as I only made it the day before yesterday, so buyer beware.

Francis
mySociety

P.S. Is adding highlighting to QueryParser on the development plan? By that I mean a function which you give some text, a number of words and a highlighting prefix/postfix, and which returns an extract of the text highlighted for the query. I really feel it belongs in QueryParser, as doing it well depends fundamentally on the format of queries (i.e. quoting, prefixes, operators, and even ranges), and nearly every search application needs it.

# Comparison to acts_as_solr (as on 24 April 2008)
# ================================================
#
# * Offline indexing only mode - which is a minus if you want changes
# immediately reflected in the search index, and a plus if you were going to
# have to implement your own offline indexing anyway.
#
# * Collapsing - the equivalent of SQL's "group by". You can specify a field
# to collapse on, and only the most relevant result from each value of that
# field is returned, along with a count of how many there are in total.
# acts_as_solr doesn't have this.
#
# * No highlighting - Xapian can't return you text highlighted with a search query.
# You can try and make do with TextHelper::highlight. I found the highlighting
# in acts_as_solr didn't really understand the query anyway.
#
# * Date range searching - maybe this works in acts_as_solr, but I never found
# out how.
#
# * Spelling correction - "did you mean?" built in and just works.
#
# * Multiple models - acts_as_xapian searches multiple models if you like,
# returning them mixed up together by relevancy. This is like multi_solr_search,
# only it is the default mode of operation and is properly supported.
#
# * No daemons - however, if you have more than one web server, you'll need to
# work out how to use Xapian's remote backend http://xapian.org/docs/remote.html.
#
# * One layer - full-powered Xapian is called directly from the Ruby, without
# Solr getting in the way whenever you want to use a new feature from Lucene.
#
# * No Java - an advantage if you're more used to working in the rest of the
# open source world. acts_as_xapian is pure Ruby and C++.
#
# * Xapian's awesome email list - the kids over at xapian-discuss are super
# helpful. Useful if you need to extend and improve acts_as_xapian. The
# Ruby bindings are mature and well maintained as part of Xapian.
# http://lists.xapian.org/mailman/listinfo/xapian-discuss
#
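For anyone unfamiliar with collapsing, the behaviour described in the comparison can be illustrated in a few lines of plain Ruby. This is a minimal sketch, not acts_as_xapian's code: it assumes results are hashes already sorted by relevance (most relevant first), and `collapse_results` and the `:key` field are hypothetical names chosen for illustration. In Xapian itself the collapsing happens inside the search, by setting a collapse key on the Enquire object (`Enquire::set_collapse_key()` in the C++ API).

```ruby
# Hypothetical illustration of result collapsing ("group by" on a field).
# Assumes +results+ is already sorted by relevance, most relevant first,
# and that each result is a Hash containing the field to collapse on.
def collapse_results(results, field)
  kept = {}             # field value => most relevant result for that value
  counts = Hash.new(0)  # field value => how many results share that value
  results.each do |result|
    value = result.fetch(field)
    counts[value] += 1
    kept[value] ||= result # first hit wins, since input is relevance-ordered
  end
  # Ruby hashes preserve insertion order, so the output stays relevance-ordered.
  kept.map { |value, result| [result, counts[value]] }
end

results = [
  { id: 1, key: "council_a", relevance: 0.9 },
  { id: 2, key: "council_b", relevance: 0.8 },
  { id: 3, key: "council_a", relevance: 0.7 },
]

collapse_results(results, :key).each do |result, count|
  puts "#{result[:id]} (#{count} in group)"
end
# prints:
# 1 (2 in group)
# 2 (1 in group)
```

Result 3 disappears because result 1 already represents `council_a`, but the count of 2 records that it existed.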
Richard Boulton
2008-Apr-25 11:12 UTC
[Xapian-discuss] acts_as_xapian, pre-release (Ruby on Rails)
Francis Irving wrote:
> Hi all,
>
> I've been using Ruby on Rails, and finally got fed up with Solr/Lucene. So I've
> made acts_as_xapian. An early version is available here:

Though I'm not a Ruby guy, this sounds really good, thanks! :)

> P.S. Is adding highlighting to QueryParser on the development plan? By that I
> mean a function which you give it some text and a number of words and a
> highlighting prefix/postfix, and it returns an extract of the text highlighted
> for the query. I really feel it is something that belongs in QueryParser, as
> it is fundamental to the format of queries to do it well (i.e. with quoting,
> and prefixes and operators, and even ranges), and nearly every search
> application needs it.

I was going to point you to the relevant ticket in trac, but there doesn't seem to be one! It's certainly something which has been discussed, and would be very nice to have, but we've just not got around to it yet. There is code to do highlighting in Omega, which could probably be pulled into xapian-core without too much effort.

There is a ticket in trac about producing dynamic summaries / snippets based on the search (so that the displayed summary text for each document is relevant to the search text):

http://trac.xapian.org/ticket/211

For what it's worth, I don't think highlighting exactly belongs in the QueryParser, but it would be a useful addition to the "QueryParser/TermGenerator" family of objects.

Regarding missing features, I can't think of any particular examples. Maybe access to Xapian's support for synonyms would be useful for some people. The 1.1.0 release of Xapian (still a few months away, I fear) will include support for efficient database replication, i.e. building the index on one machine and replicating it across multiple search machines by sending just the parts which have changed with each commit(). That is definitely useful for high-volume sites (it was developed to help with http://mydeco.com/). But I can't think of any similarly cool features you've missed which exist on HEAD.

> # * Xapian's awesome email list - the kids over at xapian-discuss are super
> # helpful. Useful if you need to extend and improve acts_as_xapian. The
> # Ruby bindings are mature and well maintained as part of Xapian.
> # http://lists.xapian.org/mailman/listinfo/xapian-discuss

Thanks!

--
Richard
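The function Francis asks for (text, a word count and a highlighting prefix/postfix in, a highlighted extract out) might look roughly like this naive pure-Ruby sketch. It is hypothetical: `highlight_extract` is not a Xapian or acts_as_xapian API, and it takes a plain array of query terms rather than a parsed query. It therefore ignores phrases, field prefixes, boolean operators and ranges, which is exactly why the thread argues that doing this properly belongs alongside QueryParser/TermGenerator.

```ruby
# Naive snippet highlighter (hypothetical helper, not a Xapian API).
# Picks the window of +num_words+ consecutive words containing the most
# query terms, then wraps each matching word in +prefix+/+postfix+.
def highlight_extract(text, terms, num_words, prefix, postfix)
  words = text.split
  return "" if words.empty?

  wanted = terms.map(&:downcase)
  # Strip leading/trailing punctuation before comparing, so "search," matches "search".
  hit = words.map { |w| wanted.include?(w.downcase.gsub(/\A\W+|\W+\z/, "")) }

  window = [num_words, words.size].min
  best_start = 0
  best_score = -1
  (0..words.size - window).each do |start|
    score = hit[start, window].count(true)
    # Strict ">" keeps the earliest window when scores tie.
    if score > best_score
      best_start = start
      best_score = score
    end
  end

  extract = (best_start...best_start + window).map do |i|
    hit[i] ? "#{prefix}#{words[i]}#{postfix}" : words[i]
  end
  extract.join(" ")
end
```

For example, `highlight_extract("Xapian is a search engine library. Xapian supports spelling correction and many search features.", ["xapian", "search"], 5, "<b>", "</b>")` returns `"<b>Xapian</b> is a <b>search</b> engine"`. Any punctuation attached to a matching word ends up inside the markup, another rough edge that a query-aware implementation would handle better.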
Francis Irving
2008-May-06 10:25 UTC
[Xapian-discuss] acts_as_xapian, pre-release (Ruby on Rails)
I'm now using this on the live version of our Freedom of Information website.

http://www.whatdotheyknow.com

If you fancy using acts_as_xapian, it's quite a bit more mature now.

Is anybody a Rails / Gem guru, or know one who fancies working out how to package it up properly for the Rails world to enjoy?

Francis