Hi James,

thanks for the feedback.

On Thu, Jul 28, 2016, at 00:22, James Aylett wrote:
> This sounds great! I know sufficiently little about CJK that I won't
> try to comment on that at all :)

I've just opened a pull request for the CJK tokenizer:
https://github.com/xapian/xapian/pull/114

> I wonder if we can arrange suitable defaults to use your
> implementation with the older API, and come up with a newer API that
> allows a SnippetGenerator class to be used from the MSet.

The FastMail snippet generator was written when MSet didn't create
snippets. I'll first compare both implementations to see if there is a
good reason for them to coexist, or whether I might just as well merge
any additional features into MSet.

> A good start would certainly be rebasing against master and opening a
> pull request for each on github (this will trigger travis CI builds,
> which is a helpful first pass in making sure everything is good; it
> runs against both G++ and Clang, which can expose some weirdnesses).

Unfortunately, Travis breaks since pkg-config can't find libicu on the
machine [1]. I could make the libicu dependency optional, and that might
be useful for Xapian installations that don't bother with CJK text, but
for Travis tests it would make sense to enable ICU.

Cheers,
Robert

[1] https://travis-ci.org/xapian/xapian/jobs/148268282#L1522

On Fri, Jul 29, 2016 at 12:12:25PM +0200, rsto at paranoia.at wrote:
> On Thu, Jul 28, 2016, at 00:22, James Aylett wrote:
> > This sounds great! I know sufficiently little about CJK that I won't
> > try to comment on that at all :)
>
> I've just opened a pull request for the CJK tokenizer:
> https://github.com/xapian/xapian/pull/114

That's great, thanks.

> > I wonder if we can arrange suitable defaults to use your
> > implementation with the older API, and come up with a newer API that
> > allows a SnippetGenerator class to be used from the MSet.
>
> The FastMail snippet generator was written when MSet didn't create
> snippets. I'll first compare both implementations to see if there is a
> good reason for them to coexist, or whether I might just as well merge
> any additional features into MSet.

Terrific, thank you.

> Unfortunately, Travis breaks since pkg-config can't find libicu on the
> machine [1]. I could make the libicu dependency optional, and that might
> be useful for Xapian installations that don't bother with CJK text, but
> for Travis tests it would make sense to enable ICU.

You should be able to install it; if you add libicu-dev to the packages
stanza in .travis.yml it will put it in there. However you seem to be
using pkg-config, which Ubuntu 12.04 LTS (which travis currently uses)
doesn't provide for libicu. 14.04 LTS does, and it's possible to use
that as a beta with travis, I think by changing:

    sudo: false

to:

    sudo: required
    dist: trusty

That will run about a minute slower than the current builds, but that's
not a huge problem for the volume we're dealing with. I'd hope that they
move 14.04 out of beta soon (at which point it should be possible to use
with container builds, which are faster), since 12.04 only has support
until next year.

J

--
James Aylett, occasional trouble-maker
xapian.org

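Put together, the two changes James suggests might look like the following .travis.yml fragment. This is only a sketch: it assumes the "packages stanza" he mentions is the `addons.apt.packages` list, and it omits every other key in the project's real file, which is not shown in this thread.

```yaml
# Sketch of the relevant .travis.yml keys only; all other settings omitted.
# Assumption: packages are installed through the apt addon.
sudo: required   # was "sudo: false"; needed for the 14.04 (trusty) beta image
dist: trusty     # Ubuntu 14.04 LTS, whose libicu-dev provides pkg-config files

addons:
  apt:
    packages:
      - libicu-dev   # ICU headers and libraries for the CJK tokenizer
```

On Ubuntu 12.04 the same package would install, but, as James notes, pkg-config would still not find libicu there.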
On Fri, Jul 29, 2016, at 20:12, rsto at paranoia.at wrote:
> On Thu, Jul 28, 2016, at 00:22, James Aylett wrote:
> > I wonder if we can arrange suitable defaults to use your
> > implementation with the older API, and come up with a newer API that
> > allows a SnippetGenerator class to be used from the MSet.
>
> The FastMail snippet generator was written when MSet didn't create
> snippets. I'll first compare both implementations to see if there is a
> good reason for them to coexist, or whether I might just as well merge
> any additional features into MSet.

As the sponsor of this work, I definitely want to have it in the most
supported form for upstream so that it can be accepted easily, both from
a generous point of view that I'd like the rest of the world to benefit,
and from a very selfish point of view that I don't want us to have to
keep maintaining these patches!

The snippet generator code was initially written in 2011, and the
original author isn't working for us any more, so it's definitely not up
to date with the latest APIs :)

Cheers,
Bron.

--
Bron Gondwana
brong at fastmail.fm

On Fri, Jul 29, 2016 at 10:01:55PM +1000, Bron Gondwana wrote:
> > The FastMail snippet generator was written when MSet didn't create
> > snippets. I'll first compare both implementations to see if there is a
> > good reason for them to coexist, or whether I might just as well merge
> > any additional features into MSet.
>
> As the sponsor of this work, I definitely want to have it in the
> most supported form for upstream so that it can be accepted easily,
> both from a generous point of view that I'd like the rest of the
> world to benefit, and from a very selfish point of view that I don't
> want us to have to keep maintaining these patches!

:)

I'd also like to get the most powerful options for snippet generation
into the hands of our users. I'm pretty sure we can come up with a good
way of doing this without losing backward compatibility with anyone
using the current MSet snippet generator.

> The snippet generator code was initially written in 2011, and the
> original author isn't working for us any more, so it's definitely
> not up to date with the latest APIs :)

Actually, the fact that it's separate makes it a lot easier to plot out
a route for how to integrate things, I suspect :)

J

--
James Aylett, occasional trouble-maker
xapian.org