I haven't had a chance to look at the patch and won't be able to do before January. Its design description sounds promising, though. The snippet generator code linked to by Bron contains mostly the same code as in my pull request, with two exceptions: it adds a flag to make the generator return the empty string for snippets without any matching terms. And it includes a fix to a possible memory corruption (only impacted our forked version). Cheers, Robert On Tue, Dec 13, 2016, at 15:14, Bron Gondwana wrote:> On Tue, 13 Dec 2016, at 15:04, Olly Betts wrote: > > On Tue, Oct 04, 2016 at 10:37:49AM +1100, Bron Gondwana wrote: > > > Robert is in Australia visiting the FastMail office to co-work with us for a > > > couple of months, and I'd love to get this Xapian integration work done > > > during this time. We're also looking to release Cyrus IMAPd version 3.0 some > > > time in the next few months, and it would be great to not depend on too many > > > custom patches! Ideally I'd like to be running vanilla upstream Xapian > > > libraries on FastMail's production rather than keeping a separate branch as > > > well. > > > > Did you get a chance to look at the patch I linked to from the snippet PR? > > > > https://github.com/xapian/xapian/pull/117 > > I didn't. Robert is traveling at the moment, so I'm not sure if he's > contactable. > > The latest code that Robert worked on before leaving is in our cyrusimap > repo at: > > https://github.com/cyrusimap/xapian/commits/cyrus > > Bron. > > -- > Bron Gondwana > brong at fastmail.fm
On Wed, 14 Dec 2016, at 22:41, Robert Stepanek wrote:> I haven't had a chance to look at the patch and won't be able to do > before January. Its design description sounds promising, though. > > The snippet generator code linked to by Bron contains mostly the same > code as in my pull request, with two exceptions: it adds a flag to make > the generator return the empty string for snippets without any matching > terms. And it includes a fix to a possible memory corruption (only > impacted our forked version).I already merged the memory corruption fix back into the patch it affected. The code it running in production at FastMail and working quite nicely :) The only thing we're tempted to look at is making it return the snippets as an array out to the client rather than returning a string with ... separators, just so that the client can decide which bits to show based on trying to show all the matches even on smaller screens. Bron. -- Bron Gondwana brong at fastmail.fm
Thanks to Olly for the enhanced snippet generator! I've closed the pull request for the term cover snippet patch and opened a tiny follow-up pull request: For Cyrus, we expect MSet::snippet to return an empty string if not a single search term was found in the text, and I've added an optional flag for that. The existing behaviour to return a substring of the input without markup remains the default. https://github.com/xapian/xapian/pull/130 The CJK word boundary analysis pull request has been rebased against latest master. We'd be very happy to see this merged into upstream, it has served us well since a couple of months, already. The PR make libicu a mandatory dependency and keeping it at that would make the patch straightforward. If it's necessary, I'll see if I can make ICU optional and set the according #ifdefs in the code. See https://github.com/xapian/xapian/pull/114 Unfortunately, both pull requests currently do not pass Travis. The build breaks during linking in xapian-letor with the message: bin/xapian-prepare-trainingfile.o: undefined reference to symbol '_ZNK6Xapian5Error15get_descriptionEv' Is that a known issue? Has anyone experienced the same? E.g. see https://travis-ci.org/xapian/xapian/jobs/190191549#L3541 Cheers, Robert