search for: rsto

Displaying 17 results from an estimated 17 matches for "rsto".

2016 Jul 26
2
Pull requests: CJK words and Snippet generator
...es. Would you be interested in these features? Just let us know what would be required to get them merged. As a minimum I'd rebase the current forks against latest master. I'll be happy to answer any questions or change requests. Cheers, Robert [1] CJK word splitter: https://github.com/rsto/xapian/commit/16dd9b232eb9b6e7346184db0790b6655180492c [2] Snippet generator: https://github.com/rsto/xapian/commit/979757c161ec912c98f2fe87595d7529740e3247#diff-832f4feb83e5ba60ebb64b4d8b93d93fR1
2016 Aug 18
2
Pull requests: CJK words and Snippet generator
Hi, On Thu, Aug 11, 2016, at 13:19, rsto at paranoia.at wrote: > The CJK word segmentation and snippet pull requests both pass Travis > since middle/end of last week. Did you find time to look at them? just checking in if you found time to look at the PRs? It'd be nice to know a tentative timeline, so I can plan if to build nex...
2016 Sep 19
2
Pull requests: CJK words and Snippet generator
Olly, sorry for my delayed reply. Am Mo, 12. Sep 2016, um 05:32, schrieb Olly Betts: > On Wed, Sep 07, 2016 at 02:30:16PM +0200, rsto at paranoia.at wrote: > > On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote: > > > I think my main concerns are about efficiency [...] > > For the proposed term coverage, the implementation looks up and inserts > > terms into a map. That makes it slightly less efficient wit...
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
Hi, On Fri, Jul 29, 2016, at 13:45, James Aylett wrote: > On Fri, Jul 29, 2016 at 12:12:25PM +0200, rsto at paranoia.at wrote: > > The FastMail snippet generator has been written when MSet didn't create > > snippets. I'll first compare both implementations to see if there is a > > good reason for them to coexist, or might just as well merge any > > additional features i...
2016 Aug 05
2
Pull requests: CJK words and Snippet generator
On Thu, Aug 4, 2016, at 15:08, James Aylett wrote: > On Wed, Aug 03, 2016 at 08:17:05PM +0200, rsto at paranoia.at wrote: > > I'll notify you when the CJK pull request passes Travis. > > That's great, thanks! Alright, after lots of fiddling with .travis.yml I finally made the pull request build on Travis' trusty image: https://github.com/xapian/xapian/pull/114 I have ke...
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
On Wed, Aug 3, 2016, at 19:26, James Aylett wrote: > On Wed, Aug 03, 2016 at 06:54:32PM +0200, rsto at paranoia.at wrote: > > Oddly enough, the pull request causes Travis to break for clang but not > > for gcc [1]. That's because the clang build process fails for the test > > 'querypairwise1' [2], which AFAIK I didn't touch at all. Is that a > > known issu...
2016 Jul 29
3
Pull requests: CJK words and Snippet generator
Hi James, thanks for the feedback. On Thu, Jul 28, 2016, at 00:22, James Aylett wrote: > This sounds great! I know sufficiently little about CJK that I won't > try to comment on that at all :) I've just opened a pull request for the CJK tokenizer: https://github.com/xapian/xapian/pull/114 > I wonder if we can arrange suitable defaults to use your > implementation with the
2016 Sep 07
2
Pull requests: CJK words and Snippet generator
On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote: > I think my main concerns are about efficiency (since that's a major > motivation for the current implementation, so slowing it down would be > annoying), and whether we can just make this the standard behaviour > rather than adding an option. The current implementation is O(n) and I took care to keep it at that. For the proposed term
2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...ueries for the lowercase form which seems to be the result of unconditional lower-casing at https://github.com/xapian/xapian/blob/master/xapian-core/queryparser/queryparser.lemony#L1459. As a result, the query returns no results. I have written code that demonstrates this at https://gist.github.com/rsto/168a61536793e10a0a07c3920977e5eb Now, I think that much of this issue can be prevented by normalizing both indexed text and queries before passing them to Xapian, but this requires rewriting indexes, so it isn't necessarily a quick fix. As a workaround, I chose to detect such queries and qu...
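The normalization idea mentioned in this message can be sketched as follows. This is only an illustration, not the code from the linked gist: it assumes NFKC folding followed by lower-casing is an acceptable normalization, and `normalize_for_search` is a hypothetical helper name that is not part of Xapian. The same function would have to be applied both to text before indexing and to query strings before parsing.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical helper (not a Xapian API): fold compatibility forms and case."""
    # NFKC maps fullwidth Latin letters (U+FF21..U+FF5A) to their ASCII equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # Lower-case after folding so indexed terms and query terms agree.
    return folded.lower()


# Fullwidth "ABC", written with escapes to avoid encoding problems.
fullwidth = "\uff21\uff22\uff23"
print(normalize_for_search(fullwidth) == normalize_for_search("abc"))
```

Because existing terms were indexed without this step, adopting it would mean reindexing, which matches the caveat in the message that this is not a quick fix.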
2017 Jul 31
2
Segmentation fault in matcher/queryoptimiser
...e used already somewhere else? Should we probably keep it and make QueryOptimiser take ownership? [1] https://github.com/xapian/xapian/blob/master/xapian-core/matcher/queryoptimiser.h#L51 [2] https://github.com/xapian/xapian/blob/master/xapian-core/api/queryinternal.cc#L1665 [3] https://github.com/rsto/xapian/commit/3e7d65b25eef00347f5c764af5ff4d770433ac9b [4] https://github.com/xapian/xapian/blob/master/xapian-core/matcher/queryoptimiser.h#L106 Cheers, Robert -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind.log Type: application/octet-stream Size: 53...
2018 Feb 13
0
How to set environment variable XAPIAN_CJK_NGRAM?
On Tue, Feb 13, 2018, at 02:32, Peter Zhao wrote: > At 2018-02-12 20:00:02, xapian-discuss-request at lists.xapian.org wrote: > >There's also a patch to add support for using libicu to find word > >boundaries: > > > >https://github.com/xapian/xapian/pull/114 > > > >That'll get merged soon hopefully (mostly we need to sort out how to > >manage
2018 Oct 04
0
Indexing Chinese?
We are using a fork of Xapian for this at the Cyrus IMAP project [1], using the Unicode library word segmentation for Chinese, Japanese and Korean [2]. We have been using it in production at FastMail for about 2 years and are generally happy with it; the search results improved over using ngrams. There's a pull request open to merge the patch upstream [3], but it's to be decided how to best
2017 Oct 16
2
Current master unit test errors
I'm preparing a pull request for the master branch and noticed that `make check` on a clone of the xapian repository fails badly. I haven't merged my changes and built from e24cc6018de0. Is it just me, or is there something broken in the master branch? Running test './apitest' under valgrind Running tests with backend "none"... Running test: defaultctor1...
2017 Aug 02
2
Segmentation fault in matcher/queryoptimiser
Olly, thanks for your feedback. On Mon, Jul 31, 2017, at 23:29, Olly Betts wrote: > On Mon, Jul 31, 2017 at 09:24:29AM +0200, Robert Stepanek wrote: > > We'd appreciate any hints on how to fix this. I've written up our > > findings and solution attempts below. Should we post this on trac? > > Yes, it'd be good to have a ticket to track this. I've created
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Sun, Jan 7, 2024, at 7:45 PM, Olly Betts wrote: > I've restarted trac. I now created a pull request: https://github.com/xapian/xapian/pull/329 Should I create a trac issue, too? > Assuming the latter is valid, just removing this block (or removing the > parts of it which are Lu or Ll) should fix the problem as then > tokenisation will switch mode - I tried this and it fixes
2024 Jan 10
2
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Tue, Jan 9, 2024, at 3:28 AM, Olly Betts wrote: > Thanks, that looks good - now merged. Thanks! > Did you already check the other ranges for cased letters? I can but if > you have already there's not much point. I did not. If you find time, that'd be great. Otherwise I can make room for it in the next days. > > The fullwidth "????? ??????" tests suggests to
2016 Dec 14
2
Pull requests: CJK words and Snippet generator
I haven't had a chance to look at the patch and won't be able to do so before January. Its design description sounds promising, though. The snippet generator code linked to by Bron contains mostly the same code as in my pull request, with two exceptions: it adds a flag to make the generator return the empty string for snippets without any matching terms. And it includes a fix to a possible