Olly, thanks a lot!
I installed Xapian 1.2.25 on Ubuntu 14.04. How do I set the environment
variable XAPIAN_CJK_NGRAM? I'm a newbie to Xapian.
Best wishes,
Peter
At 2018-02-12 20:00:02, xapian-discuss-request at lists.xapian.org wrote:
>Today's Topics:
>
> 1. Re: How to let Xapian support Chinese searching (Olly Betts)
> 2. Re: How to ensure thread-safety (Olly Betts)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Sun, 11 Feb 2018 20:34:44 +0000
>From: Olly Betts <olly at survex.com>
>To: Peter Zhao <peterzhaonj at 163.com>
>Cc: xapian-discuss at lists.xapian.org
>Subject: Re: How to let Xapian support Chinese searching
>Message-ID: <20180211203444.GH12724 at survex.com>
>Content-Type: text/plain; charset=us-ascii
>
>On Sat, Feb 10, 2018 at 08:26:52PM +0800, Peter Zhao wrote:
>> I installed EPrints, but it cannot search Chinese text. EPrints uses
>> Xapian to index data; how can I make Xapian support Chinese searching?
>
>Current releases support indexing ngrams for CJK text - to enable this
>you need to pass FLAG_CJK_NGRAM to TermGenerator when indexing and to
>QueryParser when searching.
>
>You can also activate this flag without code changes by setting
>environment variable XAPIAN_CJK_NGRAM to a non-empty value (don't forget
>to export it if you're setting it via the shell).
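
For reference, in a POSIX shell the export mentioned above is `export XAPIAN_CJK_NGRAM=1`, run in the environment of whatever process does the indexing and searching. A program can also set the variable for itself before it first uses Xapian. The sketch below is only an illustration of the "non-empty value" rule: the helper names are hypothetical, `setenv` is POSIX rather than standard C++, and the check mirrors the rule as described above, not Xapian's internal code.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX, not part of standard C++)

// Hypothetical helper: true if XAPIAN_CJK_NGRAM is set to a non-empty value.
// This mirrors the rule described in the quoted text; it is not Xapian code.
bool cjk_ngram_requested() {
    const char* p = std::getenv("XAPIAN_CJK_NGRAM");
    return p != nullptr && *p != '\0';
}

// Hypothetical helper: set the variable for the current process (and for any
// child processes started afterwards). Returns true on success.
bool request_cjk_ngram() {
    return setenv("XAPIAN_CJK_NGRAM", "1", 1) == 0;
}
```

Calling `request_cjk_ngram()` early in `main()` would then make `cjk_ngram_requested()` return true. Setting the variable to an empty string does not count as enabled under this rule.
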
>
>There's also a patch to add support for using libicu to find word
>boundaries:
>
>https://github.com/xapian/xapian/pull/114
>
>That'll get merged soon hopefully (mostly we need to sort out how to
>manage the libicu dependency - do we make it a hard dependency, or an
>option for how to build xapian-core, etc) but if you're happy to build
>xapian-core from source please try it and give feedback on how well
>it works.
>
>An algorithm to identify word boundaries should result in a
>significantly smaller database than indexing ngrams, but it's reliant on
>the algorithm finding the correct boundaries. If the wrong boundaries
>are identified that can lead to both false positives and false
>negatives.
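
To make the size trade-off concrete, here is a rough sketch of what n-gram term generation looks like. It assumes n = 2 and that single characters are emitted as terms as well; the quoted text does not state which n-gram size Xapian uses, so treat both as assumptions. The function names are hypothetical and the UTF-8 handling assumes well-formed input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a UTF-8 string into one std::string per code point.
// Assumes well-formed UTF-8; no validation is performed.
std::vector<std::string> utf8_chars(const std::string& s) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
        out.push_back(s.substr(i, len));
        i += len;
    }
    return out;
}

// Emit every single character and every adjacent pair of characters.
std::vector<std::string> cjk_ngrams(const std::string& text) {
    std::vector<std::string> chars = utf8_chars(text);
    std::vector<std::string> terms;
    for (std::size_t i = 0; i < chars.size(); ++i) {
        terms.push_back(chars[i]);
        if (i + 1 < chars.size()) {
            terms.push_back(chars[i] + chars[i + 1]);
        }
    }
    return terms;
}
```

Under these assumptions a run of N characters produces 2N - 1 terms (seven for a four-character string), however many words it contains. A word-boundary algorithm would emit one term per word, which is where the smaller database comes from.
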
>
>Cheers,
> Olly
>
>
>
>------------------------------
>
>Message: 2
>Date: Sun, 11 Feb 2018 20:51:35 +0000
>From: Olly Betts <olly at survex.com>
>To: Kim Walisch <kim.walisch at gmail.com>
>Cc: xapian-discuss at lists.xapian.org
>Subject: Re: How to ensure thread-safety
>Message-ID: <20180211205135.GI12724 at survex.com>
>Content-Type: text/plain; charset=us-ascii
>
>On Thu, Feb 08, 2018 at 04:18:12PM +0100, Kim Walisch wrote:
>> But it is still not clear to me how to ensure thread-safety when using
>> libxapian (C++ API). Usually when doing multi-threading many threads can
>> read the same variable concurrently without locking provided none of the
>> threads modifies the variable.
>
>That's true for simple types, but breaks down for classes because they
>may have mutable members - e.g. for caching values computed lazily:
>
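>class FactorialFactory {
>  private:
>    // Cache of the last result; mutable so that const calc() can update it.
>    mutable int r = -1;
>    mutable int n = 0;
>  public:
>    FactorialFactory() {}
>
>    int calc(int v) const {
>        if (r < 0 || n != v) {
>            // Cache miss: record the argument, then compute v! into r.
>            n = v;
>            r = n;
>            for (int i = n - 1; i > 1; --i) {
>                r *= i;
>            }
>        }
>        return r;
>    }
>};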
>
>It's not safe to concurrently call f.calc() from different threads, even
>though conceptually calc() is a read-only method.
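
One common way to make a toy class like this safe to share is to guard the mutable cache with a mutex. The sketch below is only about the example class; the class name is hypothetical, and whether the same approach suits libxapian objects is not covered in the quoted text.

```cpp
#include <mutex>

// Variant of the example class whose cache is protected by a mutex, so
// concurrent calls to calc() on one object are serialised.
class LockedFactorial {
  private:
    mutable std::mutex m;  // guards r and n
    mutable int r = -1;    // cached result, -1 while nothing is cached
    mutable int n = 0;     // argument that r was computed for
  public:
    int calc(int v) const {
        std::lock_guard<std::mutex> lock(m);
        if (r < 0 || n != v) {
            // Cache miss: record the argument, then compute v! into r.
            n = v;
            r = n;
            for (int i = n - 1; i > 1; --i) {
                r *= i;
            }
        }
        return r;
    }
};
```

The cost is that every call takes the lock, even on a cache hit. Giving each thread its own object avoids the lock entirely, at the price of one cache per thread.
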
>
>Cheers,
> Olly
>
>
>
>------------------------------
>
>End of Xapian-discuss Digest, Vol 162, Issue 3
>**********************************************