similar to: How to let Xapian support Chinese searching

2018 Feb 13
2
How to set environment variable XAPIAN_CJK_NGRAM?
Olly, Thanks a lot! I installed Xapian 1.2.25 on Ubuntu 14.04. How do I set the environment variable XAPIAN_CJK_NGRAM? I'm a newbie to Xapian. Best wishes, Peter
2018 Feb 13
1
XAPIAN_CJK_NGRAM can work
Olly, That's very kind of you to help me. When I used "env XAPIAN_CJK_NGRAM=1 indexer-command" to index EPrints again, it could search Chinese at the character level. But it is not so good at the word or phrase level. ICU would be better than CJK. I hope ICU can be used soon! Best wishes, Peter At 2018-02-13 11:44:46, "Olly Betts" <olly at survex.com> wrote: >On
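To illustrate the character-level versus word-level distinction raised in this message, here is a minimal Python sketch of unigram and bigram generation over runs of Chinese characters. It is a simplified illustration, not Xapian's tokenizer: the function names `is_cjk` and `cjk_ngrams` are hypothetical, and the check covers only the CJK Unified Ideographs block for brevity.

```python
def is_cjk(ch):
    """Rough check (assumption): CJK Unified Ideographs block only."""
    return "\u4e00" <= ch <= "\u9fff"


def cjk_ngrams(text):
    """Return unigrams and overlapping bigrams for each run of CJK characters."""
    terms = []
    run = []
    # The trailing space is a sentinel that flushes the final run.
    for ch in text + " ":
        if is_cjk(ch):
            run.append(ch)
            continue
        for i, c in enumerate(run):
            terms.append(c)
            if i + 1 < len(run):
                terms.append(c + run[i + 1])
        run = []
    return terms


print(cjk_ngrams("中文搜索"))
```

For the input "中文搜索" the bigrams include "文搜", which straddles two words and is not a word itself. Terms like that are one reason n-gram indexing matches well on characters but less precisely on words and phrases than a real word segmenter would.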
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch for Chinese for this ticket. The current code works well on the special test cases, and all tests in xapian-core still pass. But I'm unsure about the exact requirements of the ticket: how much performance we can afford to pay to handle more cases, and whether there are better methods to
2011 Feb 07
4
http://xapian.org/download RHEL instructions
Hello :-) The instructions regarding Tim Brody's packages are incomplete, in that rpm-eprints-org-key is needed by rpm-eprints-org-xapian-5-1.noarch. The RPM for rpm-eprints-org-key is at http://rpm.eprints.org/rpm-eprints-org-key-1-1.noarch.rpm Best Charles
2011 Sep 14
1
Integrated Chinese tokenizer SCWS in xapian-core
Xapian is an excellent open source search engine library, but it has no native support for Chinese word segmentation in queryparser and termgenerator. I therefore modified a small amount of source code to integrate the SCWS tokenizer, which is also open source and was developed by me. Anyone can obtain the patch from the URL below. After patching, Xapian::QueryParser::parse_query and
2010 Nov 22
5
perl bindings
Hi All, When are the XS-based Perl bindings going to be deprecated in favour of the SWIG bindings? Please remove the dead RHEL RPMs from: http://xapian.org/download I've built RPMs for RHEL5/CentOS5 (with a different signing key) here: http://rpm.eprints.org/xapian/5/ RHEL4 is eol and shouldn't be used. For my own convenience I have written an RPM for the repository: rpm -ivh
2018 Feb 13
0
How to set environment variable XAPIAN_CJK_NGRAM?
On Tue, Feb 13, 2018 at 09:32:26AM +0800, Peter Zhao wrote: > I installed Xapian 1.2.25 on Ubuntu 14.04. How to set environment > variable XAPIAN_CJK_NGRAM? I'm a newbie to Xapian. This is really a generic Unix question rather than a Xapian one, and the answer rather depends on how eprints gets run. When you're running a program from the shell, you can use env in front of the command
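When the indexer is launched from a script rather than typed at a shell, the same effect as putting env in front of the command can be had by passing a modified environment to the child process. The sketch below assumes Python's standard library only; `run_with_cjk_ngram` is a hypothetical helper, and the child command is a stand-in that merely reports the variable it sees, not a real indexer.

```python
import os
import subprocess
import sys


def run_with_cjk_ngram(command):
    """Run `command` with XAPIAN_CJK_NGRAM=1 added to a copy of the environment."""
    env = dict(os.environ)          # copy, so the parent environment is untouched
    env["XAPIAN_CJK_NGRAM"] = "1"
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in for the real indexer command (assumption): prints the variable's value.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('XAPIAN_CJK_NGRAM'))",
]
result = run_with_cjk_ngram(child)
print(result.stdout.strip())
```

Copying the environment keeps the setting scoped to the one child process, which mirrors what env does on the command line.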
2018 Oct 04
2
Indexing Chinese?
My second (and hopefully last) question: is there any more news on indexing Chinese characters and words? Searching online mostly returns results from a decade ago or more, with nothing very conclusive. How close is this to possible? For the time being I'm doing some pre-processing on long strings of Chinese, breaking on punctuation in order to avoid errors. But I have some large corpora of
2016 Jul 29
3
Pull requests: CJK words and Snippet generator
Hi James, thanks for the feedback. On Thu, Jul 28, 2016, at 00:22, James Aylett wrote: > This sounds great! I know sufficiently little about CJK that I won't > try to comment on that at all :) I've just opened a pull request for the CJK tokenizer: https://github.com/xapian/xapian/pull/114 > I wonder if we can arrange suitable defaults to use your > implementation with the
2016 Jul 26
2
Pull requests: CJK words and Snippet generator
Hi, The Cyrus IMAP mail server uses Xapian as search engine. Recently, FastMail has sponsored implementation of two Xapian features: CJK word splitting and a generator for search snippets. I've been working on both features and we would be happy to get them merged into Xapian master. The CJK word tokenizer uses the word segmentation algorithms of the International Components for Unicode
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
Hi, On Fri, Jul 29, 2016, at 13:45, James Aylett wrote: > On Fri, Jul 29, 2016 at 12:12:25PM +0200, rsto at paranoia.at wrote: > > The FastMail snippet generator was written when MSet didn't create > > snippets. I'll first compare both implementations to see if there is a > > good reason for them to coexist, or might just as well merge any > > additional
2010 Sep 01
8
FIXMEs in Search::Xapian
Carrying on this conversation: http://lists.tartarus.org/pipermail/xapian-discuss/2007-March/003513.html void TermGenerator::set_stopper(stopper) Stopper * stopper CODE: // FIXME: no corresponding SvREFCNT_dec(), but a leak seems better than // a SEGV! SvREFCNT_inc(ST(1)); THIS->set_stopper(stopper); It would be good to fix these FIXMEs. A class-level HASH could be
2012 May 03
1
Incorrect line nums displayed in full text search
Hi, Could you please clarify a query regarding full text search in Xapian? I have installed LXRng (cross-referencer), which internally uses Xapian for text search. I am using CentOS 6 and have installed the following Xapian packages: $ yum list installed | grep -i xapian perl-Search-Xapian.x86_64 1.2.9.0-1.el6 @rpm-eprints-org-xapian perl-Search-Xapian-debuginfo.i686
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
On Wed, Aug 3, 2016, at 19:26, James Aylett wrote: > On Wed, Aug 03, 2016 at 06:54:32PM +0200, rsto at paranoia.at wrote: > > Oddly enough, the pull request causes Travis to break for clang but not > > for gcc [1]. That's because the clang build process fails for the test > > 'querypairwise1' [2], which AFAIK I didn't touch at all. Is that a > > known
2015 May 12
1
RPM for RHEL7
On Mon, May 11, 2015 at 06:18:04PM +0000, Marc Fromm wrote: > Marc Fromm: > > Will the current RPM packages work on a RHEL7 system? I am upgrading > > from a RHEL5 system that uses the omega search tool. > > Since there have been no replies, I take it to mean that there is no > compatible package for RHEL7, thus I must remove the omega search > tool. I don't use
2017 Feb 08
1
searching for " in phrase and other special chars
Hello, I'm reading xapian-core/docs/queryparser.rst and haven't been able to find a way to escape " (double-quote) inside quoted phrases. Is this possible? I'm also wondering if searching for other special characters, such as a literal '*', is possible without triggering a wildcard match. It would be helpful for some source code searches. Thanks!
2013 Aug 26
2
Perl interface isn't working in 1.2.x
On 08/25/2013 05:02 PM, Olly Betts wrote: > So the simple fix is > probably just to install the perl-Search-Xapian RPM instead. Thanks, the Centos 6 repos don't have that rpm and the http://xapian.org/download page seems to only cover the XS bindings, if I am reading this correctly: But I was able to remove the rpm packages and compile and install the core and swig from source.
2012 Aug 30
1
path analysis help
Hi there, I searched the R-help list with "path analysis" as the keyword, and learned that the sem package can do it. However, I can't figure out a way to construct the model for the path diagram shown in Fig. 1 of Huang et al. (2002)[1]. I tried the following code: huang.cor <- readMoments(diag=FALSE, names=c('x1', 'x2', 'x3', 'y')) 0.76 0.91 0.72 0.94 0.77 0.83
2013 Apr 20
4
warning: dl() [function.dl]: xapian: Unable to initialize module Module compiled with module API=20050922
I installed Xapian core on a CentOS system. On execution, I get this error message: "warning: dl() [function.dl]: xapian: Unable to initialize module Module compiled with module API=20050922, debug=0, thread-safety=0 PHP compiled with module API=20060613, debug=0, thread-safety=0 These options need to match in /usr/share/php/xapian.php on line 22." PHP 5.2.16 (cli) (built: Dec 17 2010
2007 Dec 29
3
Term-Flags
Hi, Is it necessary to set the flag below on the TermGenerator if I want the "Did you mean ..." spelling corrections? Xapian::TermGenerator::flags::FLAG_SPELLING Thank you very much Markus