Displaying 20 results from an estimated 3000 matches similar to: "Indexing Chinese?"
2018 Oct 04
0
Indexing Chinese?
We are using a fork of Xapian for this at the Cyrus IMAP project [1], using the Unicode library word segmentation for Chinese, Japanese and Korean [2]. We have been using it in production at FastMail for about 2 years and are generally happy with it; the search results improved over using ngrams. There's a pull request open to merge the patch upstream [3], but it's to be decided how to best
2018 Feb 10
1
How to let Xapian support Chinese searching
I installed EPrints, but it cannot search Chinese. EPrints uses Xapian to index data; how can I make Xapian support Chinese searching? Thanks a lot!
2015 Oct 13
2
iterate users with passwd-file passdb?
On 14 Oct 2015, at 00:01, Eric Abrahamsen <eric at ericabrahamsen.net> wrote:
>
> Joseph Tam <jtam.home at gmail.com> writes:
>
>> Eric Abrahamsen writes:
>>
>>> Simply: Is it possible to iterate over users if I'm using the
>>> passwd-file passdb driver? Do I need a SQL-based driver if I want to
>>> iterate?
>>
>> What
2015 Oct 13
2
iterate users with passwd-file passdb?
Eric Abrahamsen writes:
> Simply: Is it possible to iterate over users if I'm using the
> passwd-file passdb driver? Do I need a SQL-based driver if I want to
> iterate?
What do you mean by "iterate"? If you mean whether you can look up a
password entry in a multi-entry file, then yes, definitely. If you
mean to sequentially go through it and do a first/last/best match,
2009 Sep 15
2
S3 objects in S4 slots
Hello,
I am the maintainer of the stringkernels package and have come across
a problem with using S3 objects in my S4 classes.
Specifically, I have an S4 class with a slot that takes a text corpus
as a list of character vectors. tm (version 0.5) saves corpora as
lists with a class attribute of c("VCorpus", "Corpus", "list"). I
don't actually need the
2014 Sep 04
2
charset-specific searches, and continuation lines
Hi there,
I'm looking into improving IMAP search support for the Gnus Emacs mail
client, and trying to add the ability to search non-ascii characters. So
far as I know, I start this invocation with something like:
. UID SEARCH CHARSET UTF-8 TEXT {NNN}
Where NNN is the number of bytes in my search string. Dovecot then
responds with:
+ OK
So... what do I do then? I don't actually know
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi,
I am looking for a Chinese, Japanese and Korean tokenizer that can
be used to tokenize terms for CJK languages. I am not very familiar
with these languages; however, I think that they contain one
or more words in one symbol, which makes it more difficult to tokenize
into searchable terms.
Lucene has CJK Tokenizer ... and I am looking around if there is some
open source that we
2011 Sep 02
2
Classifying large text corpora using R
Dear everyone,
I am new to R, and I am looking at doing text classification on a huge
collection of documents (>500,000) which are distributed among 300 classes
(so basically, this is my training data). Would someone please be kind
enough to let me know about the R packages to use and their scalability
(time and space)?
I am very new to R and do not know of the right packages to use. I
2011 Nov 17
3
merging corpora and metadata
Greetings!
I lose all my metadata after concatenating corpora. This is an
example of what happens:
> meta(corpus.1)
MetaID cid fid selfirst selend fname
1 0 1 11 2169 2518 WCPD-2001-01-29-Pg217.scrb
2 0 1 14 9189 9702 WCPD-2003-01-13-Pg39.scrb
3 0 1 14 2109 2577 WCPD-2003-01-13-Pg39.scrb
....
....
17 0
2011 Sep 02
1
[PATCH 0/7] hivex + hivexml: Add byte runs for nodes and values
This changeset adds byte run reporters for node and value metadata in the
hivexml program. This location reporting required several new ABI
functions, which required new ABI return types. One benefit to the byte
run functions is additional sanity checks, which have revealed new data
or parsing errors when run on M57 patents images. An example error:
Image: Charlie, 2009-12-11, available at
2012 Jun 12
0
Fwd: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information Retrieval
This might be an interesting option for some of you!
Regards,
Parth.
---------- Forwarded message ----------
From: Andrew Trotman <andrew at cs.otago.ac.nz>
Date: Tue, Jun 12, 2012 at 5:12 AM
Subject: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information
Retrieval
To: corpora at uib.no
ACM SIGIR 2012 WORKSHOP ON OPEN SOURCE INFORMATION RETRIEVAL
16 August 2012, Portland,
2013 Jun 30
1
LIST command -- quoting of folder names
If I open an imap connection to a local maildir installation like so:
/usr/lib/dovecot/imap -o mail_location=maildir:$HOME/.mail/account/:LAYOUT=fs
And issue:
c list "" *
This is the result (this is a gmail account):
* LIST (\HasChildren) "/" [Gmail]
* LIST (\HasNoChildren) "/" [Gmail]/Spam
* LIST (\HasNoChildren) "/" [Gmail]/Starred
* LIST
2014 Sep 09
1
minimal configuration for lucene fts
Hi,
I'm using dovecot (version 2.2.13 on archlinux) in the simplest,
no-brainer way possible. It sits between mbsync, which I use to fetch
mail from servers, and Gnus, my MUA. Both mbsync and Gnus connect to
dovecot with an invocation like this:
/usr/lib/dovecot/imap -o mail_location=maildir:$HOME/.mail/ea/
I have three different mail accounts, all that changes is the final
directory on the
2016 Jun 03
2
Custom assembler subset
On Fri, Jun 3, 2016 at 11:53 AM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
wrote:
> -llvmdev at cs.uiuc.edu, that list isn't in use anymore.
>
> On Wed, Jun 1, 2016 at 4:48 PM, Kenneth Adam Miller via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Hello all,
> >
> > I would like to restrain the compiler that I build on my local box from
>
2012 Sep 20
3
(no subject)
From my book on corpus linguistics with R:
# (10) Imagine you have two vectors a and b such that
a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b<-c("a", "g", "d", "f", "g", "a", "f", "a",
2017 Oct 16
2
Request removal of CyrusImapd how to
Hi!
On behalf of team Cyrus, I'm trying to review and clean up old instances
of Cyrus documentation around the web. It's a long lived OSS project and
has left behind many old docs in its years on the earth. This is
resulting in strange queries coming to us as people try to follow
information that's years out of date.
To this end I'd like to request removal of
2016 Dec 13
2
Pull requests: CJK words and Snippet generator
On Tue, Oct 04, 2016 at 10:37:49AM +1100, Bron Gondwana wrote:
> Robert is in Australia visiting the FastMail office to co-work with us for a
> couple of months, and I'd love to get this Xapian integration work done
> during this time. We're also looking to release Cyrus IMAPd version 3.0 some
> time in the next few months, and it would be great to not depend on too many
>
2005 May 27
1
logistic regression
Hi
I am working on corpora of automatically recognized utterances, looking
for features that predict error in the hypothesis the recognizer is
proposing.
I am using the glm functions to do logistic regression. I do this type
of thing:
* logistic.model = glm(formula = similarity ~., family = binomial,
data = data)
and end up with a model:
> summary(logistic.model)
Call:
2007 Jan 18
5
Docs moved to Trac
Hi all,
Peter Abrahamsen has duplicated all of the documentation and cookbook
pages in Puppet's Trac page:
https://reductivelabs.com/trac/puppet/wiki/DocumentationStart
Please let me or Peter know if there are any problems.
--
A motion to adjourn is always in order.
--Robert Heinlein
---------------------------------------------------------------------
Luke
2018 Jun 30
2
Looking into a solution for Caldav (and possibly carddav) support
On 30.06.2018 at 07:13, Mihai Badici wrote:
> I can confirm you can use dovecot (instead of cyrus), but it is not trivial
> and I didn't know much about the compatibility for shared calendars.
Cyrus IMAPd provides exactly that, easily ;-)
https://www.cyrusimap.org/imap/download/installation/manage-dav.html
Alexander