search for: cjk

Displaying 20 results from an estimated 108 matches for "cjk".

2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi, I am looking for a Chinese, Japanese and Korean tokenizer that can be used to tokenize terms for CJK languages. I am not very familiar with these languages; however, I think that in these languages a single symbol can contain one or more words, which makes it more difficult to tokenize text into searchable terms. Lucene has a CJK Tokenizer ... and I am looking around to see if there is some open source that we could use w...
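To illustrate the difficulty described in this message: CJK text is usually written without spaces between words, so one common fallback (the approach Lucene's CJKTokenizer is known for) is to index overlapping two-character bigrams. The following is only a minimal sketch of that idea, not code from Lucene or Xapian. The function names `is_cjk` and `tokenize` and the character ranges are assumptions chosen for illustration, and the ranges are deliberately incomplete.

```python
def is_cjk(ch: str) -> bool:
    """Rough check for CJK characters (illustrative, incomplete ranges)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def tokenize(text: str) -> list[str]:
    """Split non-CJK text on non-alphanumerics; emit overlapping bigrams for CJK runs."""
    tokens: list[str] = []
    cjk_run: list[str] = []   # consecutive CJK characters seen so far
    word: list[str] = []      # characters of the current non-CJK word

    def flush_cjk() -> None:
        # A lone CJK character becomes its own term; longer runs become bigrams.
        if len(cjk_run) == 1:
            tokens.append(cjk_run[0])
        for first, second in zip(cjk_run, cjk_run[1:]):
            tokens.append(first + second)
        cjk_run.clear()

    def flush_word() -> None:
        if word:
            tokens.append("".join(word).lower())
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush_word()
            cjk_run.append(ch)
        elif ch.isalnum():
            flush_cjk()
            word.append(ch)
        else:
            # Whitespace or punctuation ends whatever token is in progress.
            flush_cjk()
            flush_word()
    flush_cjk()
    flush_word()
    return tokens


print(tokenize("Lucene 全文检索 ok"))
# → ['lucene', '全文', '文检', '检索', 'ok']
```

Bigrams trade index size for recall: they need no dictionary, but they also produce terms that are not real words. Dictionary- or model-based word segmentation avoids that at the cost of an extra dependency.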
2016 Jul 26
2
Pull requests: CJK words and Snippet generator
Hi, The Cyrus IMAP mail server uses Xapian as search engine. Recently, FastMail has sponsored implementation of two Xapian features: CJK word splitting and a generator for search snippets. I've been working on both features and we would be happy to get them merged into Xapian master. The CJK word tokenizer uses the word segmentation algorithms of the International Components for Unicode library (ICU), which brings support for J...
2011 Apr 07
1
GSOC 2011- CJK Support
Hello everyone, I am Yongzhi Zhang, a Chinese student. I'm interested in CJK Support (also known as Chinese, Japanese, and Korean Support). I have 6 years of experience in software development (C/C++ and Java). I want to work on this project, "CJK Support". I come from Beijing, China. Chinese is my native language, which is my advantage for "CJK Support". I have...
2016 Sep 07
2
Pull requests: CJK words and Snippet generator
...less performant snippet generator with a flag? > What are the other features the fastmail snippet generator has which > the current one lacks? I did study the fastmail one, but that was some > time ago and I don't remember clearly. Off the top of my head: normalization of terms and CJK support. With normalization I mean that the API allows to inject a custom preprocessor for document and search terms before they are matched (that's mainly useful due to a quirk in Cyrus search). To be honest, I am not sure if these features even need to be migrated. I'll run a couple of te...
2016 Sep 19
2
Pull requests: CJK words and Snippet generator
... above is - n is the number of terms in a document. I haven't done systematic testing of wall-clock time for the new feature. If it is crucial to go ahead with the patch, I could create a couple of benchmarks. > The tokenisation of the snippet uses the same code as indexing does, so > CJK should just work automatically, though it looks like there aren't > currently any testcases for this, so it would be worth checking (and > worth adding some) > > Normalisation could perhaps be done with a custom stemming algorithm. > The indexing pipeline doesn't currently h...
2006 Sep 27
3
Icon or CJK fonts in MENU TITLE, is that possible in the future ?
...atures in recent syslinux versions. About new functions, actually I have another radical idea: since we are in Asia, most of the users here would like to see some local fonts in the syslinux/pxelinux menu. I am wondering whether, in the future, the syslinux/pxelinux menu could support CJK fonts or icons? This would be a very friendly feature for people whose mother tongue is not English. If it's too radical, just ignore that. Thanks in advance. -- Steven Shiau <steven _at_ nchc org tw> <steven _at_ stevenshiau org> National Center for High-performance Computi...
2016 Aug 05
2
Pull requests: CJK words and Snippet generator
On Thu, Aug 4, 2016, at 15:08, James Aylett wrote: > On Wed, Aug 03, 2016 at 08:17:05PM +0200, rsto at paranoia.at wrote: > > I'll notify you when the CJK pull request passes Travis. > > That's great, thanks! Alright, after lots of fiddling with .travis.yml I finally made the pull request build on Travis' trusty image: https://github.com/xapian/xapian/pull/114 I have kept ICU/pkg-config mandatory. Most probably that will have to chan...
2013 Mar 13
2
patch - Some CJK codepoints are also punctuation
-- Greg. -------------- next part -------------- A non-text attachment was scrubbed... Name: xapian-some-cjk-codepoints-are-also-punctuation.patch Type: text/x-patch Size: 1499 bytes Desc: not available URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20130313/4da8b0f9/attachment.bin>
2016 Jul 29
3
Pull requests: CJK words and Snippet generator
Hi James, thanks for the feedback. On Thu, Jul 28, 2016, at 00:22, James Aylett wrote: > This sounds great! I know sufficiently little about CJK that I won't > try to comment on that at all :) I've just opened a pull request for the CJK tokenizer: https://github.com/xapian/xapian/pull/114 > I wonder if we can arrange suitable defaults to use your > implementation with the older API, and come up with a newer API that >...
2016 Aug 18
2
Pull requests: CJK words and Snippet generator
Hi, On Thu, Aug 11, 2016, at 13:19, rsto at paranoia.at wrote: > The CJK word segmentation and snippet pull requests both pass Travis > since middle/end of last week. Did you find time to look at them? Just checking in to see if you found time to look at the PRs. It'd be nice to know a tentative timeline, so I can plan whether to build the next features on top of our local fork...
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch for Chinese for this ticket. The current code works well with special test cases, and all tests in xapian-core still pass. But I'm confused about the exact requirements of the question: how much performance we could pay for enabling more c...
2019 Mar 09
2
Ask for advice on exact requirements to fix #699 mixed CJK numbers
...what I should do next. If it's not worth changing anything here because it's a rare case, I apologise for opening my PR on GitHub before getting a reply; you may need to close it there. Otherwise, should I optimize the current code by replacing the set with a static array? Or should I roll back the current modification to cjk-tokenizer and try to do some work with the stemming instead? Cheers, outdream
2011 Apr 21
2
Chinese segmentation
Hello, I have finished reading the papers, and I think it is time to design my project. The first step will be to determine whether the input characters are Chinese. I saw in a past post that cjk-tokenizer only deals with UTF-8 and Unicode, but I see there are other encodings such as GBK and Big5. I am wondering: should I just deal with UTF-8 and Unicode?
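One possible way to frame the question in this message is to convert any legacy encoding to Unicode at the input boundary, so that the segmentation code only ever sees Unicode text. The sketch below illustrates that idea in Python. It is not how cjk-tokenizer or Xapian handles input, and the helper names `to_unicode` and `contains_han` are hypothetical. Note that the Han range used here is shared by Chinese hanzi, Japanese kanji and Korean hanja, so this check alone cannot prove that text is Chinese.

```python
def to_unicode(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode input bytes to a Unicode string; the caller must know the source encoding."""
    # Python ships codecs for "gbk" and "big5"; invalid bytes raise UnicodeDecodeError.
    return raw.decode(encoding)


def contains_han(text: str) -> bool:
    """True if any character lies in the main CJK Unified Ideographs block."""
    # Assumption for illustration: only U+4E00..U+9FFF is checked, not the extension blocks.
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


# The same two characters arrive as different bytes in each legacy encoding,
# but compare equal once decoded.
gbk_bytes = "你好".encode("gbk")
big5_bytes = "你好".encode("big5")
assert gbk_bytes != big5_bytes
assert to_unicode(gbk_bytes, "gbk") == to_unicode(big5_bytes, "big5")

print(contains_han(to_unicode(gbk_bytes, "gbk")))   # → True
print(contains_han("hello"))                        # → False
```

With this split, support for GBK or Big5 becomes a decoding concern outside the tokenizer, and the tokenizer itself can stay Unicode-only.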
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
...e of icuconfig is indeed discouraged by the ICU maintainers. If I can't get the PR to build properly without pkgconfig, I'll make both libicu and pkg-config optional. To do so, I'll try to get Travis to build with 14.04 LTS, which might require a few build runs. I'll notify you when the CJK pull request passes Travis. Cheers, Robert
2011 Oct 22
3
Sweave, cairo_pdf, CJK, ghostscript
...ew days trying to put together an annotated map of China with R and some public GIS data: http://sourceforge.net/projects/outmodedbonsai/files/snpMatrix%20next/1.17.7.11/China_Choropleth_Maps.pdf/download It is done, and rather nice... there are a few issues: - the default pdf() device cannot do CJK with embedded fonts - and cairo_pdf() is not hooked up to Sweave yet. I have had a quick look, and it does not look too complicated, other than the fact that cairo_pdf() is mutually exclusive with pdf(); and the jpeg/png are new to 2.13 so it is probably just nobody has gotten round to it. (and cai...
2010 Jun 29
5
More than two font in a plot
Hi there, I am a Chinese R user. I hope to display Chinese characters in a plot, and then save it in PostScript format. I have read the article titled "Non-Standard Fonts in PostScript and PDF Graphics", especially the section about CJK fonts. I also tried the code: > pdf("chinese.pdf", width=3, height=1) > grid.text("\u4F60\u597D", y=2/3, gp=gpar(fontfamily="CNS1")) > grid.text("is 'hello' in (Traditional) Chinese", y=1/3) > dev.off() however, it's not valid with p...
2011 Oct 22
0
patch to add cairo support to Sweave (Re: Sweave, cairo_pdf, CJK, ghostscript)
...nnotated map of China with R > and some public GIS data: > > http://sourceforge.net/projects/outmodedbonsai/files/snpMatrix%20next/1.17.7.11/China_Choropleth_Maps.pdf/download > > It is done, and rather nice... there are a few issues: > > - the default pdf() device cannot do CJK with embedded > fonts - and cairo_pdf() is not hooked up to Sweave yet. I > have had a quick look, and it does not look too complicated, > other than the fact that cairo_pdf() is mutually exclusive > with pdf(); and the jpeg/png are new to 2.13 so it is > probably just nobody has got...
2016 Dec 14
2
Pull requests: CJK words and Snippet generator
I haven't had a chance to look at the patch and won't be able to do so before January. Its design description sounds promising, though. The snippet generator code linked to by Bron contains mostly the same code as in my pull request, with two exceptions: it adds a flag to make the generator return the empty string for snippets without any matching terms. And it includes a fix to a possible
2003 Oct 13
1
cant connect....cjk
Dear all, I have a Linux RH8 machine with Samba running on it. I am trying to set up Samba from Webmin. My question is: should I create a Samba user? And if so, when I browse to the Linux HDD from a Windows 2000 computer, it asks me for a username and password. Do I enter the Samba user, the Linux user, or the Windows 2000 username? Sincerely, thanks and best regards, Koulis Constantine.
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
...he test 'querypairwise1' [2], which AFAIK I didn't touch at all. Is that a known issue or did I break anything? [1] https://travis-ci.org/xapian/xapian/builds/149512190 [2] https://travis-ci.org/xapian/xapian/jobs/149512191#L15051 > > I've just opened a pull request for the CJK tokenizer: > > https://github.com/xapian/xapian/pull/114 > > Unfortunately, Travis breaks since pkg-config can't find libicu on the > > machine [1]. > You should be able to install it; if you add libicu-dev to the > packages stanza in .travis.yml it will put it in there...