similar to: Pull requests: CJK words and Snippet generator

Displaying 20 results from an estimated 6000 matches similar to: "Pull requests: CJK words and Snippet generator"

2016 Aug 18
2
Pull requests: CJK words and Snippet generator
Hi, On Thu, Aug 11, 2016, at 13:19, rsto at paranoia.at wrote: > The CJK word segmentation and snippet pull requests both pass Travis > since middle/end of last week. Did you find time to look at them? Just checking in to see if you found time to look at the PRs. It'd be nice to know a tentative timeline, so I can plan whether to build the next features on top of our local fork or the upstream PRs.
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
Hi, On Fri, Jul 29, 2016, at 13:45, James Aylett wrote: > On Fri, Jul 29, 2016 at 12:12:25PM +0200, rsto at paranoia.at wrote: > > The FastMail snippet generator was written when MSet didn't create > > snippets. I'll first compare both implementations to see if there is a > > good reason for them to coexist, or might just as well merge any > > additional
2016 Aug 03
2
Pull requests: CJK words and Snippet generator
On Wed, Aug 3, 2016, at 19:26, James Aylett wrote: > On Wed, Aug 03, 2016 at 06:54:32PM +0200, rsto at paranoia.at wrote: > > Oddly enough, the pull request causes Travis to break for clang but not > > for gcc [1]. That's because the clang build process fails for the test > > 'querypairwise1' [2], which AFAIK I didn't touch at all. Is that a > > known
2016 Jul 26
2
Pull requests: CJK words and Snippet generator
Hi, The Cyrus IMAP mail server uses Xapian as search engine. Recently, FastMail has sponsored implementation of two Xapian features: CJK word splitting and a generator for search snippets. I've been working on both features and we would be happy to get them merged into Xapian master. The CJK word tokenizer uses the word segmentation algorithms of the International Components for Unicode
2016 Jul 29
3
Pull requests: CJK words and Snippet generator
Hi James, thanks for the feedback. On Thu, Jul 28, 2016, at 00:22, James Aylett wrote: > This sounds great! I know sufficiently little about CJK that I won't > try to comment on that at all :) I've just opened a pull request for the CJK tokenizer: https://github.com/xapian/xapian/pull/114 > I wonder if we can arrange suitable defaults to use your > implementation with the
2016 Sep 19
2
Pull requests: CJK words and Snippet generator
Olly, sorry for my delayed reply. Am Mo, 12. Sep 2016, um 05:32, schrieb Olly Betts: > On Wed, Sep 07, 2016 at 02:30:16PM +0200, rsto at paranoia.at wrote: > > On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote: > > > I think my main concerns are about efficiency [...] > > For the proposed term coverage, the implementation looks up and inserts > > terms into a map. That
2016 Sep 07
2
Pull requests: CJK words and Snippet generator
On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote: > I think my main concerns are about efficiency (since that's a major > motivation for the current implementation, so slowing it down would be > annoying), and whether we can just make this the standard behaviour > rather than adding an option. The current implementation is O(n) and I took care to keep it at that. For the proposed term
2016 Dec 14
2
Pull requests: CJK words and Snippet generator
I haven't had a chance to look at the patch and won't be able to do so before January. Its design description sounds promising, though. The snippet generator code linked to by Bron contains mostly the same code as in my pull request, with two exceptions: it adds a flag to make the generator return the empty string for snippets without any matching terms. And it includes a fix to a possible
2016 Dec 13
2
Pull requests: CJK words and Snippet generator
On Tue, Oct 04, 2016 at 10:37:49AM +1100, Bron Gondwana wrote: > Robert is in Australia visiting the FastMail office to co-work with us for a > couple of months, and I'd love to get this Xapian integration work done > during this time. We're also looking to release Cyrus IMAPd version 3.0 some > time in the next few months, and it would be great to not depend on too many >
2011 Apr 07
1
GSOC 2011- CJK Support
Hello, everyone, I am Yongzhi Zhang, a Chinese student. I'm interested in CJK Support (also known as Chinese, Japanese, and Korean Support). I have 6 years of experience in software development (C/C++ and Java). I want to work on this project, "CJK Support". I come from Beijing, China. Chinese is my native language. This is my advantage for "CJK Support". I have fixed a bug for
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch for Chinese for this ticket. The current code works well with the special test cases, and all tests in xapian-core still pass. But I'm unsure about the exact requirements of the ticket: how much performance we can afford to pay for enabling more cases, and whether there are better methods to
2006 Sep 27
3
Icon or CJK fonts in MENU TITLE, is that possible in the future ?
First I would like to say thank you to HPA for providing some really nice features in the recent syslinux versions. About new functions, I actually have another radical idea: since we are in Asia, most of the users here would like to see some local fonts in the syslinux/pxelinux menu. I am wondering whether it is possible, in the future, for the syslinux/pxelinux menu to support CJK fonts or icons?
2013 Mar 13
2
patch - Some CJK codepoints are also punctuation
-- Greg. -------------- next part -------------- A non-text attachment was scrubbed... Name: xapian-some-cjk-codepoints-are-also-punctuation.patch Type: text/x-patch Size: 1499 bytes Desc: not available URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20130313/4da8b0f9/attachment.bin>
2019 Mar 09
2
Ask for advice on exact requirements to fix #699 mixed CJK numbers
Thanks for your patience. I'm still confused about what I should do next. If it's not worth changing anything here as it's a rare case, sorry for opening my PR on GitHub before the reply; maybe you need to close it on GitHub. Otherwise, should I optimize the current code by replacing the set with a static array? Or roll back the current modification to cjk-tokenizer and try to do some work with the
2011 Oct 22
3
Sweave, cairo_pdf, CJK, ghostscript
I have had some fun in the last few days trying to put together an annotated map of China with R and some public GIS data: http://sourceforge.net/projects/outmodedbonsai/files/snpMatrix%20next/1.17.7.11/China_Choropleth_Maps.pdf/download It is done, and rather nice... there are a few issues: - the default pdf() device cannot do CJK with embedded fonts - and cairo_pdf() is not hooked up to
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi, I am looking for a Chinese, Japanese and Korean tokenizer that can be used to tokenize terms for CJK languages. I am not very familiar with these languages; however, I think that in these languages one symbol can contain one or more words, which makes it more difficult to tokenize text into searchable terms. Lucene has CJK Tokenizer ... and I am looking around to see if there is some open source that we
2011 Oct 22
0
patch to add cairo support to Sweave (Re: Sweave, cairo_pdf, CJK, ghostscript)
It was as easy as I thought it was half a day ago - here is a patch against R trunk to add cairo support to the Sweave driver, an example Sweave input, and the resulting output. A few more notes: - obviously the documentation needs to be updated... a bit more work to do. - some check to make sure "cairo" and "pdf" are not both set would be nice, as well as checking
2010 Jun 29
5
More than two font in a plot
Hi there, I am a Chinese R user. I hope to display Chinese characters in a plot, and then save it in PostScript format. I have read the article titled "Non-Standard Fonts in PostScript and PDF Graphics", especially the section about CJK fonts. I also tried the code: > pdf("chinese.pdf", width=3, height=1) > grid.text("\u4F60\u597D", y=2/3,
2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS for input that contains characters in Unicode Halfwidth and Fullwidth Forms (https://unicode.org/charts/PDF/UFF00.pdf). Since I am undecided yet if and how to fix this in Xapian I haven't come up with a pull request. Because trac currently is offline, I could not file a bug. I hope it's OK to post my analysis here first,
2011 Apr 21
2
Chinese segmentation
Hello, I have finished reading the papers, and I think it is time to design my project. The first step will be to determine whether the input characters are Chinese. I see from past posts that cjk-tokenizer only deals with UTF-8 and Unicode, but I see there are other encodings such as GBK and Big5. I am wondering: should I just deal with UTF-8 and Unicode?