similar to: Chinese, Japanese, Korean Tokenizer.

Displaying 20 results from an estimated 3000 matches similar to: "Chinese, Japanese, Korean Tokenizer."

2011 Apr 21
2
Chinese segmentation
Hello, I have finished reading the papers, and I think it is time to design my project. The first step will be to determine whether the input characters are Chinese. I saw in a past post that cjk-tokenizer only deals with UTF-8 and Unicode, but there are other encodings such as GBK and Big5. Should I just deal with UTF-8 and Unicode?
2007 Jul 09
7
Xapian pubmeet
Hi all, A few of us have been discussing whether we should have a Xapian social gathering of some kind. The current idea is meeting up in a pub in London some time in autumn for drinks and food. However all of this really depends on who might be able to come! It would be a chance to meet other Xapian enthusiasts in an informal social setting and talk about all things search-related (and
2011 Apr 07
1
GSOC 2011- CJK Support
Hello everyone, I am Yongzhi Zhang, a Chinese student. I'm interested in CJK Support (also known as Chinese, Japanese, and Korean Support). I have 6 years of experience in software development (C/C++ and Java). I want to work on the "CJK Support" project. I come from Beijing, China, and Chinese is my native language, which is an advantage for "CJK Support". I have fixed a bug for
2011 Sep 14
1
Integrated Chinese tokenizer SCWS in xapian-core
Xapian is an excellent open source search engine library, but there is no native support for Chinese word segmentation in queryparser and termgenerator. Therefore, I modified a small amount of source code to integrate the SCWS tokenizer, which is also open source and was developed by me. Anyone can obtain the patch from the URL below. After patching, Xapian::QueryParser::parse_query and
2012 Mar 29
1
GSoC - Improve Japanese Support
Hi there, My name is Julia Wilson and I'm a grad student in Computational Linguistics at Brandeis University. As a GSoC project I'm interested in improving Japanese language support, and I had a couple of questions for the application I'm putting together. I know Japanese - I'm not a native speaker by any means, but I'm pretty good - and I'm really interested in the
2019 Mar 09
2
Ask for advice on exact requirements to fix #699 mixed CJK numbers
Thanks for your patience. I'm still confused about what I should do next. If it's not worth changing anything here because it's a rare case, sorry for opening my PR on GitHub before the reply; maybe you need to close it on GitHub. Alternatively, should I optimize the current code by replacing the set with a static array? Or roll back the current modification to cjk-tokenizer and try to do some work with the
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers", and have implemented a partial patch for Chinese for this ticket. The current code works well with special test cases, and all tests in xapian-core still pass. But I'm unsure about the exact requirements of the ticket: how much performance we can afford to pay to enable more cases, and whether there are better methods to
2007 Sep 16
1
Document clustering module?
Hi, I am implementing some document clustering algorithms in the xapian core. I would like to know if this kind of module will be considered to be incorporated into the core release. Or is there already some document clustering module that is just not open-sourced yet? Best, Yung-chung Lin
2006 Jul 07
4
How to add Asia token analyzer to ferret simply?
Hi David, can you give me an example of how to add an analyzer to Ferret for Asian languages? My web application will have to support multi-language search, which means, for example, that both Chinese and English will be searched through the form. Currently, I have decided to use the simple token principle, which means that every Chinese character will be a token, although this does not work so well in some
2010 Jun 29
5
More than two font in a plot
Hi there, I am a Chinese R user. I hope to display Chinese characters in a plot, and then save it in PostScript format. I have read the article titled "Non-Standard Fonts in PostScript and PDF Graphics", especially the section about CJK fonts. I also tried the code: > pdf("chinese.pdf", width=3, height=1) > grid.text("\u4F60\u597D", y=2/3,
2010 May 19
1
Multiple language output - Correct in RGui, wrong in .txt after sink()
I have the following problem with outputting multilingual data to a file. I get what I expect (except for Korean) as the result in the RGui, but when I use sink() to output to a text file I lose the characters in the foreign languages. I post a small example below. Since I am not sure how well my email system and the list cope with all the different characters, I have additionally created a pdf
2016 Jul 26
2
Pull requests: CJK words and Snippet generator
Hi, The Cyrus IMAP mail server uses Xapian as search engine. Recently, FastMail has sponsored implementation of two Xapian features: CJK word splitting and a generator for search snippets. I've been working on both features and we would be happy to get them merged into Xapian master. The CJK word tokenizer uses the word segmentation algorithms of the International Components for Unicode
2011 Apr 01
1
Apply the google summer code (additional idea)
Hi all: Having gone through the Xapian-devel archives, it seems many people would like to do the "weight schemes" project and few would like to do the CJK project. I am a native speaker of Chinese and have learned a little Korean and Japanese, so if possible, I would like to apply for this project too. Fan Zhang -- My Homepage: http://sites.google.com/site/zhfan555/ PhD Student at
2009 Jul 15
4
Suggestions about the website of www.winehq.org
Hi, Dan, Thanks for your email. Please kindly see my comments below: Reply to topic 1): It is true that it is not legally suitable to copy MS fonts; however, the open-source fonts (Wenquanyi, http://wenq.org/enindex.cgi) have been available for a very long period, and are probably also used by Fedora & Ubuntu as default Chinese fonts (even Asian/CJK fonts (maybe CJK means "Chinese
2005 Jun 10
5
R 2.1.1 slated for June 20
The next version of R will be released (barring force majeure) on June 20th, with beta versions available starting Monday. Please do check them on your system *before* the release this time... Apologies for the late announcement, but my department moved this week and I needed to be sure that my set-up survived the move. -pd -- O__ ---- Peter Dalgaard Øster
2007 Feb 07
2
My new record: Indexing 20 millions docs = 79m9.378s
Gentoo Linux 2.6 8 AMD Opteron 64-bit Processors 32GB Memory -------------------------------------------------------------------------------- Environment: ------------------ XAPIAN_FLUSH_THRESHOLD=21000000 XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000 XAPIAN_PREFER_FLINT=True Indexing 20 million documents: --stemmer=none ------------------------------------------- real 79m9.378s user 77m28.696s
2016 Jul 29
3
Pull requests: CJK words and Snippet generator
Hi James, thanks for the feedback. On Thu, Jul 28, 2016, at 00:22, James Aylett wrote: > This sounds great! I know sufficiently little about CJK that I won't > try to comment on that at all :) I've just opened a pull request for the CJK tokenizer: https://github.com/xapian/xapian/pull/114 > I wonder if we can arrange suitable defaults to use your > implementation with the
2011 Oct 26
1
set different font family for strings in mtext or text?
Hi there, Is it possible to set different font families for strings in mtext or text? For example, on the Windows platform with the windows() device: plot(1:10, type = "n") text(5,5, "Chinese (English)") #Chinese for Chinese characters it will give the correct Chinese and English characters with two different font families, i.e., English characters in the default sans family, and Chinese
2005 Dec 02
43
ANN: acts_as_ferret
Hi all, This week I have worked with Rails and Ferret to test Ferret's (and Lucene's) capabilities. I decided to make a mixin for ActiveRecord as it seemed the simplest possible solution and I ended up making this into a plugin. For more info on Ferret see: http://ferret.davebalmain.com/trac/ The plugin is functional but could easily be refined. Anyway I want to share it with you. Regard it as a