Displaying 20 results from an estimated 700 matches similar to: "Chinese segmentation"
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi,
I am looking for a Chinese, Japanese and Korean tokenizer that can be
used to tokenize terms for CJK languages. I am not very familiar
with these languages; however, I think that these languages contain one
or more words in one symbol, which makes it more difficult to tokenize
into searchable terms.
Lucene has a CJK Tokenizer ... and I am looking around to see if there is some
open source that we
2011 Aug 13
3
Japanese and Korean Fonts inside Wine.
I installed Ubuntu in English.
How can I now force Wine to show the Korean (Hangeul) and Japanese (Hiragana/Katakana/Kanji) fonts?
2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Thu, Jan 04, 2024 at 05:50:22PM +0100, Robert Stepanek wrote:
> Since I am undecided yet if and how to fix this in Xapian I haven't
> come up with a pull request. Because trac currently is offline, I
> could not file a bug. I hope it's OK to post my analysis here first,
> I'll be happy to follow up reporting that bug proper later (should we
> conclude that it actually
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Sun, Jan 7, 2024, at 7:45 PM, Olly Betts wrote:
> I've restarted trac.
I now created a pull request: https://github.com/xapian/xapian/pull/329 Should I create a trac issue, too?
> Assuming the latter is valid, just removing this block (or removing the
> parts of it which are Lu or Ll) should fix the problem as then
> tokenisation will switch mode - I tried this and it fixes
2009 Jul 12
3
Installing mysql with macports
Not sure if this is off topic, but there doesn't seem to be an obvious
place to ask this question.
I am trying to use MacPorts to install mysql. I have xcode 3.0 and
x11 XQuartz 2.1.6 installed.
$PATH:
/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bi
When I run sudo port install mysql5-server
I receive:
---> Configuring mysql5
Error: Target
2003 Nov 04
2
4-STABLE b0rked in share/locale/zh_CN.GBK
Murray,
Your commits earlier this evening to zh_CN.GB18030 fixed that -STABLE
breakage, but zh_CN.GBK appears still to be missing, which causes
'make installworld' to fail. Can you please fix this as well?
install -m 644 -o root -g wheel uk_UA.KOI8-U.out /usr/share/locale/uk_UA.KOI8-U/LC_CTYPE
install -m 644 -o root -g wheel zh_CN.eucCN.out /usr/share/locale/zh_CN.eucCN/LC_CTYPE
2005 Aug 13
9
Multilingual Rails v0.6
Multilingual Rails v0.6 is released!
Here is the changelog. Documentation and download at the homepage:
http://www.tuxsoft.se/oss/rails/multilingual
v0.6 - 2005-08-13
* String case-manipulation functions replaced with ruby-unicode
equivalents (if ruby-unicode is installed):
String#downcase, String#upcase and String#capitalize now fully
handle Unicode.
* String normalization
2016 Jul 26
2
Pull requests: CJK words and Snippet generator
Hi,
The Cyrus IMAP mail server uses Xapian as search engine. Recently,
FastMail has sponsored implementation of two Xapian features: CJK word
splitting and a generator for search snippets. I've been working on both
features and we would be happy to get them merged into Xapian master.
The CJK word tokenizer uses the word segmentation algorithms of the
International Components for Unicode
2003 May 21
6
fixme:font:LFD_InitFontInfo DBCS fonts like...
I get these errors when running winex on paltalk.exe, what can I do?
Building font metrics. This may take some time...
fixme:font:LFD_InitFontInfo DBCS fonts like '-default-kai-medium-r-normal--8-80-72-72-c-80-big5-0' are not working correctly now.
fixme:font:LFD_InitFontInfo DBCS fonts like '-default-kai-medium-r-normal--8-80-72-72-c-80-gb2312.1980-0' are not working correctly
2007 Jun 29
3
[PATCH] Fix keymap for Japanese keyboard
Hi All,
We tested with a Japanese keyboard and found local keys that could not
be input with it.
This patch adds those keys to the keymap.
The keys that could not be input are as follows.
・Katakana
・Eisu_Toggle
Signed-off-by: Takanori Kasai <kasai.takanori@jp.fujitsu.com>
Signed-off-by: Junko Ichino
2019 Mar 09
2
Ask for advice on exact requirements to fix #699 mixed CJK numbers
Thanks for your patience.
I'm still confused about what I should do next.
If it's not worth changing anything here as it's a rare case,
sorry for sending my PR to GitHub before the reply;
maybe you need to close it on GitHub.
Otherwise, should I optimize the current code by
replacing the set with a static array?
Or roll back the current modification to cjk-tokenizer and
try to do some work with the
2016 Jul 29
3
Pull requests: CJK words and Snippet generator
Hi James,
thanks for the feedback.
On Thu, Jul 28, 2016, at 00:22, James Aylett wrote:
> This sounds great! I know sufficiently little about CJK that I won't
> try to comment on that at all :)
I've just opened a pull request for the CJK tokenizer:
https://github.com/xapian/xapian/pull/114
> I wonder if we can arrange suitable defaults to use your
> implementation with the
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers",
and have implemented a partial patch of Chinese for this ticket.
The current code works well with special test cases, and
all tests in xapian-core still pass.
But I'm unsure about the exact requirements of the ticket,
how much performance we can afford to pay for enabling more cases,
and if there are better methods to
2017 Aug 02
2
fcitx-anthy request (for Japanese users)
On Wed, Aug 02, 2017 at 09:37:06AM -0400, H wrote:
> >
> >Ah, also, in my .xinitrc (I boot into text mode then run startx) I have,
> >above the line calling the window manager
> >
> >export LC_CTYPE=en_US.UTF-8
> >
> >Do you have fcitx-gtk2 and fcitx-gtk3 installed?
> >I repeat, I'm not an expert on this, my skill is in googling and finding
>
2017 Aug 03
2
fcitx-anthy request (for Japanese users)
On Wed, Aug 02, 2017 at 08:15:58PM -0400, H wrote:
> On 08/02/2017 09:46 AM, Scott Robbins wrote:
> > On Wed, Aug 02, 2017 at 09:37:06AM -0400, H wrote:
> > > > Ah, also, in my .xinitrc (I boot into text mode then run startx) I have,
> > > > above the line calling the window manager
> > > >
> > > > export LC_CTYPE=en_US.UTF-8
> >
2016 Sep 07
2
Pull requests: CJK words and Snippet generator
On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote:
> I think my main concerns are about efficiency (since that's a major
> motivation for the current implementation, so slowing it down would be
> annoying), and whether we can just make this the standard behaviour
> rather than adding an option.
The current implementation is O(n) and I took care to keep it at that.
For the proposed term
2001 Jul 07
2
font metrics updated on each program launch
Every time I start a program with wine-20010629 on Red Hat Linux, font
metrics get updated.
I am getting fonts through the Xserver (no xfont server)
(currently M$, ghostscript, Type1, misc and 75dpi fonts)
Is this behaviour normal?
Every time this happens, I get lots of output like
fixme:font:LFD_InitFontInfo DBCS fonts like '-urw-century schoolbook
2009 Dec 10
1
ActionMail Charset for email body
Hi Guys,
I succeeded in sending email with content-type big5. However, the
content of the email is still utf-8!
I've tried to call
t(:my_sentence).encode('big5')
on my content, but there are
"invalid byte sequence in UTF-8"
errors in a lot of places starting from actionmailer/lib/utils.rb
(text.to_s.gsub(/\r\n?/, "\n"))
and it stops me from
2010 Jun 29
5
More than two font in a plot
Hi there,
I am a Chinese R user. I hope to display Chinese characters in a plot
and then save it in PostScript format. I have read the article titled
"Non-Standard Fonts in PostScript and PDF Graphics", especially the
section about CJK fonts. I also tried the code:
> pdf("chinese.pdf", width=3, height=1)
> grid.text("\u4F60\u597D", y=2/3,
2019 Sep 12
2
Fw: Btrfs Samba and Quotas
Hello Hendrik
Could you run the two commands 'mount' and 'df -TPh' on OMV
and post the output for us? Thank you.
--
Regards,
Jones Syue
QNAP Systems, Inc.