search for: fullwidth

Displaying 13 results from an estimated 13 matches for "fullwidth".

2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...dcdccc97 100644 --- a/xapian-core/queryparser/word-breaker.cc +++ b/xapian-core/queryparser/word-breaker.cc @@ -102,8 +102,10 @@ is_unbroken_script(unsigned p) 0xF900 - 1, 0xFAFF, // FE30..FE4F; CJK Compatibility Forms 0xFE30 - 1, 0xFE4F, - // FF00..FFEF; Halfwidth and Fullwidth Forms - 0xFF00 - 1, 0xFFEF, + // FF00..FF60: Fullwidth Numbers, Latin Characters, Punctuation + // FF61..FF64: Halfwidth Punctuation + 0xFF65 - 1, 0xFFDC, // Halfwidth Katakana and Hangul + // FFE0..FFEF; Fullwidth and Halfwidth Symbols The fullwidth "????? ?????...
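Read literally, the patched table in the snippet above keeps only U+FF65..U+FFDC (halfwidth Katakana and Hangul) from the FF00..FFEF block in the unbroken-script list, so fullwidth Latin letters, digits and punctuation are no longer treated as unbroken script. A minimal Python sketch of that one range test follows. It is not Xapian's code: the name `in_unbroken_halfwidth_range` is hypothetical, and it assumes the `start - 1, end` pairs in the diff denote the inclusive range `start..end`, as the accompanying comments suggest.

```python
# Sketch only: models the single entry from the FF00..FFEF block that the
# patch keeps. Xapian's real check is is_unbroken_script() in word-breaker.cc
# and covers many more ranges; this helper name is hypothetical.
UNBROKEN_HALFWIDTH = (0xFF65, 0xFFDC)  # Halfwidth Katakana and Hangul


def in_unbroken_halfwidth_range(ch: str) -> bool:
    """Return True if the single character falls in U+FF65..U+FFDC."""
    lo, hi = UNBROKEN_HALFWIDTH
    return lo <= ord(ch) <= hi


if __name__ == "__main__":
    # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A: outside the kept range.
    print(in_unbroken_halfwidth_range("\uFF21"))
    # U+FF76 HALFWIDTH KATAKANA LETTER KA: inside the kept range.
    print(in_unbroken_halfwidth_range("\uFF76"))
```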
2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...parser/word-breaker.cc index 8108523ccd53..4fabc23f4b56 100644 --- a/xapian-core/queryparser/word-breaker.cc +++ b/xapian-core/queryparser/word-breaker.cc @@ -103,7 +103,7 @@ is_unbroken_script(unsigned p) // FE30..FE4F; CJK Compatibility Forms 0xFE30 - 1, 0xFE4F, // FF00..FFEF; Halfwidth and Fullwidth Forms - 0xFF00 - 1, 0xFFEF, + //0xFF00 - 1, 0xFFEF, // 1AFF0..1AFFF; Kana Extended-B // 1B000..1B0FF; Kana Supplement // 1B100..1B12F; Kana Extended-A If we're fixing it this way we should check this list for other instances of this (and doing this would probably reveal if that assumptio...
2024 Jan 09
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...d treats Latin characters, > numbers, symbols and punctuation as broken script. There's a couple of > unit tests that check for this. Thanks, that looks good - now merged. I think we probably should backport this to 1.4 - it's a behaviour change, but limited to text containing these fullwidth latin characters and the change fixes a bug. The awkward wrinkle is that you need to reindex to get the full benefits of the fix. Did you already check the other ranges for cased letters? I can but if you have already there's not much point. > The fullwidth "????? ??????" tests...
2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS for input that contains characters in Unicode Halfwidth and Fullwidth Forms (https://unicode.org/charts/PDF/UFF00.pdf). Since I am undecided yet if and how to fix this in Xapian I haven't come up with a pull request. Because trac currently is offline, I could not file a bug. I hope it's OK to post my analysis here first, I'll be happy to follow up report...
2024 Jan 10
2
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...e: > Thanks, that looks good - now merged. Thanks! > Did you already check the other ranges for cased letters? I can but if > you have already there's not much point. I did not. If you find time, that'd be great. Otherwise I can make room for it in the next days. > > The fullwidth "????? ??????" tests suggests to me that > > either Xapian should allow for Unicode normalization, or application > > developers must take care of this before indexing. > > We currently leave it to the API user to normalise Unicode > representation, though maybe we s...
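Since normalisation is currently left to the API user, one option on the application side is to fold width variants before handing text to the indexer or query parser. The sketch below uses Python's standard `unicodedata.normalize` with form NFKC, which maps fullwidth Latin characters to their ASCII counterparts and halfwidth Katakana to the regular forms. The function name `fold_width` is hypothetical, and NFKC also rewrites other compatibility characters (ligatures, superscripts and so on), so whether it suits a given index is an assumption to check.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Apply NFKC so fullwidth/halfwidth variants collapse to canonical forms.

    Hypothetical helper: it would be called on both documents and queries
    before indexing or parsing, so that both sides agree.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    fullwidth_hello = "\uFF28\uFF45\uFF4C\uFF4C\uFF4F"  # fullwidth "Hello"
    print(fold_width(fullwidth_hello))
```

Text already in the index is unaffected by a change like this, so existing data would need to be reindexed for the two sides to match.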
2008 Nov 22
1
declaring constants in an Sweave / LaTeX document
List, I would like to set a variable to hold, say, the size of my plots in a Sweave document. i.e. something like the following in my '.Rnw' file: ============================================================================== smallPlotSize = 4 <<fig1, echo=false, results=hide, height=smallPlotSize, width=smallPlotSize, fig=true>>= dat <- read.table("
2014 Jan 10
4
[PATCH] Add a minimal hive with "special" keys and values
...le/Load Hive...) + +- A subkey 'asdf_äöüß' was created in the root node + - An empty REG_STRING value 'asdf_äöüß' was created within that node. +- A subkey 'weird™' was created in the root node. + - An empty REG_STRING value 'symbols $£₤₧€' (SMALL DOLLAR SIGN, + FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN) was created within + that node. +- A subkey 'zero\0key' with an REG_DWORD value 'zero\0val' + was created using the 'mkzero/mkzero.c'. (\0 = zero character) + +- Hilko Bengen 2014-01-10. diff --git a/images/mkzero/Makefile b/images/mk...
2014 Jan 13
0
Re: [PATCH 1/7] Add a minimal hive with "special" keys and values
...A key 'zero\0key' containing a REG_DWORD value 'zero\0val' (\0 = zero > + character) > +- A key 'asdf_äöüß' containing a REG_DWORD value 'asdf_äöüß' > +- A key 'weird™' containing a REG_DWORD value 'symbols $£₤₧€' (SMALL > + DOLLAR SIGN, FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN) > + > +- Hilko Bengen 2014-01-10. > diff --git a/images/mkzero/Makefile b/images/mkzero/Makefile > new file mode 100644 > index 0000000..affe52b > --- /dev/null > +++ b/images/mkzero/Makefile > @@ -0,0 +1,9 @@ > +CROSS=i686-w64-mingw32-...
2014 Jan 10
14
[PATCH 1/7] Add a minimal hive with "special" keys and values
...ys and values: + +- A key 'zero\0key' containing a REG_DWORD value 'zero\0val' (\0 = zero + character) +- A key 'asdf_äöüß' containing a REG_DWORD value 'asdf_äöüß' +- A key 'weird™' containing a REG_DWORD value 'symbols $£₤₧€' (SMALL + DOLLAR SIGN, FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN) + +- Hilko Bengen 2014-01-10. diff --git a/images/mkzero/Makefile b/images/mkzero/Makefile new file mode 100644 index 0000000..affe52b --- /dev/null +++ b/images/mkzero/Makefile @@ -0,0 +1,9 @@ +CROSS=i686-w64-mingw32- +CFLAGS=--std=c99 +all: mkzero.exe +clean: +...
2014 Jan 14
2
Re: [PATCH 1/7] Add a minimal hive with "special" keys and values
...' containing a REG_DWORD value 'zero\0val' (\0 = zero > > + character) > > +- A key 'asdf_äöüß' containing a REG_DWORD value 'asdf_äöüß' > > +- A key 'weird™' containing a REG_DWORD value 'symbols $£₤₧€' (SMALL > > + DOLLAR SIGN, FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN) > > + > > +- Hilko Bengen 2014-01-10. > > diff --git a/images/mkzero/Makefile b/images/mkzero/Makefile > > new file mode 100644 > > index 0000000..affe52b > > --- /dev/null > > +++ b/images/mkzero/Makefile > > @@ -0...
2014 Jan 08
5
hivex: Make node names and value names with embedded null characters accessible
On Windows, there exist at least two APIs for dealing with the Registry: The Win32 API (RegCreateKeyA, RegCreateKeyW, etc.) works with null-terminated ASCII or UTF-16 strings. The native API (ZwCreateKey, etc.), on the other hand, works with UTF-16 strings that are stored as buffers+length and may contain null characters. Malware authors have been relying on the Win32 API's inability to
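The difference between the two string conventions can be illustrated with a short Python sketch. It does not call any Windows or hivex API; the helper names `read_counted` and `read_null_terminated` are hypothetical, and the sample name `zero\0key` is borrowed from the test hive described in the related patch thread.

```python
def read_counted(buf: bytes, length: int) -> str:
    """Decode exactly `length` bytes of UTF-16LE, embedded nulls included."""
    return buf[:length].decode("utf-16-le")


def read_null_terminated(buf: bytes) -> str:
    """Decode UTF-16LE up to (not including) the first 0x0000 code unit."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i:i + 2] == b"\x00\x00":
            return buf[:i].decode("utf-16-le")
    return buf.decode("utf-16-le")


if __name__ == "__main__":
    name = "zero\0key"
    buf = name.encode("utf-16-le")
    # The counted reader sees the whole name.
    print(repr(read_counted(buf, len(buf))))
    # The null-terminated reader stops at the embedded null.
    print(repr(read_null_terminated(buf)))
```

A reader that stops at the first null code unit only ever sees the prefix of such a name, which is why the buffer+length form is needed to address it.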
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi, I am looking for a Chinese, Japanese and Korean tokenizer that can be used to tokenize terms for CJK languages. I am not very familiar with these languages; however, I think that in these languages a single symbol can represent one or more words, which makes it more difficult to tokenize text into searchable terms. Lucene has CJK Tokenizer ... and I am looking around if there is some open source that we
2008 May 02
0
Wine release 0.9.61
...un setup for Office 2003 9257 Day of Defeat (a Half-Life 1 mod) - Mouse & Graphic 9388 installer stuck for TRS 2006 Demo 9959 Make wine updates work even if the registry changed 10128 winecfg: not launching 10198 IE's writing-mode:tb-rl (CJK-style vertical text layout) renders fullwidth characters rotated when it should not 10411 Synergy HL2 mod crashes in IHTMLWindow2_Release 10676 Sega rally 2 crashes on start 10984 sun jre 5 update 10 installer hangs in 0.9.52 11019 matlab r14 and r16 (7.0.4 and 7.3.0) and WriteItNow3.1.0s hang if X in 24bpp mode 11191 Chief Arch...