Displaying 13 results from an estimated 13 matches for "fullwidth".
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...dcdccc97 100644
--- a/xapian-core/queryparser/word-breaker.cc
+++ b/xapian-core/queryparser/word-breaker.cc
@@ -102,8 +102,10 @@ is_unbroken_script(unsigned p)
0xF900 - 1, 0xFAFF,
// FE30..FE4F; CJK Compatibility Forms
0xFE30 - 1, 0xFE4F,
- // FF00..FFEF; Halfwidth and Fullwidth Forms
- 0xFF00 - 1, 0xFFEF,
+ // FF00..FF60: Fullwidth Numbers, Latin Characters, Punctuation
+ // FF61..FF64: Halfwidth Punctuation
+ 0xFF65 - 1, 0xFFDC, // Halfwidth Katakana and Hangul
+ // FFE0..FFEF; Fullwidth and Halfwidth Symbols
The fullwidth "????? ?????...
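The diff above stores each range as a pair written "start - 1, end". Below is a minimal sketch of how such a table could be queried, assuming a sorted flat array searched with std::lower_bound. It is an illustration only, not Xapian's actual is_unbroken_script. The names in_ranges, old_ranges and new_ranges are hypothetical, and only three ranges are reproduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical excerpt of a boundary table in the style of the diff: each
// pair (start - 1, end) marks the inclusive code point range [start, end].
static const unsigned old_ranges[] = {
    0xF900 - 1, 0xFAFF, // CJK Compatibility Ideographs
    0xFE30 - 1, 0xFE4F, // CJK Compatibility Forms
    0xFF00 - 1, 0xFFEF, // the whole Halfwidth and Fullwidth Forms block
};

static const unsigned new_ranges[] = {
    0xF900 - 1, 0xFAFF,
    0xFE30 - 1, 0xFE4F,
    0xFF65 - 1, 0xFFDC, // only Halfwidth Katakana and Hangul
};

// p is inside a range iff the first boundary >= p is at an odd index,
// i.e. it is the closing "end" entry of a pair.
template <std::size_t N>
bool in_ranges(const unsigned (&table)[N], unsigned p) {
    const unsigned* it =
        std::lower_bound(std::begin(table), std::end(table), p);
    return (it - std::begin(table)) % 2 == 1;
}
```

With the old table FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) counts as unbroken script; with the new table it does not, while halfwidth katakana such as U+FF71 still does.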
2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...parser/word-breaker.cc
index 8108523ccd53..4fabc23f4b56 100644
--- a/xapian-core/queryparser/word-breaker.cc
+++ b/xapian-core/queryparser/word-breaker.cc
@@ -103,7 +103,7 @@ is_unbroken_script(unsigned p)
// FE30..FE4F; CJK Compatibility Forms
0xFE30 - 1, 0xFE4F,
// FF00..FFEF; Halfwidth and Fullwidth Forms
- 0xFF00 - 1, 0xFFEF,
+ //0xFF00 - 1, 0xFFEF,
// 1AFF0..1AFFF; Kana Extended-B
// 1B000..1B0FF; Kana Supplement
// 1B100..1B12F; Kana Extended-A
If we're fixing it this way we should check this list for other
instances of this (and doing this would probably reveal if that
assumptio...
2024 Jan 09
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...d treats Latin characters,
> numbers, symbols and punctuation as broken script. There's a couple of
> unit tests that check for this.
Thanks, that looks good - now merged.
I think we probably should backport this to 1.4 - it's a behaviour
change, but limited to text containing these fullwidth latin characters
and the change fixes a bug. The awkward wrinkle is that you need to
reindex to get the full benefits of the fix.
Did you already check the other ranges for cased letters? I can but if
you have already there's not much point.
> The fullwidth "????? ??????" tests...
2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS for input that contains characters in Unicode Halfwidth and Fullwidth Forms (https://unicode.org/charts/PDF/UFF00.pdf).
Since I haven't yet decided whether and how to fix this in Xapian, I haven't come up with a pull request. Because trac is currently offline, I could not file a bug. I hope it's OK to post my analysis here first, I'll be happy to follow up report...
2024 Jan 10
2
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...e:
> Thanks, that looks good - now merged.
Thanks!
> Did you already check the other ranges for cased letters? I can but if
> you have already there's not much point.
I did not. If you find time, that'd be great. Otherwise I can make room for it in the next days.
> > The fullwidth "????? ??????" tests suggests to me that
> > either Xapian should allow for Unicode normalization, or application
> > developers must take care of this before indexing.
>
> We currently leave it to the API user to normalise Unicode
> representation, though maybe we s...
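Because normalisation is left to the API user, an application has to fold fullwidth forms itself before indexing. Below is a minimal, partial sketch of the simplest part of that, assuming text is already decoded to code points. It maps the fullwidth ASCII variants U+FF01..U+FF5E onto U+0021..U+007E and U+3000 IDEOGRAPHIC SPACE onto U+0020, which matches what NFKC does for those characters. It is not full NFKC. The function name fold_fullwidth_ascii is hypothetical. A real application would more likely use a complete normaliser such as ICU's.

```cpp
#include <cassert>
#include <string>

// Partial sketch, NOT full NFKC: fold only fullwidth ASCII variants and
// the ideographic space; every other code point passes through unchanged.
std::u32string fold_fullwidth_ascii(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c >= 0xFF01 && c <= 0xFF5E) {
            // Fixed offset: e.g. U+FF21 FULLWIDTH 'A' -> U+0041 'A'.
            out.push_back(static_cast<char32_t>(c - 0xFEE0));
        } else if (c == 0x3000) {
            out.push_back(U' ');  // IDEOGRAPHIC SPACE -> SPACE
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Halfwidth katakana (U+FF65..U+FF9F) are deliberately left alone here; NFKC would map them to their normal-width forms, which this sketch does not attempt.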
2008 Nov 22
1
declaring constants in an Sweave / LaTeX document
List,
I would like to set a variable to hold, say, the size of my plots in a
Sweave document, i.e. something like the following in my '.Rnw' file:
==============================================================================
smallPlotSize = 4
<<fig1, echo=false, results=hide, height=smallPlotSize, width=smallPlotSize,
fig=true>>=
dat <- read.table("
2014 Jan 10
4
[PATCH] Add a minimal hive with "special" keys and values
...le/Load Hive...)
+
+- A subkey 'asdf_äöüß' was created in the root node
+ - An empty REG_STRING value 'asdf_äöüß' was created within that node.
+- A subkey 'weird™' was created in the root node.
+ - An empty REG_STRING value 'symbols $£₤₧€' (SMALL DOLLAR SIGN,
+ FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN) was created within
+ that node.
+- A subkey 'zero\0key' with an REG_DWORD value 'zero\0val'
+ was created using the 'mkzero/mkzero.c'. (\0 = zero character)
+
+- Hilko Bengen 2014-01-10.
diff --git a/images/mkzero/Makefile b/images/mk...
2014 Jan 13
0
Re: [PATCH 1/7] Add a minimal hive with "special" keys and values
...A key 'zero\0key' containing a REG_DWORD value 'zero\0val' (\0 = zero
> + character)
> +- A key 'asdf_äöüß' containing a REG_DWORD value 'asdf_äöüß'
> +- A key 'weird™' containing a REG_DWORD value 'symbols $£₤₧€' (SMALL
> + DOLLAR SIGN, FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN)
> +
> +- Hilko Bengen 2014-01-10.
> diff --git a/images/mkzero/Makefile b/images/mkzero/Makefile
> new file mode 100644
> index 0000000..affe52b
> --- /dev/null
> +++ b/images/mkzero/Makefile
> @@ -0,0 +1,9 @@
> +CROSS=i686-w64-mingw32-...
2014 Jan 10
14
[PATCH 1/7] Add a minimal hive with "special" keys and values
...ys and values:
+
+- A key 'zero\0key' containing a REG_DWORD value 'zero\0val' (\0 = zero
+ character)
+- A key 'asdf_äöüß' containing a REG_DWORD value 'asdf_äöüß'
+- A key 'weird™' containing a REG_DWORD value 'symbols $£₤₧€' (SMALL
+ DOLLAR SIGN, FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN)
+
+- Hilko Bengen 2014-01-10.
diff --git a/images/mkzero/Makefile b/images/mkzero/Makefile
new file mode 100644
index 0000000..affe52b
--- /dev/null
+++ b/images/mkzero/Makefile
@@ -0,0 +1,9 @@
+CROSS=i686-w64-mingw32-
+CFLAGS=--std=c99
+all: mkzero.exe
+clean:
+...
2014 Jan 14
2
Re: [PATCH 1/7] Add a minimal hive with "special" keys and values
...' containing a REG_DWORD value 'zero\0val' (\0 = zero
> > + character)
> > +- A key 'asdf_äöüß' containing a REG_DWORD value 'asdf_äöüß'
> > +- A key 'weird™' containing a REG_DWORD value 'symbols $£₤₧€' (SMALL
> > + DOLLAR SIGN, FULLWIDTH POUND SIGN, PESETA SIGN, EURO SIGN)
> > +
> > +- Hilko Bengen 2014-01-10.
> > diff --git a/images/mkzero/Makefile b/images/mkzero/Makefile
> > new file mode 100644
> > index 0000000..affe52b
> > --- /dev/null
> > +++ b/images/mkzero/Makefile
> > @@ -0...
2014 Jan 08
5
hivex: Make node names and value names with embedded null characters accessible
On Windows, there exist at least two APIs for dealing with the
Registry: The Win32 API (RegCreateKeyA, RegCreateKeyW, etc.) works
with null-terminated ASCII or UTF-16 strings. The native API
(ZwCreateKey, etc.), on the other hand, works with UTF-16 strings that
are stored as buffer+length and may contain null characters. Malware
authors have been relying on the Win32 API's inability to
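The difference between the two APIs can be illustrated with a name like 'zero\0key' from the test hive. This is a hypothetical sketch, not hivex, Win32 or native API code. A consumer that treats the name as null-terminated sees only the part before the first U+0000, while a consumer that honours the stored length sees all 8 code units. The names visible_via_c_string and visible_via_counted are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A UTF-16 name stored as buffer + length, with an embedded null.
const char16_t kName[] = {u'z', u'e', u'r', u'o', u'\0', u'k', u'e', u'y'};
const std::size_t kNameLen = sizeof(kName) / sizeof(kName[0]);  // 8 units

// What a null-terminated-string consumer sees: everything up to the
// first U+0000 (or the whole buffer if there is none).
std::u16string visible_via_c_string(const char16_t* buf, std::size_t len) {
    std::u16string s(buf, len);
    return s.substr(0, s.find(u'\0'));
}

// What a buffer+length consumer sees: the full name, nulls included.
std::u16string visible_via_counted(const char16_t* buf, std::size_t len) {
    return std::u16string(buf, len);
}
```

The first consumer cannot tell 'zero\0key' apart from a key actually named 'zero', which is the kind of ambiguity the snippet describes.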
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi,
I am looking for a Chinese, Japanese and Korean tokenizer that can be
used to tokenize terms for CJK languages. I am not very familiar
with these languages; however, I think that these languages contain one
or more words in one symbol, which makes it more difficult to tokenize
them into searchable terms.
Lucene has a CJK Tokenizer ... and I am looking around if there is some
open source that we
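Since CJK text is not separated by spaces, one common workaround, and the approach of Lucene's CJKTokenizer, is to index overlapping pairs of adjacent characters (bigrams). Below is a minimal sketch. It assumes the input has already been split into runs of CJK code points, and it leaves out script detection and non-CJK text. The function name cjk_bigrams is hypothetical. Emitting a lone character as a unigram is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Overlapping-bigram tokenisation of one run of CJK code points:
// "ABCD" -> "AB", "BC", "CD".
std::vector<std::u32string> cjk_bigrams(const std::u32string& run) {
    std::vector<std::u32string> terms;
    if (run.size() == 1) {
        terms.push_back(run);  // a lone character is indexed as a unigram
        return terms;
    }
    for (std::size_t i = 0; i + 1 < run.size(); ++i) {
        terms.push_back(run.substr(i, 2));
    }
    return terms;
}
```

Bigrams need no dictionary and catch every two-character word, at the cost of also indexing many pairs that span word boundaries.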
2008 May 02
0
Wine release 0.9.61
...un setup for Office 2003
9257 Day of Defeat (a Half-Life 1 mod) - Mouse & Graphic
9388 installer stuck for TRS 2006 Demo
9959 Make wine updates work even if the registry changed
10128 winecfg: not launching
10198 IE's writing-mode:tb-rl (CJK-style vertical text layout) renders fullwidth characters rotated when it should not
10411 Synergy HL2 mod crashes in IHTMLWindow2_Release
10676 Sega rally 2 crashes on start
10984 sun jre 5 update 10 installer hangs in 0.9.52
11019 matlab r14 and r16 (7.0.4 and 7.3.0) and WriteItNow3.1.0s hang if X in 24bpp mode
11191 Chief Arch...