Yuriy Kaminskiy
2009-Jun-24 15:42 UTC
[Fontconfig] [PATCH] Wrong encoding for TT_MS_ID_UCS_4
Hello! In ttf namelists, TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE encoding, not UCS4 (as might be implied from the name); see also freetype-2.3.5/src/sfnt/sfobjs.c. I've noticed this problem with the second (MS PGothic) and third (MS UI Gothic) faces of the msgothic.ttc font (version 5.00): the Japanese family name and style name are garbled, and familylang is wrong. The attached patch should work with fontconfig versions from 2.3.95 to 2.6.99; tested on 2.6.0 and 2.4.2.
[Attachment: fontconfig-2.5.0-TT_MS_ID_UCS_4.patch (text/x-diff, 624 bytes): http://lists.freedesktop.org/archives/fontconfig/attachments/20090624/7917dd5e/attachment.patch]
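UTF-16BE, the encoding the patch maps these entries to, can be decoded as in the following C sketch. `utf16be_to_utf8` is a hypothetical helper written for illustration (it is not a fontconfig or FreeType function), and it assumes the caller has already extracted the raw bytes of the name record. It handles surrogate pairs, which a plain 2-byte (UCS-2) reading would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of fontconfig or FreeType): decode a
 * UTF-16BE name-table string into NUL-terminated UTF-8.
 * Returns the number of UTF-8 bytes written (excluding the NUL), or
 * (size_t) -1 on malformed UTF-16 or if `out` is too small. */
static size_t
utf16be_to_utf8 (const unsigned char *in, size_t in_len,
                 char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    assert (in != NULL && out != NULL && out_size > 0);
    if (in_len % 2 != 0)
        return (size_t) -1;                /* UTF-16 needs whole 16-bit units */

    while (i < in_len)
    {
        uint32_t cp = ((uint32_t) in[i] << 8) | in[i + 1];
        size_t need;

        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF)  /* high surrogate: must be paired */
        {
            uint32_t lo;

            if (i + 2 > in_len)
                return (size_t) -1;
            lo = ((uint32_t) in[i] << 8) | in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t) -1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return (size_t) -1;            /* unpaired low surrogate */

        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)          /* keep room for the NUL */
            return (size_t) -1;
        switch (need)
        {
        case 1:
            out[o++] = (char) cp;
            break;
        case 2:
            out[o++] = (char) (0xC0 | (cp >> 6));
            out[o++] = (char) (0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (char) (0xE0 | (cp >> 12));
            out[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char) (0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (char) (0xF0 | (cp >> 18));
            out[o++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char) (0x80 | (cp & 0x3F));
            break;
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the UTF-16BE bytes 30 B4 30 B7 30 C3 30 AF (the katakana ゴシック) decode to the 12 UTF-8 bytes E3 82 B4 E3 82 B7 E3 83 83 E3 82 AF.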
Yuriy Kaminskiy
2009-Jul-22 21:06 UTC
[Fontconfig] [PATCH] Wrong encoding for TT_MS_ID_UCS_4
On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
> In ttf namelists, TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
> encoding, not UCS4 (as might be implied from the name);
ping.
Behdad Esfahbod
2009-Jul-22 21:10 UTC
[Fontconfig] [PATCH] Wrong encoding for TT_MS_ID_UCS_4
On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>> In ttf namelists, TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
>> encoding, not UCS4 (as might be implied from the name);
> ping.
Are you sure? This is what I see in the code:
    static const FcFtEncoding fcFtEncoding[] = {
     { TT_PLATFORM_APPLE_UNICODE, TT_ENCODING_DONT_CARE, "UCS-2BE" },
     { TT_PLATFORM_MACINTOSH,     TT_MAC_ID_ROMAN,       "MACINTOSH" },
     { TT_PLATFORM_MACINTOSH,     TT_MAC_ID_JAPANESE,    "SJIS" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_UNICODE_CS,   "UTF-16BE" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_SJIS,         "SJIS-WIN" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_GB2312,       "GB2312" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_BIG_5,        "BIG-5" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_WANSUNG,      "Wansung" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_JOHAB,        "Johab" },
     { TT_PLATFORM_MICROSOFT,     TT_MS_ID_UCS_4,        "UCS4" },
     { TT_PLATFORM_ISO,           TT_ISO_ID_7BIT_ASCII,  "ASCII" },
     { TT_PLATFORM_ISO,           TT_ISO_ID_10646,       "UCS-2BE" },
     { TT_PLATFORM_ISO,           TT_ISO_ID_8859_1,      "ISO-8859-1" },
    };
Been there since 2004.
behdad
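For illustration, here is a reduced, hypothetical version of such a lookup table, with the TT_MS_ID_UCS_4 entry mapped to "UTF-16BE" as the patch in this thread proposes. The numeric IDs are the platform/encoding IDs of the TrueType 'name' table (the values behind FreeType's TT_PLATFORM_APPLE_UNICODE = 0, TT_PLATFORM_MICROSOFT = 3, TT_MS_ID_UNICODE_CS = 1, TT_MS_ID_SJIS = 2 and TT_MS_ID_UCS_4 = 10), redefined locally so the sketch builds without FreeType; `NameEncoding`, `name_charset` and `ENCODING_DONT_CARE` are made-up names, not fontconfig's.

```c
#include <assert.h>
#include <stddef.h>

/* Platform and encoding IDs from the TrueType/OpenType 'name' table; the
 * values match FreeType's TT_PLATFORM_* / TT_MS_ID_* macros but are
 * redefined here so the sketch builds without FreeType. */
enum { PLATFORM_APPLE_UNICODE = 0, PLATFORM_MICROSOFT = 3 };
enum { MS_ID_UNICODE_CS = 1, MS_ID_SJIS = 2, MS_ID_UCS_4 = 10 };

/* Hypothetical wildcard for table entries: matches any encoding ID. */
#define ENCODING_DONT_CARE (-1)

typedef struct {
    int         platform_id;
    int         encoding_id;
    const char *charset;      /* charset name, e.g. for iconv_open() */
} NameEncoding;

static const NameEncoding name_encodings[] = {
    { PLATFORM_APPLE_UNICODE, ENCODING_DONT_CARE, "UCS-2BE"  },
    { PLATFORM_MICROSOFT,     MS_ID_UNICODE_CS,   "UTF-16BE" },
    { PLATFORM_MICROSOFT,     MS_ID_SJIS,         "SJIS-WIN" },
    /* Despite its name, encoding 10 name strings are stored as UTF-16BE;
     * according to FreeType's sfobjs.c, UCS-4 is only used in charmaps. */
    { PLATFORM_MICROSOFT,     MS_ID_UCS_4,        "UTF-16BE" },
};

/* Return the charset for a name record, or NULL if the platform/encoding
 * pair is not in the table. The first matching entry wins. */
static const char *
name_charset (int platform_id, int encoding_id)
{
    size_t i;

    /* The wildcard is only meaningful inside the table. */
    assert (encoding_id != ENCODING_DONT_CARE);
    for (i = 0; i < sizeof name_encodings / sizeof name_encodings[0]; i++)
    {
        const NameEncoding *e = &name_encodings[i];

        if (e->platform_id == platform_id &&
            (e->encoding_id == ENCODING_DONT_CARE ||
             e->encoding_id == encoding_id))
            return e->charset;
    }
    return NULL;
}
```

With this table, `name_charset(3, 10)` returns "UTF-16BE", so a decoder like the one for TT_MS_ID_UNICODE_CS handles both encodings; pairs that are not listed (here, for example, Macintosh or Johab records) return NULL and can be skipped by the caller.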
Yuriy Kaminskiy
2009-Jul-22 23:23 UTC
[Fontconfig] [PATCH] Wrong encoding for TT_MS_ID_UCS_4
On 23.07.2009 01:10, Behdad Esfahbod wrote:
> On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
>> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>>> In ttf namelists, TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
>>> encoding, not UCS4 (as might be implied from the name);
>> ping.
> Are you sure? This is what I see in the code:
[shrug] I did not check any standards on this, but that's what I observe in practice (i.e. on a real font: before my change the names are garbled, after it everything is OK), and what I see in the freetype2 code. See the original post for details.
<http://permalink.gmane.org/gmane.comp.fonts.fontconfig/3193>
> { TT_PLATFORM_MICROSOFT, TT_MS_ID_UCS_4, "UCS4" },
> Been there since 2004.
Yep. As I said in the original post, the patch applies to fontconfig from 2.3.95 to 2.6.99 (I did not check earlier or later versions). This entry is just rarely used, and counter-intuitive, so no one noticed.
Yuriy Kaminskiy
2009-Jul-22 23:30 UTC
[Fontconfig] [PATCH] Wrong encoding for TT_MS_ID_UCS_4
On 23.07.2009 03:23, Yuriy Kaminskiy wrote:
> On 23.07.2009 01:10, Behdad Esfahbod wrote:
>> On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
>>> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>> Are you sure? This is what I see in the code:
> [shrug] I did not check any standards on this, but that's what I observe in
> practice (i.e. on a real font: before my change the names are garbled, after
> it everything is OK), and what I see in the freetype2 code.
=== cut freetype-2.3.9/sfnt/sfobjs.c:239 ===
    case TT_MS_ID_UCS_4:
      /* Apparently, if this value is found in a name table entry, it is */
      /* documented as `full Unicode repertoire'.  Experience with the    */
      /* MsGothic font shipped with Windows Vista shows that this really  */
      /* means UTF-16 encoded names (UCS-4 values are only used within     */
      /* charmaps).                                                        */
      convert = tt_name_entry_ascii_from_utf16;
=== cut ===
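To see why decoding these strings as UCS-4 garbles them, the sketch below reads UTF-16BE bytes as big-endian 32-bit units, roughly what a UCS-4 decoder does. Each pair of UTF-16 code units gets fused into one value, and for ordinary text that value lies far above U+10FFFF. Both helpers are hypothetical, written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read `in` as big-endian 32-bit units (UCS-4BE),
 * storing at most `max_out` of them. Trailing bytes that do not fill a
 * whole unit are ignored. Returns the number of units stored. */
static size_t
read_as_ucs4be (const unsigned char *in, size_t in_len,
                uint32_t *out, size_t max_out)
{
    size_t n;

    assert (in != NULL && out != NULL);
    for (n = 0; n < max_out && 4 * n + 4 <= in_len; n++)
    {
        const unsigned char *p = in + 4 * n;

        out[n] = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                 ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
    }
    return n;
}

/* A Unicode scalar value is at most U+10FFFF and not a surrogate. */
static int
is_unicode_scalar (uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

For the UTF-16BE bytes of ゴシック (30 B4 30 B7 30 C3 30 AF), `read_as_ucs4be` yields 0x30B430B7 and 0x30C330AF, neither of which is a Unicode scalar value; even the ASCII string "MS" (00 4D 00 53) becomes the single out-of-range value 0x004D0053.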
Behdad Esfahbod
2009-Jul-25 20:38 UTC
[Fontconfig] [PATCH] Wrong encoding for TT_MS_ID_UCS_4
On 07/22/2009 07:23 PM, Yuriy Kaminskiy wrote:
> On 23.07.2009 01:10, Behdad Esfahbod wrote:
>> On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
>>> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>>>> In ttf namelists, TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
>>>> encoding, not UCS4 (as might be implied from the name);
>>> ping.
>> Are you sure? This is what I see in the code:
> [shrug] I did not check any standards on this, but that's what I observe in
> practice (i.e. on a real font: before my change the names are garbled, after
> it everything is OK), and what I see in the freetype2 code. See the original
> post for details.
> <http://permalink.gmane.org/gmane.comp.fonts.fontconfig/3193>
>> { TT_PLATFORM_MICROSOFT, TT_MS_ID_UCS_4, "UCS4" },
>> Been there since 2004.
> Yep. As I said in the original post, the patch applies to fontconfig from
> 2.3.95 to 2.6.99 (I did not check earlier or later versions).
> This entry is just rarely used, and counter-intuitive, so no one noticed.
Ah, OK, I thought you meant the other way around. Fixed.
behdad