Behdad Esfahbod
2008-Jan-28 04:38 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Hi,

This keeps coming up again and again: CJK users want Pango to choose Latin fonts differently under a CJK locale than it does under a non-CJK locale.

Making that work is currently impossible in Pango+fontconfig. The reason is that Pango passes a Latin "lang" to fontconfig for Latin runs, and fontconfig and font configurations have no way to differentiate Latin in a CJK locale from Latin in a Latin locale.

I'd like to propose adding a new element named "locale" that holds the original locale language. Fontconfig need not know about this at all, except for filling it in in FcDefaultSubstitute() like it does for "lang". Then users can write configuration that is sensitive to locale.

Pango can then pass the PangoContext language as "locale". The PangoContext language defaults to the locale, so this is all consistent.

I can do this all in Pango only, but given that I want to encourage CJK font developers/packagers to write such configuration for their fonts, it would be nice to have it upstreamed.

As an example, one would write:

  <match>
    <test name="lang">
      <string>en</string>
    </test>
    <test name="locale">
      <string>ja</string>
    </test>
    <edit name="family" mode="prepend" binding="same">
      <string>SomeJapaneseFontWithGoodLatin</string>
    </edit>
  </match>

It could be easier if we could match on scripts instead of languages, but that's another issue.

Keith, what do you think?

--
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759
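As a rough illustration of what the proposed rule would do, here is a toy Python model of it. This is only a sketch under stated assumptions: it does not use fontconfig, the "locale" key is the element proposed above (not an existing fontconfig property), and the loose language comparison in `lang_matches` is a simplification invented for the example.

```python
# Toy model of the proposed <match> rule; NOT fontconfig's matching engine.
# "locale" is the element proposed in this thread, not an existing property.

def lang_matches(pattern_value: str, test_value: str) -> bool:
    """Loose comparison (an assumption): 'ja' matches 'ja' and 'ja-jp'."""
    p, t = pattern_value.lower(), test_value.lower()
    return p == t or p.startswith(t + "-")


def apply_rule(pattern: dict) -> dict:
    """Prepend a family when the run is 'en' but the locale is 'ja'."""
    result = dict(pattern)
    result["family"] = list(pattern.get("family", []))
    if lang_matches(pattern.get("lang", ""), "en") and lang_matches(
        pattern.get("locale", ""), "ja"
    ):
        result["family"].insert(0, "SomeJapaneseFontWithGoodLatin")
    return result


latin_run_in_ja = {"family": ["Sans"], "lang": "en", "locale": "ja"}
latin_run_in_en = {"family": ["Sans"], "lang": "en", "locale": "en"}

print(apply_rule(latin_run_in_ja)["family"])  # ['SomeJapaneseFontWithGoodLatin', 'Sans']
print(apply_rule(latin_run_in_en)["family"])  # ['Sans']
```

The point of the sketch is that the same Latin run ("lang" of "en") ends up with a different family list depending only on the locale it came from, which is what the current "lang"-only pattern cannot express.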
Ed Trager
2008-Jan-28 17:24 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Hi, everyone,

Behdad notes:

> It could be easier if we could match on scripts instead of languages,
> but that's another issue.

... and I agree completely:

REQUIREMENT: EXPAND DEFINITION OF LOCALE TO INCLUDE OPTIONAL ISO-15924 SCRIPT CODE
============================================================================

First of all, the notion of "locale" needs to be redefined as composed of *3* elements instead of *2* elements.

Currently, locales are composed of just two elements:

  (1) A "language" code (ISO-639-1, -2: "en", "ja", "zh", "th", etc.) and
  (2) A "region" code ("US", "CA", "FR", "TW", "HK", "SG", etc.)

This concept is incomplete. A THIRD ELEMENT, SCRIPT, NEEDS TO BE ADDED. Using four-letter ISO-15924 ( http://unicode.org/iso15924/iso15924-codes.html ) codes is the obvious answer:

  (3) A "script" code (ISO-15924: "arab", "cyrl", "hans" (simplified Chinese), "hant" (traditional Chinese))

Both "region" and "script" can be considered "optional". So we could now enumerate locales such as:

=> "Fully specified" locales with all three elements:

    az_AZ_latn
    az_AZ_cyrl
    az_IR_arab

    zh_HK_hans
    zh_HK_hant

=> Locales missing "region" would also be permissible (I think this variant would be extremely useful, and translators would perhaps favor the generality that this option provides in many real-life applications):

    az_latn
    az_arab
    az_cyrl

    zh_hans
    zh_hant

=> Locales missing "script" are of course also permissible (this is the current "status quo"). Systems would have to have rules for the "default" script:

    az_AZ : defaults to "latn" (Latin became official in Azerbaijan in 1991, although uptake has apparently been slow)
    az_IR : defaults to "arab"

    zh_HK : defaults to "hant"
    zh_SG : defaults to "hans"

=> Locales missing both "region" and "script" are also permissible (again, this does not differ from the current "status quo"):

    ja : implies (defaults to) "ja_JP_jpan"
    th : implies (defaults to) "th_TH_thai"

The CLDR community is one obvious place for discussions about this,
and I apologize that I have not had the time to investigate how far discussions on this topic have gotten in CLDR or other relevant communities (like maybe the Linux LSB folks?).

Adding a four-letter script code to Locale is the obvious remedy. Perhaps the Pango and Fontconfig communities could take the lead in creating the minor changes in infrastructure needed to support this addition?

Let's return to Behdad's Japanese example for a minute. Recall that modern Japanese is, for all intents and purposes, really composed of four scripts (Han, Katakana, Hiragana, Latin). So, for a Japanese locale, perhaps I ought really to be able to specify a different font set for each and every one of those four scripts independently, if I so desire.

Best Wishes -- Ed Trager
Behdad Esfahbod
2008-Jan-28 18:59 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Mon, 2008-01-28 at 12:24 -0500, Ed Trager wrote:
> Hi, everyone,
>
> Behdad notes:
>
> > It could be easier if we could match on scripts instead of languages,
> > but that's another issue.
>
> ... and I agree completely :
>
> REQUIREMENT: EXPAND DEFINITION OF LOCALE TO INCLUDE OPTIONAL ISO-15924
> SCRIPT CODE

No need to shout, Ed.

> First of all, the notion of "locale" needs to be re-defined as
> composed of *3* elements instead of *2* elements.
>
> Currently, locales are composed of just two elements:
>
> (1) A "language" code (ISO-639-1, -2 : "en", "ja", "zh", "th", etc.)
> and (2) A "region" code ("US", "CA", "FR", "TW", "HK", "SG", etc. )
>
> This concept is incomplete. A THIRD ELEMENT, SCRIPT, NEEDS TO BE ADDED. Using
> four-letter ISO-15924 ( http://unicode.org/iso15924/iso15924-codes.html )
> codes is the obvious answer:
>
> (3) "Script" code (ISO-15924 : "arab", "cyrl", "hans"
> (simplified Chinese), "hant" (traditional Chinese)

No, this is not what I meant. And this doesn't solve the issues I want to solve.

[...]

> Adding a four-letter script code to Locale is the obvious remedy.
> Perhaps the Pango and Fontconfig communities could take the lead in
> creating the minor changes in infrastructure needed to support this
> addition ?

That's what I've been wanting to do, but not in the format you suggest.

> Let's return to Behdad's Japanese example for a minute. Recall that
> modern Japanese is, for all intents and purposes, really composed of
> four scripts ( Han, Katakana, Hiragana, Latin ). So, for a Japanese
> locale, perhaps I ought really be able to specify a different font set
> each and every one of those four scripts independently, if I so
> desire.

Yes, and with my proposed syntax of having a separate "script" element you can do that. So you have a pattern of :lang="ja":script="han " for example.

behdad
Keith Packard
2008-Jan-28 19:54 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Mon, 2008-01-28 at 12:24 -0500, Ed Trager wrote:
> REQUIREMENT: EXPAND DEFINITION OF LOCALE TO INCLUDE OPTIONAL ISO-15924
> SCRIPT CODE

How is this script code to be extracted from font files?

--
keith.packard at intel.com
Behdad Esfahbod
2008-Jan-28 20:48 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Tue, 2008-01-29 at 06:54 +1100, Keith Packard wrote:
> On Mon, 2008-01-28 at 12:24 -0500, Ed Trager wrote:
>
> > REQUIREMENT: EXPAND DEFINITION OF LOCALE TO INCLUDE OPTIONAL ISO-15924
> > SCRIPT CODE
>
> How is this script code to be extracted from font files?

In my design it's just a target="pattern" element, helping the user choose fonts better. Apparently we have different ideas here.

As for Ed's request, I think that is already supported, at least in glibc. One can have locales like az_IR@latn, for example. Not sure what Pango does with that, but it's not hard to make it work. That is probably not valid RFC 3066 though, so I guess Pango strips the @latn part out.

--
behdad
http://behdad.org/
Behdad Esfahbod
2008-Jan-28 20:53 UTC
[Fontconfig] Improving Latin font selection for CJK locales
(Keith, you forgot Reply All?)

On Tue, 2008-01-29 at 06:56 +1100, Keith Packard wrote:
> On Mon, 2008-01-28 at 13:59 -0500, Behdad Esfahbod wrote:
>
> > Yes, and with my proposed syntax of having a separate "script" element
> > you can do that. So you have a pattern of :lang="ja":script="han " for
> > example.
>
> What standard do we reference for these 'script' tags?

I see three possibilities:

  - Human-readable names defined in Unicode. That's what Pango uses.

  - ISO-15924. The problem with this one is that the mapping from Unicode names is not unique. Some may say that's an advantage though.

  - OpenType script tags. Not a good idea really.

> And, what standard do we reference for the related sub-orthography?

Not sure. The main reason I need it is not to refine charset selection, but to make it easier to match on all languages using a script. Fontconfig can add that info to its orth files such that it automatically fills script="arabic" if lang="fa", but that's not necessary.

That said, I'm not sure about this feature yet, as it's not trivial to make Pango pass this info: Pango groups common characters with non-common characters, so if a run starts with a common character, we need to choose a font but we don't have a script yet.

Anyway, the main purpose of my mail was the "locale" element, which can be used to solve real problems now.

Thanks,
--
behdad
http://behdad.org/
Gerrit Sangel
2008-Jan-28 22:22 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Monday, 28 January 2008, Ed Trager wrote:
> Currently, locales are composed of just two elements:
>
> (1) A "language" code (ISO-639-1, -2 : "en", "ja", "zh", "th", etc.)
> and (2) A "region" code ("US", "CA", "FR", "TW", "HK", "SG", etc. )
>
> This concept is incomplete. A THIRD ELEMENT, SCRIPT, NEEDS TO BE ADDED.
> Using four-letter ISO-15924 (
> http://unicode.org/iso15924/iso15924-codes.html ) codes is the obvious
> answer:
>
> (3) "Script" code (ISO-15924 : "arab", "cyrl", "hans"
> (simplified Chinese), "hant" (traditional Chinese)
>
> Both "region" and "script" can be considered as "optional". So we
> could now enumerate locales such as:

I think I suggested this some months ago, and I still strongly support it. It would also be necessary for a German locale in Fraktur writing (for which I am currently gathering information).

> => "Fully Specified" locales with all three elements:
>
> az_AZ_latn
> az_AZ_cyrl
> az_IR_arab
>
> zh_HK_hans
> zh_HK_hant
>
> => Locales missing "region" would also be permissable (and I think
> this variant would be extremely useful and I think translators would
> perhaps favor the generality that this option provides in many
> real-life applications):

I also strongly support this, for de_Latf.

But I would urge that the script code be written with the first letter capitalized, so that it can be properly distinguished from the language or region code.

> az_latn
> az_arab
> az_cyrl
>
> zh_hans
> zh_hant
>
> => Locales missing "script" of course also permissable (this is the
> current "status quo"): Systems would have to have rules for the
> "default" script :
>
> az_AZ : defaults to "latn" (Latin became official in
> Azerbaijan in 1991 although uptake has been apparently slow)
> az_IR : defaults to "arab"
>
> zh_HK : defaults to "hant"
> zh_SG : defaults to "hans"
>
> => Locales missing both "region" and "script" are also permissable
> (again this does not differ from current "status quo"):
>
> ja : implies (defaults to) "ja_JP_jpan"
> th : implies (defaults to) "th_TH_thai"
>
> The CLDR community is one obvious place for discussions about this,
> and I apologize that I have not had the time to investigate how far
> discussions on this topic have gotten in CLDR or other relevant
> communities (like maybe Linux LSB folks?).
>
> Adding a four-letter script code to Locale is the obvious remedy.
> Perhaps the Pango and Fontconfig communities could take the lead in
> creating the minor changes in infrastructure needed to support this
> addition ?

Another question, though I do not know which applications this may concern: for German Fraktur, the application would sometimes have to switch fonts within a message string for some foreign words or upper-case abbreviations (maybe this is similar to the CJK-Latin font problem). So the translation files would somehow need a way to change the script and (maybe) the language on the fly, similar to HTML (with <span xml:lang="de-Latf">).

The problem with Fraktur is that it is unified with ordinary Latin, so the difference can only be expressed via an optional parameter stating which script is to be used.

Gerrit Sangel
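To make the three-element scheme under discussion a little more concrete, here is a minimal Python sketch that parses tags of the form language, optional region, optional script, and fills in defaults. It is an illustration only: the `parse_locale` function and the `Locale` tuple are hypothetical names, the default tables contain just the examples given in this thread, and a real system would need a complete, maintained table (CLDR being the likely source).

```python
from typing import NamedTuple, Optional

# Defaults copied from the examples in this thread; deliberately incomplete.
DEFAULT_REGION = {"ja": "JP", "th": "TH"}
DEFAULT_SCRIPT = {
    ("az", "AZ"): "latn",
    ("az", "IR"): "arab",
    ("zh", "HK"): "hant",
    ("zh", "SG"): "hans",
    ("ja", "JP"): "jpan",
    ("th", "TH"): "thai",
}


class Locale(NamedTuple):
    language: str
    region: Optional[str]
    script: Optional[str]


def parse_locale(tag: str) -> Locale:
    """Parse 'az_AZ_latn', 'az_latn', 'az_AZ' or 'az' (case-insensitive)."""
    parts = tag.split("_")
    language = parts[0].lower()
    region = None
    script = None
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            region = part.upper()   # two letters: region code
        elif len(part) == 4 and part.isalpha():
            script = part.lower()   # four letters: ISO 15924 script code
        else:
            raise ValueError(f"unrecognized subtag {part!r} in {tag!r}")
    if region is None:
        region = DEFAULT_REGION.get(language)
    if script is None:
        script = DEFAULT_SCRIPT.get((language, region))
    return Locale(language, region, script)


print(parse_locale("zh_HK"))    # script defaults to 'hant'
print(parse_locale("ja"))       # region and script both defaulted
print(parse_locale("de_Latf"))  # explicit script, no region
```

Telling region and script apart by length (two letters versus four) is what lets the region be omitted without ambiguity; the capitalization suggested above would then be a readability convention rather than something the parser depends on.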
Ed Trager
2008-Jan-28 22:39 UTC
[Fontconfig] Improving Latin font selection for CJK locales
> > => Locales missing "region" would also be permissable (and I think
> > this variant would be extremely useful and I think translators would
> > perhaps favor the generality that this option provides in many
> > real-life applications):
>
> Also strongly support this. For de_Latf.
>
> But I would urge for the script code with the first letter capitalized, so it
> can be properly distinguished from the language or region code.

Request for Comments RFC-4646 / Best Current Practice (BCP-47) ( http://www.rfc-editor.org/rfc/bcp/bcp47.txt ) states:

  "o [ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
   o [ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).
   o [ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic)."

and also:

  "Although case distinctions do not carry meaning in language tags, consistent formatting and presentation of the tags will aid users. The format of the tags and subtags in the registry is RECOMMENDED. In this format, all non-initial two-letter subtags are uppercase, all non-initial four-letter subtags are titlecase, and all other subtags are lowercase."

One more important thing is that the order of the subtags in BCP47 is languageCode-scriptCode-territory, i.e.:

    "fr-Latn-CA"

So it would seem that Pango, FontConfig, and the whole Linux / Free Desktop in general will want to just follow RFC 4646 / BCP47 (if they are not already doing just that).

(As for me, I should have known to read the RFC/BCP before writing my email, but "better late than never" is more or less the story of my life ... )

-- Best Wishes -- Ed Trager
Qianqian Fang
2008-Jan-29 06:40 UTC
[Fontconfig] Improving Latin font selection for CJK locales
hi Behdad

the proposed method sounds quite interesting and useful. For font developers, I think it will add the power to fine-tune the font selections, particularly for massaging CJK fonts with Latin fonts. I would like to give my full support on this effort.

In addition, I am not quite sure if this could be a possible remedy for the common-script contextual formatting issue that we have discussed earlier (please forgive me if the connection is obvious):

https://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00013.html

if they are connected, can you illustrate one scenario where this mechanism can help to constrain the fonts for Common Scripts?

thank you

Qianqian
Behdad Esfahbod
2008-Jan-29 06:53 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Mon, 2008-01-28 at 17:39 -0500, Ed Trager wrote:
> One more important thing is that the order of the subtags in BCP47 is
> languageCode-scriptCode-territory, i.e.:
>
> "fr-Latn-CA"
>
> So it would seem that Pango, FontConfig, and the whole Linux / Free
> Desktop in general will want to just follow RFC 4646 / BCP47 (if they
> are not already doing just that).

Both Pango and fontconfig support arbitrary locale tags. They just don't have any useful information for any but a select set of common ones. If anyone wants to see more added, they should first add it to glibc's locale database, then request addition to fontconfig and Pango, providing the needed data. (Pango in fact generates one of its tables out of fontconfig's data; the other one maps locale languages to OpenType LangSys tags.)

So, beating the wrong horse really.

--
behdad
http://behdad.org/
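For reference, the casing convention from RFC 4646 quoted earlier in the thread (non-initial two-letter subtags uppercase, non-initial four-letter subtags titlecase, everything else lowercase) is simple enough to sketch in a few lines of Python. The function name `format_language_tag` is hypothetical, accepting "_" as a separator is a convenience assumed here, and the sketch only adjusts case: it does not validate subtags or reorder them.

```python
def format_language_tag(tag: str) -> str:
    """Apply the RFC 4646 recommended casing to a language tag."""
    # Accepting "_" as well as "-" is an assumption made for convenience.
    subtags = tag.replace("_", "-").split("-")
    formatted = []
    for index, subtag in enumerate(subtags):
        if index > 0 and len(subtag) == 2:
            formatted.append(subtag.upper())   # region, e.g. "CA"
        elif index > 0 and len(subtag) == 4:
            formatted.append(subtag.title())   # script, e.g. "Latn"
        else:
            formatted.append(subtag.lower())   # language and other subtags
    return "-".join(formatted)


print(format_language_tag("FR-latn-ca"))  # fr-Latn-CA
print(format_language_tag("de_latf"))     # de-Latf
```

Since case carries no meaning in these tags, a function like this is purely cosmetic; any comparison of tags should still be done case-insensitively.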
Pat Suwalski
2008-Jan-29 07:20 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Qianqian Fang wrote:
> the proposed method sounds quite interesting and useful.
> For font developers, I think it will add the power to fine-tune
> the font selections, particularly for massaging CJK fonts with
> Latin fonts. I would like to give my full support on this effort.

We ran into this exact issue when working on the eeePC.

In the en_US.UTF-8 locale, the Latin font was a nicely-hinted DejaVu Sans, with properly bitmap-hinted Chinese characters as needed from other fonts.

In the zh_TW.UTF-8 and zh_CN.UTF-8 locales, it would always use the Latin characters from the fonts made for Chinese, which had a rather ugly bitmap-hinted serif font.

Using some fontconfig XML hackery, we got it to work nicely in most applications, but sometimes the ugly Latin characters in the Chinese fonts still show up.

--Pat
Behdad Esfahbod
2008-Jan-29 08:53 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Tue, 2008-01-29 at 02:20 -0500, Pat Suwalski wrote:
> Qianqian Fang wrote:
> > the proposed method sounds quite interesting and useful.
> > For font developers, I think it will add the power to fine-tune
> > the font selections, particularly for massaging CJK fonts with
> > Latin fonts. I would like to give my full support on this effort.
>
> We ran into this exact issue when working on the eeePC.
>
> In the en_US.UTF-8 locale, the latin font was a nicely-hinted DejaVu
> Sans, with properly bitmap-hinted Chinese characters as needed from
> other fonts.
>
> In the zh_TW.UTF-8 and zh_CN.UTF-8 locales, it would always use the
> latin characters from the fonts made for Chinese, which had a rather
> ugly bitmap-hinted serif font.
>
> Using some fontconfig XML hackery, we got it to work nice in most
> applications, but sometimes the ugly latin characters in the Chinese
> fonts still show up.

Hi Pat,

Note that the request here is to allow such a behavior. Qianqian has been looking for a way to force Pango to use the bitmap Latin glyphs in the Chinese font for Latin. That's what Pango currently can't be instructed to do without changing the default Latin font for en_US locales too.

--
behdad
http://behdad.org/
Behdad Esfahbod
2008-Jan-29 08:55 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Tue, 2008-01-29 at 01:40 -0500, Qianqian Fang wrote:
> hi Behdad
>
> the proposed method sounds quite interesting and useful.
> For font developers, I think it will add the power to fine-tune
> the font selections, particularly for massaging CJK fonts with
> Latin fonts. I would like to give my full support on this effort.
>
> In addition, I am not quite sure if this could be a possible remedy
> for the common-script contextual formating issue that we have
> discussed earlier (please forgive me if the connection is obvious)
> https://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00013.html
>
> if they are connected, can you illustrate one scenario where
> this mechanism can help to constrain the fonts for Common Scripts?

Well, it doesn't make much difference for Common, but it will let you choose to use the same font for Latin that you use for Chinese. IIRC that's what you initially wanted to achieve. Right?

behdad
Pat Suwalski
2008-Jan-29 09:06 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Behdad Esfahbod wrote:
> Note that the request here is to allow such a behavior. Qianqian has
> been looking for a way to force Pango to use the bitmap Latin glyphs in
> the Chinese font for Latin. That's what Pango currently can't be
> instructed to do without changing the default Latin font for en_US
> locales too.

Ah, so it's the opposite. :)

I assume he wants the Latin from the Chinese font so that it is monospaced in the same fashion? Otherwise, the fonts made for Latin text are typically far superior.

I know what I'm about to say is totally silly, but I once managed to achieve what I wanted by deleting the Latin characters from the Chinese font altogether. Then I found a slightly cleaner fontconfig way of doing it.

But the fact that it behaves differently by locale is problematic when the config was specified. In my case, it would be ideal if it were always treated as in en_US.

--Pat
Behdad Esfahbod
2008-Jan-29 09:35 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Tue, 2008-01-29 at 04:06 -0500, Pat Suwalski wrote:
> Behdad Esfahbod wrote:
> > Note that the request here is to allow such a behavior. Qianqian has
> > been looking for a way to force Pango to use the bitmap Latin glyphs in
> > the Chinese font for Latin. That's what Pango currently can't be
> > instructed to do without changing the default Latin font for en_US
> > locales too.
>
> Ah, so it's opposite. :)

Yes.

> I assume he wants the latin from the Chinese so that it is monospaced in
> the same fashion?

That's one reason.

> Otherwise, the fonts made for latin text are typically far superior.

This is quite objective it seems!

> I know what I'm about to say is totally silly, but I once managed to
> achieve what I wanted by deleting the latin characters from the Chinese
> font altogether. Then I found a slightly cleaner fontconfig way of
> doing it.

Actually IMO if the Latin glyphs are crappy, removing them is the single most correct solution.

> But the fact that it behaves differently by locale is problematic when
> the config was specified. In my case, it would be ideal if it were
> always treated as in en_US.
>
> --Pat

--
behdad
http://behdad.org/
Ed Trager
2008-Jan-29 13:47 UTC
[Fontconfig] Improving Latin font selection for CJK locales
I remember that once upon a time the whole issue of making it possible for FontConfig to blacklist glyphs from fonts was discussed. The idea was that it is easy and fast to write a few lines of XML into a FontConfig .conf file, while it is slow and laborious to remove glyphs from a font.

Perhaps now is the time to actually implement this functionality in FontConfig?

One group of beneficiaries would be those who want to blacklist LGC glyphs in existing CJK fonts.

A second group of beneficiaries would be those who wanted to use certain LGC fonts that are very nice except for a few characters which they don't like, such as a badly-formed EURO sign or something like that. So they wanted to be able to blacklist just individual bad glyphs in otherwise "nice" (but probably somewhat older, less-maintained) fonts.

> > I know what I'm about to say is totally silly, but I once managed to
> > achieve what I wanted by deleting the latin characters from the Chinese
> > font altogether. Then I found a slightly cleaner fontconfig way of
> > doing it.
>
> Actually IMO if the Latin glyph are crappy, removing them is the single
> most correct solution.
Qianqian Fang
2008-Jan-29 15:28 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Behdad Esfahbod wrote:
> Hi Pat,
>
> Note that the request here is to allow such a behavior. Qianqian has
> been looking for a way to force Pango to use the bitmap Latin glyphs in
> the Chinese font for Latin. That's what Pango currently can't be
> instructed to do without changing the default Latin font for en_US
> locales too.

Well, not exactly. I think my purpose is almost identical to Pat's:

1) avoid using the Latin/Common glyphs from Chinese fonts, and instead use the system's preferred Latin fonts (Bitstream, for example)

2) in particular, in a monospace environment, if the default font is a Latin mono font (say Courier), do not use contextual propagation for Common characters (digits etc.) near Chinese text, because they will be rendered with Chinese fonts and break the alignment with the Latin mono font. Even in sans/serif environments, keeping digits (and punctuation) as close to Latin as possible is preferred.

Same as Pat, we achieved the first purpose using Fontconfig; that's the whole point of the wqy-bitmap-fonts fontconfig file review, see

http://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00002.html

The second point is currently not possible, because Pango labels Common-script characters (digits) near Chinese text as Chinese, and in fontconfig we never know whether a character is Common script or Chinese Hanzi. This caused problems like this:

https://www.redhat.com/archives/fedora-fonts-list/2007-December/pngsBGtUJxMgD.png

It seems to me that the proposed method will still assign lang=zh to Common-script characters between Chinese Hanzi if locale=zh. So it is still unlikely that we can force the use of smooth Latin fonts for Common via fontconfig. Is my understanding correct?
Pat Suwalski
2008-Jan-29 23:59 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Behdad Esfahbod wrote:
>> I know what I'm about to say is totally silly, but I once managed to
>> achieve what I wanted by deleting the latin characters from the Chinese
>> font altogether. Then I found a slightly cleaner fontconfig way of
>> doing it.
>
> Actually IMO if the Latin glyph are crappy, removing them is the single
> most correct solution.

So, this works, but it would be nice if it could be specified to fontconfig to just ignore a Unicode range.

The problem with deleting characters is that some are in commercial fonts, etc. This is particularly true with the Asian fonts.

Of course, in cases such as word processors where the font is explicitly selected, it is preferable to use the characters that go with it.

--Pat
Behdad Esfahbod
2008-Jan-30 01:33 UTC
[Fontconfig] Improving Latin font selection for CJK locales
On Tue, 2008-01-29 at 18:59 -0500, Pat Suwalski wrote:
> So, this works, but it would be nice if it could be specified to
> fontconfig to just ignore a unicode range.

Sure. Pango relies entirely on fontconfig to determine whether a font supports a character, so this can be fixed completely in fontconfig. Someone needs to go finish the patch and pass it through Keith, I think.

behdad
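To illustrate why ignoring a Unicode range on the fontconfig side would be enough, here is a minimal conceptual sketch in Python. It is not fontconfig's API or configuration syntax; it only models the idea that font selection follows reported coverage. The font names `ExampleCJK` and `ExampleLatin`, the coverage sets, and the helper functions are all hypothetical.

```python
# Conceptual model only: this is NOT fontconfig's API or configuration syntax.
# Font names and coverage sets below are hypothetical.

def effective_coverage(coverage, ignored):
    """Return the code points a font is treated as supporting
    once the ignored code points have been removed."""
    return frozenset(coverage) - frozenset(ignored)


def pick_font(ch, fonts):
    """Return the first font, in preference order, whose coverage
    includes the character, or None if no font covers it."""
    cp = ord(ch)
    for name, coverage in fonts:
        if cp in coverage:
            return name
    return None


ASCII = range(0x20, 0x7F)      # printable Basic Latin
HAN = range(0x4E00, 0xA000)    # CJK Unified Ideographs block

cjk_coverage = frozenset(ASCII) | frozenset(HAN)
latin_coverage = frozenset(ASCII)

# Preference order puts the CJK font first, as under a CJK locale.
fonts_before = [("ExampleCJK", cjk_coverage),
                ("ExampleLatin", latin_coverage)]

# Same order, but the CJK font's Latin range is ignored.
fonts_after = [("ExampleCJK", effective_coverage(cjk_coverage, ASCII)),
               ("ExampleLatin", latin_coverage)]

print(pick_font("A", fonts_before))
print(pick_font("A", fonts_after))
```

In this model the font file itself is untouched, which is the point Pat raises about commercial fonts: only the coverage that the selection step sees is changed.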
Ed Trager
2008-Jan-30 01:56 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Hi, Qianqian,

Latin digits are basically treated as "neutral" characters in a run of text -- I think that is pretty much "standard Unicode operating procedure" if you look at how the digits are categorized in UCD.

I don't know the internal details of how Pango itemizes a string of text, but using your "pngsBGtUJxMgD.png" as an example, we can see what is most likely occurring: First, it appears that Pango treats "1234A" as a run of "latn" text because of the presence of the letter "A" -- all characters preceding the "A" are "neutrals" which presumably don't influence the itemizer, but of course the letter "A" tells the itemizer that the current run of text is Latin script. Then of course the "?" starts a new run of text which gets classified as Han ("hani" if using the ISO 15924 code) script -- and the following neutrals "123" remain a part of that 2nd text segment. The final "ABC" however causes the itemizer to break out a 3rd segment -- and it is "latn".

Pango presumably then talks to fontconfig to get the font assignments for each of the three segments. Behdad can confirm if this is in fact how the itemizer works or not.

So fixing this kind of "bug" or "feature" may require changing how the itemizer works. For example, what if digits were not categorized as "neutrals" but were instead assigned their own category of "Latin Digits"?

Then a text itemizer could break out "latin digits" into separate segments.

For a document with Latin script, maybe these "latin digit" segments eventually get merged back into the "latn" segments because it is not necessary to treat them any differently from how the "latn" segments are treated.

But if the main script is not Latin, then there may be some advantage to treating "latin digits" segments separately.

For example, it would allow your Chinese text to have latin digits rendered in DejaVu Sans because the "latin digits" segments could simply be treated as another special kind of "latn" segment.
There might also be some benefit to doing this in Arabic texts since the "latin digits" and even the "Arabic digits" need to be rendered as runs of LTR text embedded in surrounding RTL text.

Of course there may be other issues and cases which I have not thought of yet, but this is not the first time that I have thought about treating segments of "latin digits" as some non-neutral category for the purposes of enhanced itemization.

(I am actually currently working on writing some C++ UnicodeText classes of my own -- and just recently was playing around with these issues of text itemization, so I am very interested to learn what people *really* want to have). Is it possible that what people really want may *differ* in some details from the status-quo standard Unicode practices?

Best Wishes - Ed
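The itemization behaviour Ed describes, and his "Latin Digits" idea, can be sketched roughly as follows. This is a toy model in Python, not Pango's actual itemizer: the `char_script` classifier only knows ASCII letters and one CJK block, the `digits_are_latin` switch is a hypothetical name for Ed's proposal, and the character 中 is just a stand-in for the Hanzi in the screenshot.

```python
from itertools import groupby


def char_script(ch, digits_are_latin=False):
    """Toy script classifier (assumption: a real itemizer would use the
    Unicode Script property). Returns None for neutral characters."""
    if ch.isascii() and ch.isalpha():
        return "latn"
    if "\u4e00" <= ch <= "\u9fff":
        return "hani"
    if digits_are_latin and ch.isascii() and ch.isdigit():
        return "latn"  # hypothetical "Latin Digits" category, folded into latn
    return None


def itemize(text, digits_are_latin=False):
    """Split text into (script, substring) runs."""
    scripts = [char_script(ch, digits_are_latin) for ch in text]

    # Neutral characters inherit the script of the preceding character.
    last = None
    for i, script in enumerate(scripts):
        if script is None:
            scripts[i] = last
        else:
            last = script

    # Leading neutrals take the first real script that follows them;
    # "zyyy" (Common) is used if the text has no scripted character at all.
    first = next((s for s in scripts if s is not None), "zyyy")
    scripts = [first if s is None else s for s in scripts]

    # Merge consecutive characters with the same script into runs.
    runs, pos = [], 0
    for script, group in groupby(scripts):
        length = len(list(group))
        runs.append((script, text[pos:pos + length]))
        pos += length
    return runs


sample = "1234A中123ABC"
print(itemize(sample))
print(itemize(sample, digits_are_latin=True))
```

With digits treated as neutral, the "123" after the Hanzi stays in the Han run and would be matched against a Chinese font; with the hypothetical switch on, it joins the following Latin run instead.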
Behdad Esfahbod
2008-Jan-30 02:06 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Hi Ed,

If you are interested in the details, read the entire thread here:

http://mail.gnome.org/archives/gtk-i18n-list/2007-December/thread.html

I'm trying to avoid repeating the same reasoning again and again, and it's really not quite on topic on the fontconfig list anyway.

behdad
Qianqian Fang
2008-Jan-30 02:40 UTC
[Fontconfig] Improving Latin font selection for CJK locales
Sorry, it really wasn't my intention to derail a discussion about implementing a useful feature. All I wanted to know is whether this is related to the problems we have discussed before. I guess the answer is no.

Please continue with the lang/locale tag discussion; I am looking forward to seeing this feature implemented in Pango and fontconfig.