Hello,

Sorry if this has been asked before, but I could not find anything in the archive.

Does fontconfig have an option to select fonts according to ISO 15924 (script codes), and not only according to ISO 639 (language codes)? If not, I would like to propose one. In my opinion, this is much more flexible than selecting fonts according to a specific region (e.g. TW or CN). In some cases it is even necessary, because the region does not differ.

Do I understand correctly that the user can specify a font for a specific language in the config file? I see this in Firefox (even though it does not seem to use fontconfig, but I guess an add-on could be written to solve that): I can specify fonts per language (e.g. Chinese Traditional (Hong Kong)), and the browser selects the font if the HTML file includes the xml:lang attribute. It is inconvenient to set this up in every application, so I guess fontconfig handles it globally? In my config files, however, I only saw options that define a general order of font substitution, not one that depends on language and script tags.

Returning to my question: German Fraktur has the ISO 15924 tag "Latf", which is needed to state that a paragraph should be displayed in a Fraktur style (because Fraktur uses the same code points as normal Latin characters). There is no region to specify, so I guess the correct tag would be "de-Latf". But there is no option anywhere to specify a font for this case. Another possibility would be e.g. "ja-Latn" for Japanese in Latin transcription.

Would it be possible to implement this? I guess a flexible way would be:

Latn (generally): DejaVu Sans
Latf (generally): Breitkopf Fraktur
ja (generally): Kochi Gothic
ja-Latn: DejaVu Sans
ja-Hant: some font with old glyphs

and so on.

So I think a possible way would be to first define a general rule for a language (according to ISO 639) or a script (according to ISO 15924), and then a specific rule for a language or script which would override the general rule.

Thanks,
Gerrit
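The "general rule first, specific rule overrides" idea from this message can be sketched in a few lines of Python. This is only an illustration of the proposed lookup order, not fontconfig syntax or fontconfig behaviour. The names `RULES`, `is_script` and `resolve_font` are hypothetical, and trying the script rule before the language rule is an assumption, since the message leaves that order open.

```python
# Hypothetical rule table built from the examples in the message.
# It models the proposed lookup order only; it is not fontconfig syntax.
RULES = {
    "Latn": "DejaVu Sans",
    "Latf": "Breitkopf Fraktur",
    "ja": "Kochi Gothic",
    "ja-Latn": "DejaVu Sans",
}


def is_script(subtag):
    """ISO 15924 codes are written as four letters, first one uppercase."""
    return len(subtag) == 4 and subtag.isalpha() and subtag.istitle()


def resolve_font(tag, rules=RULES):
    """Return the font of the most specific matching rule, or None."""
    subtags = tag.split("-")
    language = next((s for s in subtags if s.islower()), None)
    script = next((s for s in subtags if is_script(s)), None)

    # Most specific first: language-script, then script, then language.
    # Trying script before language is an assumption of this sketch.
    candidates = []
    if language and script:
        candidates.append(f"{language}-{script}")
    if script:
        candidates.append(script)
    if language:
        candidates.append(language)

    for key in candidates:
        if key in rules:
            return rules[key]
    return None


print(resolve_font("ja-Latn"))  # specific rule wins
print(resolve_font("de-Latf"))  # no "de-Latf" rule, falls back to "Latf"
print(resolve_font("ja-Hant"))  # no "ja-Hant" or "Hant" rule, falls back to "ja"
```

With this table, "de-Latf" needs no rule of its own: the general "Latf" rule already selects a Fraktur font, and a "de-Latf" entry would only be added to override it.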
Gerrit Sangel wrote:
> But I only saw in my config files options to generally define an order of font
> substitution, but not according to language and script tags?

Hi Gerrit,

fontconfig can substitute fonts by two-letter language code. The idea is to set up a match rule that matches, for instance, lang="ja", and then to replace sans with your Japanese font for that language.

Firefox does use fontconfig when it uses pango (which is the default).

pat
On Sun, 2007-12-02 at 13:43 +0100, Gerrit Sangel wrote:
> Hello,
>
> Sorry if this was asked before, but I could not find anything in the archive.
>
> Has fontconfig the option to define fonts according to ISO 15924 (and not only
> according to ISO 639)?

I chose to use ISO 639 because this tagging already existed in the HTML standard, and because I could readily find orthographies identifiable with specific ISO 639 languages.

Adding support for the 15924 values seems like it would be easy to do in a compatible fashion; as those values do not conflict with either 639-1 or 639-2, we could simply add orthographies for the script codes and things should 'just work'.

> In my opinion, it is much more flexible than defining fonts according to a
> specific region (e.g. TW or CN). In some cases, it is even necessary, because
> the region does not differ.

Yeah, conflicts among multiple scripts used for the same language in the same territory do exist, which fontconfig doesn't handle well at all.

> Do I understand this correctly, that the user can specify a font in the config
> file according to a specific language?

You can match on the language and prepend a family name to make that preferred.

> I see this in Firefox (even though it does not seem to use fontconfig, but I
> guess an addon could be written to solve it)

Firefox does use fontconfig, although the language-based selection is internal, not based on modifying fontconfig matching rules.

> So I think a possible way would be to define a general rule for a language
> (according to ISO-639) or a script (ISO 15924) at first and then a specific
> rule for a language or script which would override the general rule.

The pattern matching and editing rules should be able to handle this without change, except for the addition of ISO 15924 script codes to the existing set of language/territory pairs.

--
keith.packard at intel.com
On Sun, 2007-12-02 at 19:13 -0800, Keith Packard wrote:
> On Sun, 2007-12-02 at 13:43 +0100, Gerrit Sangel wrote:
> > Has fontconfig the option to define fonts according to ISO 15924 (and not only
> > according to ISO 639)?
>
> I chose to use ISO 639 because this tagging already existed in the HTML
> standard, and because I could readily find orthographies identifiable
> with specific ISO 639 languages.
>
> Adding support for the 15924 values seems like it would be easy to do in
> a compatible fashion; as those values do not conflict with either 639-1
> or 639-2, we could simply add orthographies for the script codes and
> things should 'just work'.

I've thought about passing script knowledge from Pango to fontconfig before:

http://bugzilla.gnome.org/show_bug.cgi?id=346043

It's useful indeed, but I don't think using scripts *instead* of language makes sense. What I imagine being useful is a pattern element such as script=arabic, which can be matched for font tailoring.

I also had Unicode scripts in mind rather than ISO 15924, in a user-readable form like "arabic" and "latin". Pango already has that information, and it can be deduced from the standard Unicode script names. That doesn't mean it can't be ISO 15924 names, but the mapping is not one-to-one, and I really don't understand why Fraktur is a different script from Latin in there. I don't think this feature, if added, should be used for things like Fraktur.

> > In my opinion, it is much more flexible than defining fonts according to a
> > specific region (e.g. TW or CN). In some cases, it is even necessary, because
> > the region does not differ.
>
> Yeah, conflicts among multiple scripts used for the same language in the
> same territory do exist, which fontconfig doesn't handle well at all.

If we add script tags in addition to language tags, orthographies can then be extended to tell which script is used in them. Matching can skip if the script tags don't match.

> > So I think a possible way would be to define a general rule for a language
> > (according to ISO-639) or a script (ISO 15924) at first and then a specific
> > rule for a language or script which would override the general rule.
>
> The pattern matching and editing rules should be able to handle this
> without change, except for the addition of ISO 15924 script codes to the
> existing set of language/territory pairs.

Another piece of information that can improve language matching is ISO 639-3 macrolanguage information. That can tell fontconfig, for example, that Dari is a Persian language:

http://bugzilla.gnome.org/show_bug.cgi?id=470907

--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh
On Tuesday, 04 December 2007, you wrote:
> I also had Unicode scripts in mind, instead of ISO 15924, and I had a
> user-readable version in mind, like "arabic" and "latin". Pango already
> has that information and it can be deduced from standard Unicode script
> names. Doesn't mean it can't be ISO 15924 names though, but the mapping
> is not one to one, and I really don't understand why Fraktur is a
> different script than Latin in there. I don't think this feature if
> added should be used for things like Fraktur.

Well, in my opinion the case of Fraktur is more or less the same as Han unification. Apart from the long s, Fraktur shares its code points with normal Latin, so it can't really be detected from the code points. It may only be a matter of different glyphs, but the appearance is (imho) far too different to speak of just another style like serif or sans serif. The two are also used differently: foreign words are usually not written in Fraktur, so the script information sometimes has to change within a sentence.

Doing this via CSS would work, but it is not really flexible. First, as far as I know, there is no real "standard" Fraktur font available, so the web designer could not just specify one particular font. He would have to list several fonts in the CSS, which I think would be a bit too much work. With a script tag, he could just write

<p xml:lang="de-Latf">Das iſt Fraktur <span xml:lang="de">und das Antiqua</span></p>

and leave the choice of font to the user.

But what are the benefits of Unicode scripts? Is there a list available? As the Unicode website states, the Unicode Consortium was appointed to manage ISO 15924. So I would have guessed that this is the "official" script list for Unicode.

> > > In my opinion, it is much more flexible than defining fonts according
> > > to a specific region (e.g. TW or CN). In some cases, it is even
> > > necessary, because the region does not differ.
> >
> > Yeah, conflicts among multiple scripts used for the same language in the
> > same territory do exist, which fontconfig doesn't handle well at all.
>
> If we add script tags in excess to language tags, orthographies then can
> be extended to tell what script is used in them. Matching can skip if
> script tags don't match.

Well, but why should script tags not match? I would guess (I'm no linguist) that every language can be written in every script, even though the result may not be quite correct most of the time. So I don't think there should be a limitation. I think the main purpose of script tags is to specify a script for a language that is not usually written in that script.

The different ISO standards would not conflict, as far as I know. ISO 639 codes are written entirely in lowercase letters, ISO 3166 codes entirely in uppercase, and ISO 15924 codes with the first letter in uppercase and the other three in lowercase. And I guess the ordering would be from "biggest" to "smallest", so language-region-script.

> > > So I think a possible way would be to define a general rule for a
> > > language (according to ISO-639) or a script (ISO 15924) at first and
> > > then a specific rule for a language or script which would override the
> > > general rule.
> >
> > The pattern matching and editing rules should be able to handle this
> > without change, except for the addition of ISO 15924 script codes to the
> > existing set of language/territory pairs.
>
> Another piece of information that can improve language matching is to
> use ISO 639-3 macrolanguage information. That can tell fontconfig for
> example that Dari is a Persian language:
>
> http://bugzilla.gnome.org/show_bug.cgi?id=470907

Well, but this is for *languages*, not *scripts*.

Another example might be this: I have a Japanese text that I want to write in the old characters used before the simplification after WW2. Although some old characters are encoded separately, others were unified because there are only minor stylistic differences. I would have to use a higher-level protocol to state that these should be old characters. But the language itself does not differ. ISO 15924 has several tags for Han, namely Hani (Han ideographs), Hans (simplified Han) and Hant (traditional Han). So I would tag such text as "ja-Hant", and the browser could select a font that has the old glyphs. In this case you could not differentiate by language or region, because both are the same as for modern Japanese. *Only* the script differs.

So I would really urge for ISO 15924. In my opinion this is the best solution, because

a) an established standard exists
b) it conforms with ISO 639 and 3166
c) it is managed by the Unicode Consortium
d) why reinvent the wheel?

And I don't think names like "arabic" or "latin" are that useful. First, they are aimed explicitly at English speakers, which I don't like much, especially in this case. Second, the ISO 15924 tags are derived from more or less readable names and, with four letters, are still easy to read: Arabic is Arab and Latin is Latn. Third, if the web designer already has to look up which language/country code he needs, I don't think looking up a script code as well would be very exhausting.

http://www.unicode.org/iso15924/iso15924-en.html

Gerrit
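The letter-case argument in this message (ISO 639 lowercase, ISO 3166 uppercase, ISO 15924 capitalised four letters) can be made concrete with a small Python sketch. It assumes tags are written with exactly that casing, and the helper names `classify_subtag` and `parse_tag` are hypothetical. Because each subtag is classified by its shape alone, the sketch does not depend on the order of the subtags; note that RFC 4646 orders them language-script-region (e.g. zh-Hant-TW) rather than language-region-script.

```python
def classify_subtag(subtag):
    """Guess the kind of a subtag from its length and letter case only."""
    if not subtag.isalpha():
        return "unknown"
    if len(subtag) in (2, 3) and subtag.islower():
        return "language"  # ISO 639, e.g. "de", "ja"
    if len(subtag) == 2 and subtag.isupper():
        return "region"    # ISO 3166, e.g. "TW", "CN"
    if len(subtag) == 4 and subtag.istitle():
        return "script"    # ISO 15924, e.g. "Latf", "Hant"
    return "unknown"


def parse_tag(tag):
    """Split a tag such as "de-Latf" into its recognised parts."""
    parts = {}
    for subtag in tag.split("-"):
        # Keep the first subtag of each kind; later repeats are ignored.
        parts.setdefault(classify_subtag(subtag), subtag)
    return parts


print(parse_tag("de-Latf"))
print(parse_tag("zh-Hant-TW"))
print(parse_tag("zh-TW-Hant"))
```

Both orderings of the Chinese example yield the same language, script and region, which is the sense in which the three code sets do not conflict.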