On Sat, 2003-07-12 at 01:00, Ambrose Li wrote:
> With the current version of fontconfig (and gtk2), it is getting
> difficult to get X applications to let me use, for example, a
> Japanese font for traditional Chinese, even if the font is
> perfectly fine for the task I do, because the application will
> believe the fontconfig notion of "support" for my locale, and
> filter out all the "unsupported" fonts. I think it
> counter-productive to put so much trust in the mechanical notion
> of "complete code space coverage".

Can you give a concrete example? As Keith said, if you specify
a font explicitly, you'll get that font for every character
it contains.

Regards,
                                        Owen

On Sat, 2003-07-12 at 12:14, Keith Packard wrote:
> Around 11 o'clock on Jul 12, Owen Taylor wrote:
>
> > when a key is referred
> > to as having the value "foo,bar", there are three possible
> > interpretations of that:
> >
> >  * A string containing an embedded comma
> >  * A pattern with multiple values with the same key
> >  * A pattern with a single value with a composite type (LangSet)
>
> and the winner is 2) -- foo,bar represents a pattern with multiple values
> for the same key. LangSets and Charsets were designed to be a more compact
> representation of this idea for those specific kinds of values; I think
> there are some places where the fact that they are stored in a single
> entry is exposed to the user and I'd like to close those holes.

What about in <match><test>? What does

    <string>times,courier</string>
    <lang>en,de</lang>

mean there? If it means "an embedded comma" then I would suggest
that fontconfig should probably print a warning like:

    "Interpreting ',' as part of value"

because otherwise, people will definitely get confused.

(The docs currently say "These elements hold a single value of the
indicated type.")

Regards,
                                        Owen

On Sat, 2003-07-12 at 12:04, Keith Packard wrote:
> Around 10 o'clock on Jul 12, Owen Taylor wrote:
>
> > Can you give a concrete example? As Keith said, if you specify
> > a font explicitly, you'll get that font for every character
> > it contains.
>
> I thought the problem mentioned was that applications were using lang to
> restrict the presented list of available fonts in some context. I know
> Mozilla does this when selecting preferred fonts for language groups; I
> can believe that other apps also do this; perhaps we should find a way to
> deprecate this activity. The Mozilla behaviour was inherited from the
> core font listing techniques and so is not specific to its interactions
> with fontconfig.

gtk2 was explicitly mentioned, and gtk2 doesn't expose fontconfig's
listing system at all. But maybe "Mozilla using gtk2" was meant.

Regards,
                                        Owen

On Fri, Jul 11, 2003 at 01:54:56PM -0700, Keith Packard wrote:
> The font supports all of the langs requested by the
> application. I think this means that the font 'contains'
> all of the langs requested by the application (remember,
> we're talking about LISTING here). Now, the tricky part is
> defining what 'support' means for a specific lang entry. When
> the application provides a language/territory pair, then the
> font must either provide a matching language/territory pair,
> or a bare language entry. When the application provides
> a bare language, the font must either provide a matching
> bare language entry or a language/territory pair with *any*
> territory:
>
>     application     font    "supports"
>     -----------     ----    ----------
>     zh              zh_cn   YES
>     zh_tw           zh_cn   NO

This is theoretically sound. However, for practical purposes it
is wrong; fonts with incomplete coverage are often still useful
(not for everything, but for particular tasks such as typesetting
short pieces, or even longer pieces, of text).

Especially with the scarcity of free CJK fonts, it is almost a
must to, for example, use zh_CN or even ja/ko fonts for zh_TW
in certain cases. (The reverse is also true; i.e., a zh_TW and/or
zh_CN font will be useful for setting Japanese in a limited way.)

In fact, there are even commercial zh_TW fonts that cover less
than half of the Big5 code space (e.g., only the "frequently used
characters" space, i.e., 4501 code points out of the complete
Big5 coverage of 17552; because of the structure of Big5, just
having 4501 of the "most frequently used" characters should at
least already make the font "support zh_TW").

With the current version of fontconfig (and gtk2), it is getting
difficult to get X applications to let me use, for example, a
Japanese font for traditional Chinese, even if the font is
perfectly fine for the task I do, because the application will
believe the fontconfig notion of "support" for my locale, and
filter out all the "unsupported" fonts.

I think it counter-productive to put so much trust in the
mechanical notion of "complete code space coverage".

Regards,

-- 
Ambrose LI Cheuk-Wing  <a.c.li@ieee.org>
http://ada.dhs.org/~acli/

Around 10 o'clock on Jul 12, Owen Taylor wrote:
> Can you give a concrete example? As Keith said, if you specify
> a font explicitly, you'll get that font for every character
> it contains.

I thought the problem mentioned was that applications were using lang to
restrict the presented list of available fonts in some context. I know
Mozilla does this when selecting preferred fonts for language groups; I
can believe that other apps also do this; perhaps we should find a way to
deprecate this activity. The Mozilla behaviour was inherited from the
core font listing techniques and so is not specific to its interactions
with fontconfig.

-keith

On Fri, 2003-07-11 at 16:54, Keith Packard wrote:
> LISTING FONTS
>
> When listing fonts, contains should have "obvious" semantics, I suggest
> that those semantics depend on the type of the value:
>
>     string, number, boolean:
>
>       font has an equal value for every value in the pattern. This means
>       that using 'times,courier' for the family will result in no fonts
>       being listed as no font has both times and courier family names.
>       In fact, I can't see a good use for multiple values here as it
>       would require multiple values in the fonts; let's see if that is
>       broken. For strings, the change here is that 'contains' does not
>       mean sub string -- list 'courier' and you won't see 'courier 10
>       pitch'. I think strings should be treated as atomic values in this
>       context; fontconfig doesn't have string operators, which is at
>       least consistent.

What you are saying in this mail generally makes sense, but
when I get down to details I get a little confused, especially
about the interpretation of multiple values - when a key is referred
to as having the value "foo,bar", there are three possible
interpretations of that:

 * A string containing an embedded comma
 * A pattern with multiple values with the same key
 * A pattern with a single value with a composite type (LangSet)

When reading through your mail, I had some trouble figuring out
when each of these interpretations was applicable in what context,
and in fact, it's not always clear to me in practice using
fontconfig either.

If I do:

    fc-list times,courier

I assume that the resulting pattern has two FC_FAMILY elements, one
for times, and one for courier. But then I don't see how your
proposed changes section:

> 1) Use a Contains-alike operator for LISTING which does exact
>    matching for strings, permit Contains for EDITING to do
>    substring matching

is going to result in going from the current result:

    List both fonts with a family of Times and those with a family of
    Courier

to the behavior described above.

Regards,
                                        Owen

Around 1 o'clock on Jul 12, Ambrose Li wrote:
> With the current version of fontconfig (and gtk2), it is getting
> difficult to get X applications to let me use, for example, a
> Japanese font for traditional Chinese, even if the font is
> perfectly fine for the task I do, because the application will
> believe the fontconfig notion of "support" for my locale, and
> filter out all the "unsupported" fonts.

Perhaps this is not a problem with fontconfig, but rather with how
applications interpret its interface in presenting fonts. Fontconfig always
places application-specified families higher in precedence than fonts
selected strictly through language coverage concerns, so you should be able
to specify any font family by name and have it work in whatever locale you
are using.

The notion of language support is designed precisely for the case where no
specified font family is available on the system and a 'fall-back' to
available fonts is required; choosing one with 'support' for the language
ensures that multiple fonts won't be needed. Font substitution is a hard
problem, and this language coverage mechanism has improved the resulting
presentation of documents in many environments.

> I think it counter-productive to put so much trust in the mechanical notion
> of "complete code space coverage".

Perhaps we need to create better interfaces for applications to help clarify
where language coverage is intended to be used. Suggestions on what should
be done are welcome.

-keith

Around 11 o'clock on Jul 12, Owen Taylor wrote:
> when a key is referred
> to as having the value "foo,bar", there are three possible
> interpretations of that:
>
>  * A string containing an embedded comma
>  * A pattern with multiple values with the same key
>  * A pattern with a single value with a composite type (LangSet)

and the winner is 2) -- foo,bar represents a pattern with multiple values
for the same key. LangSets and Charsets were designed to be a more compact
representation of this idea for those specific kinds of values; I think
there are some places where the fact that they are stored in a single
entry is exposed to the user and I'd like to close those holes.

> If I do:
>
>     fc-list times,courier
>
> I assume that the resulting pattern has two FC_FAMILY elements, one
> for times, and one for courier.

yes, that's correct -- commas separate multiple values with the same key.

> But then I don't see how your proposed changes section:
>
> > 1) Use a Contains-alike operator for LISTING which does exact
> >    matching for strings, permit Contains for EDITING to do
> >    substring matching
>
> (will result in a change ...) to the behavior described above.

I think I missed a step -- LISTING will require matches for all values of
each key, so

    $ fc-list times,courier

will list only fonts with *both* family times and family courier (i.e. no
fonts at all). Yes, this is useless, but I want to make sure that

    $ fc-list :lang=en,de

means to list only fonts with *both* English and German support. Having
different meanings for different keys seems like a really bad idea, worse
than defining the behaviour of 'fc-list times,courier' as useless.

Thanks for reading through this stuff; I'm hoping to get a chance to write
down a specification for the library semantics from this discussion.

-keith

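A minimal Python sketch of the LISTING rule described in this message, in which a font is listed only if it carries every value of every key in the pattern. This is an illustration, not fontconfig code: the font table, `font_is_listed` and `fc_list` are hypothetical names, and lang values are compared as plain strings here (the language/territory rules are ignored).

```python
# Hypothetical font database: each font maps a key to the set of values it has.
FONTS = {
    "Times": {"family": {"times"}, "lang": {"en", "de", "fr"}},
    "Courier": {"family": {"courier"}, "lang": {"en"}},
    "Courier 10 Pitch": {"family": {"courier 10 pitch"}, "lang": {"en", "de"}},
}


def font_is_listed(pattern, font):
    """Return True if the font has every value of every key in the pattern."""
    for key, wanted in pattern.items():
        have = font.get(key, set())
        # Values are atomic: 'courier' does not match 'courier 10 pitch'.
        if not all(value in have for value in wanted):
            return False
    return True


def fc_list(pattern):
    """Return the sorted names of all fonts that satisfy the pattern."""
    return sorted(name for name, font in FONTS.items()
                  if font_is_listed(pattern, font))


print(fc_list({"family": {"times", "courier"}}))  # no font has both families
print(fc_list({"lang": {"en", "de"}}))            # fonts covering both langs
print(fc_list({"family": {"courier"}}))           # exact match only
```

Under this rule the same "all values must be present" meaning applies to every key, which is the consistency argued for above, at the price of making a two-family query return nothing.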
Around 14 o'clock on Jul 12, Owen Taylor wrote:
> What about in <match><test>? What does
>
>     <string>times,courier</string>
>     <lang>en,de</lang>
>
> Mean there? If it means "an embedded comma" then I would suggest
> that fontconfig should probably print a warning like:

Sigh. Yes, it means an embedded comma; only the string name parser
(FcNameParse) splits things at punctuation. This is useful for '-' where

    <string>sans-serif</string>

means the sans-serif family and not the sans family at size 'serif'.

If you want to check for any of a list, you can have multiple values in
the <test> case:

    <test name="family" qual="any">
        <string>times</string>
        <string>courier</string>
    </test>

That will look for either 'times' or 'courier'. Or, you can use:

    <test name="lang" qual="all">
        <string>en</string>
        <string>de</string>
    </test>

to check for both en and de.

I'd prefer not to emit warnings for reasonable syntax; I'm not sure how one
would rewrite the values to avoid the warnings, which seems pretty harsh.

-keith

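The difference between the two parsing paths described above can be sketched in Python. This is a simplified illustration, not FcNameParse itself: both function names are hypothetical, and escaping and the other punctuation the real name parser handles ('-', ':', '=') are deliberately left out.

```python
def split_name_values(text):
    """Name-style value list: commas separate multiple values for one key.

    Simplified assumption: no escape handling, commas only.
    """
    return [part.strip() for part in text.split(",") if part.strip()]


def string_element_values(text):
    """Content of a <string> element: one value, punctuation kept as-is."""
    return [text]


print(split_name_values("times,courier"))      # two separate family values
print(string_element_values("times,courier"))  # one value with an embedded comma
print(string_element_values("sans-serif"))     # one value, hyphen preserved
```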
"Contains" matching issues. The contains operator is currently used in font listing and can be used in match/edit rules. LISTING FONTS When listing fonts, contains should have "obvious" semantics, I suggest that those semantics depend on the type of the value: string, number, boolean: font has an equal value for every value in the pattern. This means that using ''times,courier'' for the family will result in no fonts being listed as no font has both times and courier family names. In fact, I can''t see a good use for multiple values here as it would require multiple values in the fonts; let''s see if that is broken. For strings, the change here is that ''contains'' does not mean sub string -- list ''courier'' and you won''t see ''courier 10 pitch''. I think strings should be treated as atomic values in this context; fontconfig doesn''t have string operators, which is at least consistent. charset: font contains listed Unicode codepoints, in otherwords, the charset provided by the font ''contains'' all of the glyphs requested by the application. lang: (Remember that ''lang'' is a composite value consisting of a language value and a territory value. The list of lang values in a font is computed from Unicode coverage ranges based on orthographies. Except for Chinese, all of these coverage ranges are (currently) assocated only with a language and not a territory. Chinese is (currently) split into three territory groups (mainland China and Singapore, Hong Kong, Taiwan and Macau). So, most language comparisons will be done with a language/territory pair supplied by the application (often from the current locale) against fonts which know only languages and not territories. However, applications will also provide only languages at times to be matched against fonts which have languages and territories.) The font supports all of the langs requested by the application. 
        I think this means that the font 'contains' all of the langs
        requested by the application (remember, we're talking about LISTING
        here). Now, the tricky part is defining what 'support' means for a
        specific lang entry. When the application provides a
        language/territory pair, then the font must either provide a
        matching language/territory pair, or a bare language entry. When
        the application provides a bare language, the font must either
        provide a matching bare language entry or a language/territory pair
        with *any* territory:

            application     font    "supports"
            -----------     ----    ----------
            zh              zh_cn   YES
            zh_tw           zh_cn   NO
            en_gb           en      YES
            en              en      YES

MATCHING

The LISTING algorithm is designed to sharply restrict the set of provided
fonts; an empty list is often the result of overspecified patterns. That
matches the expected usage of providing precise information to users about
what actual fonts are available, rather than what font will be used when a
specific pattern is matched.

In contrast, MATCHING is designed to always provide a font, and in fact to
provide a score measuring how accurate that match is so that the set of
available fonts can be sorted by this metric and returned to the
application.

When matching fonts, we're not using the boolean 'contains' operators, but
rather measuring distance from the pattern to the font (in CS terms,
LISTING is a constraint satisfaction problem while MATCHING is a constraint
optimization problem).

    string, boolean:

        Distance in these objects is measured with only two values --
        matching and nonmatching -- matching strings or booleans have
        distance 0 while mismatching values have distance 1.

    number:

        Distance between two numbers is just the absolute value of their
        difference (the obvious value). This is used for things like weight
        and slant; the numeric values for those constants were carefully
        chosen to prefer reasonable substitutions (italic and oblique are
        closer together than either is to roman).

    charset:

        Distance between two charsets is the count of characters requested
        by the pattern but not provided by the font. This means that a font
        which fully covers the requested characters has distance '0'.

    lang:

        Distance has three values:

        0:  Pattern and font have equal language/country, or pattern has
            only language and font has language with any country.

        1:  Pattern and font have equal language and different country
            (zh_CN vs zh_TW)

        2:  Pattern and font have different language

EDITING

The EDITING algorithm needs a method for matching patterns for each edit
operation; this is another constraint satisfaction problem as the edit
rules are either applied or not applied.

Match rules in edit instructions can use many different operators to
constrain pattern selection:

    eq not_eq less less_eq more more_eq contains not_contains

Each of these operators behaves differently for each datatype. For
datatypes which aren't ordered, I've defined the ordered operators to
always return false.

    string:

        I think these should be treated as unordered objects so that
        collation isn't visible to the user. The remaining question is
        whether the 'contains' operator should be used to detect sub-string
        presence. The LISTING operation above should not do this as the
        operator is not selectable, but allowing 'contains' to do substring
        detection in an EDITING context means that LISTING won't use
        Contains, but rather some Contains-like analog which is actually
        Equal for strings. Hmm. Permitting Contains for EDITING would
        probably be useful, especially for FC_STYLE pattern elements.

    boolean, number:

        These have obvious semantics for all of the operators if
        contains/not_contains are allowed to be synonyms for eq/not_eq.

    charset, lang:

        I think the semantics described above for LISTING should apply
        here.

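The MATCHING distances described earlier (string/boolean, number, charset, lang) can be sketched in Python as follows. This is a minimal illustration of the proposal, not the library's implementation, and all function names are hypothetical. One case is not covered by the three lang values listed above, namely a pattern with a territory against a font with a bare language; the sketch assumes distance 0 there, mirroring the LISTING table, and marks that assumption in the code.

```python
def split_lang(tag):
    """Split 'zh_TW' or 'zh-tw' into ('zh', 'tw'); territory is '' if absent."""
    lang, _, territory = tag.lower().replace("-", "_").partition("_")
    return lang, territory


def string_distance(wanted, have):
    """Strings and booleans: 0 when equal, 1 otherwise."""
    return 0 if wanted == have else 1


def number_distance(wanted, have):
    """Numbers: absolute difference."""
    return abs(wanted - have)


def charset_distance(wanted, have):
    """Charsets: count of requested characters the font does not provide."""
    return len(set(wanted) - set(have))


def lang_distance(wanted, have):
    """Langs: 0 (match), 1 (same language, other territory), 2 (other language)."""
    w_lang, w_terr = split_lang(wanted)
    h_lang, h_terr = split_lang(have)
    if w_lang != h_lang:
        return 2
    if w_terr and h_terr and w_terr != h_terr:
        return 1
    # Equal pairs, or a bare-language pattern against any territory.
    # ASSUMPTION: a pattern with a territory against a bare-language font
    # is also treated as 0; the message does not specify this case.
    return 0


# Slant constants as defined in fontconfig.h.
ROMAN, ITALIC, OBLIQUE = 0, 100, 110

print(number_distance(ITALIC, OBLIQUE), number_distance(ITALIC, ROMAN))
print(lang_distance("zh_CN", "zh_TW"), lang_distance("zh", "zh_TW"))
print(charset_distance("abc", "ab"))
```

Because every distance is a non-negative number with 0 meaning a perfect match, fonts can be sorted by these values, which is what lets MATCHING always return some font where LISTING would return an empty list.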
PROPOSED CHANGES

I believe the only changes necessary to implement these semantics are:

 1) Use a Contains-alike operator for LISTING which does exact
    matching for strings, permit Contains for EDITING to do
    substring matching

 2) Change lang Contains semantics to make ll_xx contain ll and
    ll contain ll_xx (currently, I believe ll_xx does not contain ll)
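
The lang 'supports' rule from the LISTING section, together with proposed change 2, can be sketched in Python as below. This is an illustration under the rules stated in this message, not fontconfig's code; `split_lang`, `lang_supports` and `font_supports_all` are hypothetical names.

```python
def split_lang(tag):
    """Split 'zh_TW' or 'zh-tw' into ('zh', 'tw'); territory is '' if absent."""
    lang, _, territory = tag.lower().replace("-", "_").partition("_")
    return lang, territory


def lang_supports(requested, provided):
    """Return True if a font lang entry 'supports' the requested lang."""
    r_lang, r_terr = split_lang(requested)
    p_lang, p_terr = split_lang(provided)
    if r_lang != p_lang:
        return False
    if not r_terr:
        # Bare language requested: a bare entry or any territory will do.
        return True
    # Language/territory requested: need the same pair or a bare language.
    return not p_terr or p_terr == r_terr


def font_supports_all(requested_langs, font_langs):
    """LISTING: every requested lang must be supported by some font entry."""
    return all(any(lang_supports(req, have) for have in font_langs)
               for req in requested_langs)


# The rows of the table in the LISTING section.
for requested, provided in [("zh", "zh_cn"), ("zh_tw", "zh_cn"),
                            ("en_gb", "en"), ("en", "en")]:
    answer = "YES" if lang_supports(requested, provided) else "NO"
    print(requested, provided, answer)
```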