Keith Packard
2006-Aug-27 18:00 UTC
[Fontconfig] FcConfigNormalizeFontDir considered harmful
In looking at this code, it''s reading every directory in the font hierarchy and stat''ing every file. That''s a lot of what the font caches are supposed to avoid; all of those syscalls at application startup. I''m trying to figure out what we''re attempting here; the old cache code would already infinite loop if the font directories had circular symlinks, so this isn''t necessary from a robustness perspective. If the only thing the code is for is to eliminate duplicate caches for the same directory reached by multiple paths, then I don''t think that''s necessary either; again, the old code would have loaded the caches twice in that case. I think we can just eliminate the path normalization entirely. Arguments? -- keith.packard@intel.com -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.freedesktop.org/archives/fontconfig/attachments/20060827/5dd8deef/attachment.pgp
Patrick Lam
2006-Aug-27 21:34 UTC
[Fontconfig] FcConfigNormalizeFontDir considered harmful
Keith Packard wrote:> In looking at this code, it''s reading every directory in the font > hierarchy and stat''ing every file. That''s a lot of what the font caches > are supposed to avoid; all of those syscalls at application startup. > > I''m trying to figure out what we''re attempting here; the old cache code > would already infinite loop if the font directories had circular > symlinks, so this isn''t necessary from a robustness perspective. > > If the only thing the code is for is to eliminate duplicate caches for > the same directory reached by multiple paths, then I don''t think that''s > necessary either; again, the old code would have loaded the caches twice > in that case. > > I think we can just eliminate the path normalization entirely. > Arguments?I think that path normalization needs to run at least on strings that are input to fc-cache. For instance, `fc-cache .'' is going to create a cache file which claims to be for the directory .; not quite what we want. Normalization definitely avoids infinite loops in the presence of circular symlinks, which is nice behaviour, but doesn''t justify scanning all cache dirs on all fontconfig clients. See below in the discussion of FcCacheReadDirs. However, the original motivation for FcConfigNormalizeFontDir was the multiple path problem. Not loading caches multiple times, but at least having incorrect names for directories and then putting them in the global cache. I''m not sure that we need to run FcConfigNormalizeFontDir on startup to avoid that. Since the global cache goes away, then all of the NormalizeFontDir uses in global cache code go away anyway. That leaves calls in FcDirCacheUnlink, FcCacheReadDirs, FcDirCacheWrite, and FcDirScanConfig (aside from the call in fc-cache/fc-cache.c). There was some trickiness in calls to FcDirCacheUnlink due to multi-arch cache files. Separating cache files on a per-arch basis will make that problem go away; after we separate them, then we can preemptively blow away cache files and nothing bad will happen. The call in FcCacheReadDirs seems a bit strange. One might want to normalize before calling FcConfigAcceptFilename to make sure that the masks do the right thing (they didn''t in 2.3.2), because we can require that masks must be on canonical filenames (as stated in the <dir> entries). But that''s not what happens right now. And otherwise, the arguments to FcCacheReadDirs should be, by definition, normalized, since they come from the <dir> entries in the config file. So the call there just seems to prevent infinite loops, as per bug 2667 (and 4754). FcDirCacheWrite is only called from fc-cache. It''s probably redundant due to the call in fc-cache itself. Now FcDirScanConfig might create global caches with incorrect names if the paths are not normalized. I haven''t analyzed that case yet, but it seems like it could be potentially bad. pat
Keith Packard
2006-Aug-27 22:45 UTC
[Fontconfig] FcConfigNormalizeFontDir considered harmful
On Mon, 2006-08-28 at 00:34 -0400, Patrick Lam wrote:> I think that path normalization needs to run at least on strings that > are input to fc-cache. For instance, `fc-cache .'' is going to create a > cache file which claims to be for the directory .; not quite what we want.Ok, so paths not starting with ''/'' should be cleaned up somehow, probably by using something similar to the path normalization code, but it needn''t keep any state around (just rescan for each normalized path). That''s just for directory names that come from the user though, and there aren''t very many places where we have those aside from fc-cache.> However, the original motivation for FcConfigNormalizeFontDir was the > multiple path problem. Not loading caches multiple times, but at least > having incorrect names for directories and then putting them in the > global cache. I''m not sure that we need to run FcConfigNormalizeFontDir > on startup to avoid that.Yes, if you want to find the same cache file through multiple path names, you would have to normalize on app startup. I''m not sure we care that much though; a typical system doesn''t generally produce multiple font paths to the same font files.> Since the global cache goes away, then all of the NormalizeFontDir uses > in global cache code go away anyway. That leaves calls in > FcDirCacheUnlink, FcCacheReadDirs, FcDirCacheWrite, and FcDirScanConfig > (aside from the call in fc-cache/fc-cache.c).Sure, but any normalize call from anywhere causes the whole file system to get scanned. Icky. Either we normalize all paths, or we normalize none and expect people to just pass us consistent path names.> There was some trickiness in calls to FcDirCacheUnlink due to multi-arch > cache files. Separating cache files on a per-arch basis will make that > problem go away; after we separate them, then we can preemptively blow > away cache files and nothing bad will happen.heh. already done.> The call in FcCacheReadDirs seems a bit strange. One might > want to normalize before calling FcConfigAcceptFilename to make sure > that the masks do the right thing (they didn''t in 2.3.2), because we can > require that masks must be on canonical filenames (as stated in the > <dir> entries). But that''s not what happens right now. And otherwise, > the arguments to FcCacheReadDirs should be, by definition, normalized, > since they come from the <dir> entries in the config file. So the call > there just seems to prevent infinite loops, as per bug 2667 (and 4754).I think we can have a recursion depth counter in the cache reading and writing code and bail if it gets too high; that''s a whole lot cheaper than scanning for duplicates.> Now FcDirScanConfig might create global caches with incorrect names if > the paths are not normalized. I haven''t analyzed that case yet, but it > seems like it could be potentially bad.I suggest that any paths that don''t start with ''/'' get cleaned up, and otherwise we just leave them alone. Here''s a suggested algorithm for doing the normalization: For relative paths: getcwd + pathname Clean up path (removing ./ and ../ elements) I don''t think we need to get fancier than this; a simple recursion depth check should catch any looping directory links. -- keith.packard@intel.com -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.freedesktop.org/archives/fontconfig/attachments/20060827/8a3d71df/attachment.pgp
James Cloos
2006-Aug-28 06:28 UTC
[Fontconfig] FcConfigNormalizeFontDir considered harmful
>>>>> "Keith" == Keith Packard <keithp@keithp.com> writes:Keith> That''s just for directory names that come from the user though, Keith> and there aren''t very many places where we have those aside Keith> from fc-cache. Don''t forget fc-cat. Another issue since the change from cache files in the same dirs as the fonts to a central cache dir is that fc-cache could not (I''ve not yet tested fc-2_4-keithp) cache dirs that were neither in fonts.conf <dir/> entries nor subdirectories thereof. Ie, you couldn''t preemptively cache directories. As a concrete example, I keep several poorer-quality collections under /usr/local/share/fonts, and add one or two at a time whenever I want to use fonts from them. I used to just run: ,---- | fc-cache -fv /usr/local/share/fonts `---- to ensure the cache files were up to date. But when I upgraded to a version that used /var/cache/fontconfig I had to add /usr/local/share/fonts to fonts.conf to run fc-cache on it, just to remove it again afterwords. (On a box like my laptop -- harware limited to a half gig of ram, and quite slow pata drives -- it is important not to speculatively have every possible font dir in fonts.conf; even with mmap(2)ed cache files it takes up too much ram.) It would also be useful for package management code to be able to generate the cache files when a font directory is installed, even if that directory is not yet referenced in fonts.conf. -JimC P.S. I presume master is matches CVS HEAD? And so fc-2_4_branch and fc-2_4-keithp are the interesting branches? -- James Cloos <cloos@jhcloos.com> OpenPGP: 0xED7DAEA6
Keith Packard
2006-Aug-28 08:13 UTC
[Fontconfig] FcConfigNormalizeFontDir considered harmful
On Mon, 2006-08-28 at 09:28 -0400, James Cloos wrote:> Don''t forget fc-cat.yes, of course.> Another issue since the change from cache files in the same dirs as > the fonts to a central cache dir is that fc-cache could not (I''ve not > yet tested fc-2_4-keithp) cache dirs that were neither in fonts.conf > <dir/> entries nor subdirectories thereof.I think that was a side-effect of directory normalization; the problem was that it couldn''t find the specified directory under one of the configured directories. This works now.> P.S. I presume master is matches CVS HEAD? And so fc-2_4_branch > and fc-2_4-keithp are the interesting branches?Yes. I didn''t restructure the repository at all. -- keith.packard@intel.com -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.freedesktop.org/archives/fontconfig/attachments/20060828/bc0d520b/attachment-0001.pgp