Gene Cumm wrote:
> Three questions:
>
> - Where did the file come from?
> - Does tcase() stand for toggle case (or otherwise effectively the same
> thing)?
> - Should uppercase characters, like the Latin capital A, have tcase()
> data in addition to the lcase() data?
>
The file comes from the Unicode Consortium (ftp.unicode.org). The full
file is *huge* (over a megabyte), so I use mksubset.pl to cut it down
to only the parts we need.
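
The file is semicolon-separated, one character per line. As an illustration only (this is not the actual mksubset.pl; the function name subset() and the choice of fields to keep are assumptions), a Python sketch of a filter that keeps just the three case mappings for a chosen set of code points might look like:

```python
# Hypothetical sketch of subsetting UnicodeData.txt; this is NOT mksubset.pl.
# Each line has 15 ';'-separated fields: field 0 is the code point and
# fields 12, 13 and 14 are the simple uppercase, lowercase and titlecase
# mappings. An empty mapping field means "maps to itself".

UPPER, LOWER, TITLE = 12, 13, 14

# A few lines in UnicodeData.txt format, for illustration.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "01C4;LATIN CAPITAL LETTER DZ WITH CARON;Lu;0;L;<compat> 0044 017D;;;;N;LATIN CAPITAL LETTER D Z HACEK;;;01C6;01C5",
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
    "01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;<compat> 0064 017E;;;;N;LATIN SMALL LETTER D Z HACEK;;01C4;;01C5",
]


def subset(lines, wanted):
    """Map each code point in `wanted` to its (upper, lower, title) code points.

    Lines for code points outside `wanted` are dropped. Range entries
    such as '<CJK Ideograph, First>' are not expanded in this sketch.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        if cp not in wanted:
            continue
        # Empty field -> the character maps to itself.
        upper, lower, title = (int(fields[i], 16) if fields[i] else cp
                               for i in (UPPER, LOWER, TITLE))
        table[cp] = (upper, lower, title)
    return table
```

Usage: `subset(SAMPLE, {0x41, 0x1C4, 0x1C5, 0x1C6})`, or pass an open file object for the real UnicodeData.txt. Note that in the U+0041 line the titlecase field is empty, so LATIN CAPITAL LETTER A simply maps to itself; only characters such as the DZ digraphs (U+01C4..U+01C6) carry a distinct title-case mapping.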

tcase stands for "title case", the third of the three cases: UPPER CASE,
lower case and Title Case. It only matters for a handful of characters,
such as:
U+01C4 LATIN CAPITAL LETTER DZ WITH CARON
U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
U+01C6 LATIN SMALL LETTER DZ WITH CARON

U+01C5 is the title-case form. I decided title case is so rare (I'm not
even sure we have *any* instances of it in the common codepages) that
adding it would be a waste of space.
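
One way to check how much is lost: without tcase() data, the obvious fallback is the uppercase mapping, which is wrong only where a character's title-case form differs from its uppercase form. A minimal Python sketch (the helper title_differs() is hypothetical and relies on Python's own Unicode tables rather than the subsetted file) reports such characters; the `__main__` part scans the upper half of a few example codepages:

```python
# Sketch: if tcase() data is dropped, fall back to the uppercase mapping.
# This lists the characters for which that fallback gives the wrong answer.
# It uses Python's built-in Unicode tables, not the subsetted file.

def title_differs(chars):
    """Return the characters whose title-case form differs from uppercase."""
    out = []
    for c in chars:
        up, ti = c.upper(), c.title()
        # Skip characters whose full case mapping expands to several
        # characters (e.g. 'ß'.upper() == 'SS'); a one-to-one table
        # cannot represent those anyway.
        if len(up) != 1 or len(ti) != 1:
            continue
        if up != ti:
            out.append(c)
    return out


if __name__ == "__main__":
    # Check the upper half (0x80-0xFF) of a few common codepages;
    # bytes a codepage leaves undefined are skipped.
    high = bytes(range(0x80, 0x100))
    for codepage in ("cp437", "cp850", "cp1252", "latin-1"):
        chars = high.decode(codepage, errors="ignore")
        found = title_differs(chars)
        print(codepage, [f"U+{ord(c):04X}" for c in found])
```

The codepages listed are only examples, and the result depends on the Unicode version bundled with Python. For the DZ digraphs, the uppercase fallback turns U+01C6 into U+01C4 rather than the title-case U+01C5.
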
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.