Joachim Durchholz
2011-Oct-03 20:59 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
On 03.10.2011 22:12, Nikola Smiljanic wrote:
> How about this:
>
> for (int i = 0; i != NumWChars; ++i)
>     absPath[i] = std::tolower(absPath[i], std::locale());
>
> seems to be working just fine?

You have two assumptions here:

Assumption 1: For each lowercase character, there is an equivalent uppercase character, and vice versa.
This is not true in half a dozen languages according to ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt .

Assumption 2: The transformation from lower case to upper case can be done for each character individually, without considering context.
This is not true in a couple of languages according to SpecialCasing.txt.

Do not do that. If you get complaints, they will be about scripts that you can't type on your keyboard, and that you know nothing about, so you won't even know what the right behaviour would have been.

Rely on the relevant Unicode library. Which one that would be, and which functions to call, depends on what you need that to-lowercase transformation for. (It also depends on whether the names you get are already normalized or not; I'd want to run a normalization pass on the names first just to be on the safe side.)

Regards,
Jo
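To make the two assumptions concrete, here is a minimal sketch of where a per-character loop cannot work. It is not from the thread and is not a usable case mapper: `lower_sketch` and `is_basic_greek` are hypothetical names, the table covers only ASCII, basic Greek capitals, and two entries taken from SpecialCasing.txt, and the final-sigma test is a simplification of the real rule. U+0130 lowercases to two code points (so the output length changes), and capital sigma lowercases differently at the end of a word (so the mapping needs context).

```cpp
#include <cstddef>
#include <string>

// Simplified stand-in for "is a Greek letter": basic capital and small
// ranges only. The real Final_Sigma condition in Unicode is more involved.
static bool is_basic_greek(char32_t c) {
    return (c >= 0x0391 && c <= 0x03A9) || (c >= 0x03B1 && c <= 0x03C9);
}

// Illustrative subset of full lowercase mapping. Note the signature:
// string in, string out. A char -> char function cannot express either
// of the two special cases handled below.
std::u32string lower_sketch(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t c = in[i];
        if (c >= U'A' && c <= U'Z') {
            out.push_back(static_cast<char32_t>(c + 0x20));
        } else if (c == 0x0130) {
            // LATIN CAPITAL LETTER I WITH DOT ABOVE: one code point in,
            // two out (i + COMBINING DOT ABOVE).
            out.push_back(static_cast<char32_t>(0x0069));
            out.push_back(static_cast<char32_t>(0x0307));
        } else if (c == 0x03A3) {
            // GREEK CAPITAL LETTER SIGMA: final form at the end of a
            // word, regular small sigma otherwise.
            const bool after_letter = i > 0 && is_basic_greek(in[i - 1]);
            const bool before_letter =
                i + 1 < in.size() && is_basic_greek(in[i + 1]);
            out.push_back(static_cast<char32_t>(
                after_letter && !before_letter ? 0x03C2 : 0x03C3));
        } else if (c >= 0x0391 && c <= 0x03A9) {
            // Remaining basic Greek capitals (U+03A2 is unassigned).
            out.push_back(static_cast<char32_t>(c + 0x20));
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Language-dependent rules (such as Turkish dotted and dotless i) are not modelled here at all, which is one more reason to leave this to a Unicode library.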
On 10/03/2011 11:59 PM, Joachim Durchholz wrote:
> On 03.10.2011 22:12, Nikola Smiljanic wrote:
>> How about this:
>>
>> for (int i = 0; i != NumWChars; ++i)
>>     absPath[i] = std::tolower(absPath[i], std::locale());
>>
>> seems to be working just fine?
>
> You have two assumptions here:
>
> Assumption 1: For each lowercase character, there is an equivalent
> uppercase character, and vice versa.
> This is not true in half a dozen languages according to
> ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt .
>
> Assumption 2: The transformation from lower case to upper case can be
> done for each character individually, without considering context.
> This is not true in a couple of languages according to SpecialCasing.txt.
>
> Do not do that. If you get complaints, they will be about scripts that
> you can't type on your keyboard, and that you know nothing about so you
> don't even know what the right behaviour would have been.
> Rely on the relevant Unicode library. Which one that would be, and which
> functions to call, depends on what you need that to-lowercase
> transformation for. (It also depends on whether the names you get are
> already normalized or not; I'd want to run a normalization pass on the
> names first just to be on the safe side.)

Does Windows do proper Unicode to-lowercase, or does it just lowercase A-Z?

From reading the article below, I gather that you can create filenames that would be considered identical under Unicode to-lowercase rules, yet exist as different files:
https://blogs.msdn.com/b/michkap/archive/2005/10/17/481600.aspx

Best regards,
--Edwin
Joachim Durchholz
2011-Oct-03 21:42 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
On 03.10.2011 23:18, Török Edwin wrote:
> I get that you can create filenames that would be considered
> identical under Unicode to-lowercase rules, but yet they exist as different files:

Hehe, I can imagine that.

That's why I was proposing to simply ask the filesystem.

Though in hindsight, I may have been too hasty - the question is: what is that to-lowercase transformation needed for? The right course of action definitely depends on that.

Unicode is too complicated for simple answers (and there are good reasons for that).

Regards,
Jo