As was mentioned once before, the correct solution is to never use multibyte anywhere. Any Windows functions that currently return multibyte strings should be converted to their wide-string (Unicode) equivalents, with the result converted to UTF-8.

> From: Nikola Smiljanic <popizdeh at gmail.com>
>
> I think I got it this time. I realized that ::open and ::stat work just fine with multibyte paths on Windows, so there's no need to change this code. The only problem is the llvm::sys::fs module, which falsely assumes that input strings are UTF-8 encoded when they are in fact multibyte strings.
>
> Now I really hope I haven't broken anything, because llvm::sys::fs::exists is called in a number of places, but I'm guessing that none of the paths passed to it are really UTF-8?
>
> I think the entire llvm::sys::fs module should be changed to use MultibyteToUTF16 instead of UTF8ToUTF16 before calling Windows API functions (unless somebody knows that we actually have UTF-8 paths on Windows somewhere in the code)?
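A minimal sketch of the "convert the wide result to UTF-8" step, assuming the wide string came from a W-suffixed API such as GetCommandLineW. On Windows this conversion would normally be done with WideCharToMultiByte and CP_UTF8; the hypothetical utf16ToUtf8 below hand-rolls it only so the example compiles on any platform. It is not LLVM's actual helper.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-16 (as returned by the wide "W" Windows
// APIs) to UTF-8. Throws on unpaired surrogates instead of guessing.
std::string utf16ToUtf8(const std::u16string &in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
        throw std::runtime_error("unpaired high surrogate");
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;  // consumed both halves of the pair
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      throw std::runtime_error("unpaired low surrogate");
    }
    // Emit 1 to 4 UTF-8 bytes depending on the code point's range.
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

Because UTF-8 can encode every UTF-16 string (apart from malformed surrogates), nothing is lost on this path, which is the point of never going through the ANSI code page.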
Nikola Smiljanic
2011-Sep-07 06:28 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
The problem is not in the functions that return multibyte strings (the multibyte string comes from argv) but in the functions that can't handle UTF-8 input on Windows, such as ::open and ::stat. The llvm::sys::fs module assumes UTF-8 input, and I don't think that holds on Windows. One solution is to make the module work with multibyte strings, as I've done; the other is to convert everything to UTF-8, in which case a lot of code would have to change, because we'd have to convert from UTF-8 to UTF-16 whenever we call Windows API functions. Note also that ::wstat takes a different argument type than ::stat, and that structure is passed all around.

On Wed, Sep 7, 2011 at 2:22 AM, Bryce Cogswell <bryceco at yahoo.com> wrote:
> As was mentioned once before, the correct solution is to never use
> multibyte anywhere. Any Windows functions that currently return multibyte
> strings should be converted to their wide-string (Unicode) equivalents, with
> the result converted to UTF-8.
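To illustrate why treating multibyte input as UTF-8 goes wrong, here is a hypothetical utf8ToUtf16 decoder: a portable stand-in for what a UTF8ToUTF16-style helper has to do, not LLVM's implementation. It accepts "café" encoded as UTF-8 but rejects the same word in the Windows-1252 code page, where é is the single byte 0xE9. Windows-1252 is assumed here only as an example of an ANSI code page.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16, returning std::nullopt on
// malformed input. On Windows, MultiByteToWideChar(CP_UTF8,
// MB_ERR_INVALID_CHARS, ...) performs a similarly strict conversion.
std::optional<std::u16string> utf8ToUtf16(const std::string &in) {
  // Smallest code point allowed for 1-, 2-, 3- and 4-byte forms (rejects overlongs).
  static const char32_t kMinForLength[] = {0x0, 0x80, 0x800, 0x10000};
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    unsigned char lead = static_cast<unsigned char>(in[i]);
    char32_t cp;
    std::size_t extra;  // number of continuation bytes that must follow
    if (lead < 0x80) { cp = lead; extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else return std::nullopt;  // stray continuation byte or invalid lead byte
    if (in.size() - i <= extra) return std::nullopt;  // truncated sequence
    for (std::size_t k = 1; k <= extra; ++k) {
      unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < kMinForLength[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return std::nullopt;  // overlong, out of range, or an encoded surrogate
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // split into a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += extra + 1;
  }
  return out;
}
```

A lenient decoder would instead silently substitute or mangle the 0xE9 byte, so a path like "caf\xE9" from argv would name a different (usually nonexistent) file.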
On Tue, Sep 6, 2011 at 11:28 PM, Nikola Smiljanic <popizdeh at gmail.com> wrote:
> The problem is not in the functions that return multibyte strings (the
> multibyte string is coming from argv)

argv is implicitly the result of a UTF-16-to-multibyte conversion (i.e. it's lossy).

-Eli
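A small simulation of why the narrow argv is lossy. As an assumption for illustration, ISO-8859-1 (Latin-1) stands in for the ANSI code page because it maps U+0000 to U+00FF one-to-one onto bytes; real code pages such as 1252 or 1251 differ in detail but likewise cover only a small slice of Unicode. The helpers narrowToLatin1 and widenFromLatin1 are hypothetical, and the '?' substitution only mimics the usual default replacement character on Windows.

```cpp
#include <cassert>
#include <string>

// Hypothetical UTF-16 -> "ANSI" narrowing, using Latin-1 as the stand-in code page.
// Anything outside U+0000..U+00FF (including each half of a surrogate pair)
// is replaced by '?', so the original character is gone for good.
std::string narrowToLatin1(const std::u16string &wide) {
  std::string out;
  for (char16_t c : wide)
    out += (c <= 0xFF) ? static_cast<char>(c) : '?';
  return out;
}

// Widen Latin-1 bytes back to UTF-16 (each byte maps to the same code point).
std::u16string widenFromLatin1(const std::string &narrow) {
  std::u16string out;
  for (char c : narrow)
    out += static_cast<char16_t>(static_cast<unsigned char>(c));
  return out;
}
```

A Cyrillic file name such as u"C:\\temp\\\u0416.c" narrows to "C:\\temp\\?.c", and no later conversion can recover the original name; only names that fit the code page (e.g. "café" here) survive the round trip. Reading the wide command line instead avoids the loss entirely.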