On Wed, Sep 7, 2011 at 12:20 AM, Nikola Smiljanic <popizdeh at gmail.com> wrote:> So what are you exactly saying? Somebody proposed using GetCommandLine > instead of using argv directly. And what about my other points about ::open > and ::stat?Because of the way Windows works, the only fully correct solution is to never touch the native charset, and keep everything in UTF-8/16 all the time. Anything else is going to lead to bugs like "clang fails to find my file in a directory whose name has a Greek letter in it". If we're going to be spending time messing with this code, we should take the time to get it right. If it requires some extra refactoring to make that work, that's acceptable... -Eli> On Wed, Sep 7, 2011 at 8:52 AM, Eli Friedman <eli.friedman at gmail.com> wrote: >> >> On Tue, Sep 6, 2011 at 11:28 PM, Nikola Smiljanic <popizdeh at gmail.com> >> wrote: >> > The problem is not in the functions that return multibyte strings (the >> > multibyte string is coming from argv) >> >> argv is implicitly the return from a UTF16->multibyte conversion (i.e. >> it's lossy). >> >> -Eli > >
Nikola Smiljanic
2011-Sep-20 16:52 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
I spent some more time on this. My idea was to use functionality from llvm::sys::fs like file_status instead of stat struct, but as it turns out this is not really possible. file_status structure is not a replacement for stat, nor are there functions inside llvm::sys::fs that can replace calls to ::stat and ::open. The only solution I can see is to create wrappers around ::stat, ::open, etc. and use them in the codebase instead of direct calls. Unix implementation would only forward calls to the right function. Windows implementation would convert path from utf8 to utf16 and call the appropriate function (::_wopen instead of ::open and so on). Is this solution acceptable? Here's a patch where I tried this (it has a number of issues but I'd like to hear from someone that this solution makes sense before I try to address them). On Wed, Sep 7, 2011 at 9:39 AM, Eli Friedman <eli.friedman at gmail.com> wrote:> On Wed, Sep 7, 2011 at 12:20 AM, Nikola Smiljanic <popizdeh at gmail.com> > wrote: > > So what are you exactly saying? Somebody proposed using GetCommandLine > > instead of using argv directly. And what about my other points about > ::open > > and ::stat? > > Because of the way Windows works, the only fully correct solution is > to never touch the native charset, and keep everything in UTF-8/16 all > the time. Anything else is going to lead to bugs like "clang fails to > find my file in a directory whose name has a Greek letter in it". If > we're going to be spending time messing with this code, we should take > the time to get it right. If it requires some extra refactoring to > make that work, that's acceptable... > > -Eli > > > On Wed, Sep 7, 2011 at 8:52 AM, Eli Friedman <eli.friedman at gmail.com> > wrote: > >> > >> On Tue, Sep 6, 2011 at 11:28 PM, Nikola Smiljanic <popizdeh at gmail.com> > >> wrote: > >> > The problem is not in the functions that return multibyte strings (the > >> > multibyte string is coming from argv) > >> > >> argv is implicitly the return from a UTF16->multibyte conversion (i.e. > >> it's lossy). > >> > >> -Eli > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/711ea316/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: clang.patch Type: application/octet-stream Size: 2100 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/711ea316/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: llvm.patch Type: application/octet-stream Size: 2670 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/711ea316/attachment-0001.obj>
On Tue, Sep 20, 2011 at 11:52 AM, Nikola Smiljanic <popizdeh at gmail.com> wrote:> I spent some more time on this. My idea was to use functionality from > llvm::sys::fs like file_status instead of stat struct, but as it turns out > this is not really possible. file_status structure is not a replacement for > stat, nor are there functions inside llvm::sys::fs that can replace calls to > ::stat and ::open. > The only solution I can see is to create wrappers around ::stat, ::open, > etc. and use them in the codebase instead of direct calls. Unix > implementation would only forward calls to the right function. Windows > implementation would convert path from utf8 to utf16 and call the > appropriate function (::_wopen instead of ::open and so on). Is this > solution acceptable? > Here's a patch where I tried this (it has a number of issues but I'd like to > hear from someone that this solution makes sense before I try to address > them).General nitpick: make sure you're using spaces instead of tabs, and that your lines are less than 80 characters wide. + wchar_t** wargv = ::CommandLineToArgvW(::GetCommandLineW(), &argc); You may want to test the consistency of this approach from the original with regards to backslashes and quotation marks, if for no other reason than documenting it. CommandLineToArgvW calls them out in the documentation. + for (int i = 0; i != argc; ++i) + { + int len = ::WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, NULL, 0, NULL, NULL); + argv[i] = new char[len]; You should check the return value of WideCharToMultiByte. I don't expect it to fail, but better safe than sorry. I would guess an assert would be acceptable. I couldn't see it from the patch, but are all the calls to Open on Windows only, or is there an Open implementation for the Mac that calls open instead of _wopen? I agree with the general gist of the patch though. Thanks! ~Aaron
Apparently Analagous Threads
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows