On Tue, Sep 20, 2011 at 11:52 AM, Nikola Smiljanic <popizdeh at gmail.com> wrote:> I spent some more time on this. My idea was to use functionality from > llvm::sys::fs like file_status instead of stat struct, but as it turns out > this is not really possible. file_status structure is not a replacement for > stat, nor are there functions inside llvm::sys::fs that can replace calls to > ::stat and ::open. > The only solution I can see is to create wrappers around ::stat, ::open, > etc. and use them in the codebase instead of direct calls. Unix > implementation would only forward calls to the right function. Windows > implementation would convert path from utf8 to utf16 and call the > appropriate function (::_wopen instead of ::open and so on). Is this > solution acceptable? > Here's a patch where I tried this (it has a number of issues but I'd like to > hear from someone that this solution makes sense before I try to address > them).General nitpick: make sure you're using spaces instead of tabs, and that your lines are less than 80 characters wide. + wchar_t** wargv = ::CommandLineToArgvW(::GetCommandLineW(), &argc); You may want to test the consistency of this approach from the original with regards to backslashes and quotation marks, if for no other reason than documenting it. CommandLineToArgvW calls them out in the documentation. + for (int i = 0; i != argc; ++i) + { + int len = ::WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, NULL, 0, NULL, NULL); + argv[i] = new char[len]; You should check the return value of WideCharToMultiByte. I don't expect it to fail, but better safe than sorry. I would guess an assert would be acceptable. I couldn't see it from the patch, but are all the calls to Open on Windows only, or is there an Open implementation for the Mac that calls open instead of _wopen? I agree with the general gist of the patch though. Thanks! ~Aaron
Nikola Smiljanic
2011-Sep-20 21:15 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
OK since this approach makes sense I'll shoot with my questions :) 1. Where should get_utf8_argv go and is the name of this function OK? Right now the function is inside llvm::sys::fs namespace because I need access to Windows.h, should I leave it there. 2. get_utf8_argv allocates new char** representing utf8 argv, should I deallocate this inside driver.cpp, use std::vector<std::string> instead or ignore the allocation completely? 3. There is an #ifdef inside driver.cpp right now that I don't like. get_utf8_argv works for windows only, but I could change the function to accept argv and return it as vector of utf8 strings. It would convert char* to std::string on unix and use GetCommandLine + utf8 conversion on windows. 4. Should I wrap functions like ::fstat and ::close for consistency even though they don't work with paths? I'll fix everything that is formatting related. You're right, current patch has Open only for windows but I'll add the other one as well. On Tue, Sep 20, 2011 at 7:42 PM, Aaron Ballman <aaron at aaronballman.com>wrote:> > You may want to test the consistency of this approach from the > original with regards to backslashes and quotation marks, if for no > other reason than documenting it. CommandLineToArgvW calls them out > in the documentation. > >I don't understand what you meant by this? I'm aware that CommandLineToArgvW has special interpretation of backslashes and quotes, but what should I check exactly? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/ca3fdb39/attachment.html>
On Tue, Sep 20, 2011 at 4:15 PM, Nikola Smiljanic <popizdeh at gmail.com> wrote:> OK since this approach makes sense I'll shoot with my questions :) > 1. Where should get_utf8_argv go and is the name of this function OK? Right > now the function is inside llvm::sys::fs namespace because I need access to > Windows.h, should I leave it there.I don't think it belongs there as it has nothing to do with the file system. However, I'm not familiar enough to know of a better place for it either. Hopefully someone else can chime in with an idea.> 2. get_utf8_argv allocates new char** representing utf8 argv, should I > deallocate this inside driver.cpp, use std::vector<std::string> instead or > ignore the allocation completely?Perhaps some SmallVectors would make sense here? An inner one to hold the arg data, and an outer one to hold the list of inner vectors. Then you can do the cleanup within the call.> 3. There is an #ifdef inside driver.cpp right now that I don't like. > get_utf8_argv works for windows only, but I could change the function to > accept argv and return it as vector of utf8 strings. It would convert char* > to std::string on unix and use GetCommandLine + utf8 conversion on windows.I have no concrete opinion one way or the other on this.> 4. Should I wrap functions like ::fstat and ::close for consistency even > though they don't work with paths?I don't believe so.> I'll fix everything that is formatting related. You're right, current patch > has Open only for windows but I'll add the other one as well.Thanks for the help! ~Aaron
Apparently Analagous Threads
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows
- [LLVMdev] [cfe-dev] Unicode path handling on Windows