Nikola Smiljanic
2011-Sep-01 20:17 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
AFAIK Clang internals do assume utf8, and llvm::sys::path converts strings to utf16 on Windows and calls the W API functions. I'd appreciate it if somebody could take a look at my changes and comment on them.

Here's a brief explanation of what I did:
- Convert argv to utf8 using the current system locale on win32 (this is done as early as possible, inside ExpandArgv). This makes the driver happy, since calls to llvm::sys::path::exists succeed.
- Change calls to ::open (inside FileSystemStatCache and MemoryBuffer) to ::_wopen on win32 by converting the path to utf16.
- In order to do the conversions I had to expose two functions; one of them was already there but wasn't visible, the other one I added.

Known issues:
- I should probably use LLVM_ON_WIN32 instead of WIN32, but this macro isn't defined inside FileSystemStatCache and MemoryBuffer for some reason. Both of these files have an #ifdef section that deals with O_BINARY, so maybe these two sections should be consolidated?
- The functions convert_multibyte_to_utf8 and convert_utf8_to_utf16 have definitions only on Windows, so every other platform is currently broken.

On Thu, Sep 1, 2011 at 5:44 PM, Ruben Van Boxem <vanboxem.ruben at gmail.com> wrote:
> Isn't it more straightforward to use utf-8 internally and use the
> conversion functions provided by the win32 API when calling other win32 API
> functions, and always call the wide versions of the win32 functions. Full
> compatibility guaranteed, and one encoding internally.
>
> Ruben

Attachments:
unicode_path_clang.patch (1811 bytes) <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110901/b724988b/attachment.obj>
unicode_path_llvm.patch (2973 bytes) <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110901/b724988b/attachment-0001.obj>
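For illustration only, the ::open to ::_wopen change described in this message might look roughly like the sketch below. It is not the code from the attached patches: the helper names open_utf8_path and close_fd are hypothetical, and the sketch assumes the Win32 MultiByteToWideChar call with CP_UTF8 is used for the utf8 to utf16 step. On other platforms it falls back to plain ::open.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper (not from the patch): open a file for reading, given a
// UTF-8 encoded path. Returns a file descriptor, or -1 on failure.
int open_utf8_path(const std::string &utf8_path) {
#ifdef _WIN32
  if (utf8_path.empty())
    return -1;
  const int in_len = static_cast<int>(utf8_path.size());
  // First call: ask how many UTF-16 code units the converted path needs.
  const int out_len = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8_path.data(), in_len,
                                            nullptr, 0);
  if (out_len <= 0)
    return -1; // malformed UTF-8 or conversion failure
  std::wstring wide(static_cast<std::size_t>(out_len), L'\0');
  // Second call: perform the actual conversion into the buffer.
  ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8_path.data(),
                        in_len, &wide[0], out_len);
  // The wide CRT entry point accepts UTF-16 paths.
  return ::_wopen(wide.c_str(), _O_RDONLY | _O_BINARY);
#else
  // Assumption: elsewhere the narrow API takes the bytes as they are.
  return ::open(utf8_path.c_str(), O_RDONLY);
#endif
}

// Close a descriptor previously returned by open_utf8_path.
void close_fd(int fd) {
  assert(fd >= 0 && "close_fd expects a valid descriptor");
#ifdef _WIN32
  ::_close(fd);
#else
  ::close(fd);
#endif
}
```

A caller would pass the utf8 path it already holds internally, check the result against -1, and release the descriptor with close_fd when done.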
NAKAMURA Takumi
2011-Sep-02 04:00 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
Nikola,

Your patchset does not work;

e>bin\clang.exe -S なかむら\たくみ.c
error: error reading '邵コ・ェ邵コ荵昴・郢ァ蝎らクコ貅假ソ・邵コ・ソ.c'
1 error generated.

- Is the patch perhaps not sufficient somewhere? I suspect clang might still be pathv1-dependent. (I guess pathv1 would assume ansi.)
- raw_ostream does not handle utf8, but ansi, on win32.

I would like to propose;
- The utf8 and utf16 conversion could move to llvm/lib/Support.
- We could get rid of CP_UTF8 with the Win32 API; doing the conversion ourselves should be trivial.

ps. Excuse me, I might respond to you more later. (oops, lunch time is over...)

...Takumi
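To make the "it must be trivial" remark concrete, here is a minimal sketch of a utf8 to utf16 conversion that does not depend on CP_UTF8 or any Win32 call. The function name utf8_to_utf16 and its bool-returning interface are assumptions made for this example; it is not code from LLVM or from the patches in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Returns false on malformed input (bad lead or continuation bytes,
// truncated sequences, overlong forms, surrogates, values above U+10FFFF).
bool utf8_to_utf16(const std::string &in, std::u16string &out) {
  out.clear();
  const std::size_t n = in.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp = 0;      // code point being assembled
    std::size_t extra = 0;     // number of continuation bytes expected
    std::uint32_t min_cp = 0;  // smallest value allowed for this length
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; extra = 1; min_cp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; extra = 2; min_cp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; extra = 3; min_cp = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (n - i - 1 < extra)
      return false; // sequence is cut off at the end of the input
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80)
        return false;
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points above the BMP become a surrogate pair.
      assert(cp <= 0x10FFFF);
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += extra + 1;
  }
  return true;
}
```

On Windows the resulting code units could be copied into a wchar_t buffer before calling a W function, since wchar_t is 16 bits wide there.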
Nikola Smiljanic
2011-Sep-02 05:20 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
2011/9/2 NAKAMURA Takumi <geek4civic at gmail.com>
> Nikola,
>
> Your patchset does not work;
>
> e>bin\clang.exe -S なかむら\たくみ.c

How can your filename have a backslash?
Nikola Smiljanic
2011-Sep-02 06:41 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
The patch should work for unicode filenames; I just realized that it doesn't work for unicode directories. FileSystemStatCache calls ::stat for directories, and this doesn't work for utf8 input, the same way ::open doesn't. I tried to replace it with ::_wstat, but this function has a different signature.

I think we should take a different approach:
1. Convert all command line input to utf8.
2. Rework FileSystemStatCache and MemoryBuffer to use llvm::sys::fs and never explicitly call ::open or ::stat.

llvm::sys::fs already has a status function, but I'm not sure if it can be used as a ::stat replacement. Can this module be used to open files? I couldn't find this anywhere.

2011/9/2 NAKAMURA Takumi <geek4civic at gmail.com>
> Nikola,
>
> Your patchset does not work;
>
> e>bin\clang.exe -S なかむら\たくみ.c
> error: error reading '邵コ・ェ邵コ荵昴・郢ァ蝎らクコ貅假ソ・邵コ・ソ.c'
> 1 error generated.
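Step 1 of the approach proposed in this message (convert all command line input to utf8) could be sketched as follows. This is an illustration under stated assumptions, not the thread's patch: the helper name utf8_args is hypothetical, and on Windows the sketch re-reads the wide command line with GetCommandLineW and CommandLineToArgvW instead of converting the narrow argv from the system locale, because characters outside the ANSI code page are already lost in the narrow argv. On other platforms it assumes argv can be passed through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h> // CommandLineToArgvW; link against Shell32
#endif

// Hypothetical helper: return the program arguments as UTF-8 strings.
std::vector<std::string> utf8_args(int argc, char **argv) {
  std::vector<std::string> args;
#ifdef _WIN32
  // Ignore the narrow argv and split the UTF-16 command line instead.
  (void)argc;
  (void)argv;
  int wargc = 0;
  LPWSTR *wargv = ::CommandLineToArgvW(::GetCommandLineW(), &wargc);
  if (!wargv)
    return args;
  for (int i = 0; i < wargc; ++i) {
    const int wlen = ::lstrlenW(wargv[i]);
    std::string utf8;
    if (wlen > 0) {
      // First call sizes the buffer, second call converts into it.
      const int len = ::WideCharToMultiByte(CP_UTF8, 0, wargv[i], wlen,
                                            nullptr, 0, nullptr, nullptr);
      if (len > 0) {
        utf8.resize(static_cast<std::size_t>(len));
        ::WideCharToMultiByte(CP_UTF8, 0, wargv[i], wlen, &utf8[0], len,
                              nullptr, nullptr);
      }
    }
    args.push_back(utf8);
  }
  ::LocalFree(wargv);
#else
  // Assumption: on other platforms argv is taken as it is and treated as UTF-8.
  assert(argc >= 0);
  for (int i = 0; i < argc; ++i)
    args.push_back(argv[i]);
#endif
  return args;
}
```

Calling this once at the top of main would let the rest of the driver work with utf8 strings only, which is the precondition for step 2.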