Nikola Smiljanic
2011-Sep-02 06:41 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
The patch should work for Unicode filenames, but I just realized that it doesn't work for Unicode directories. FileSystemStatCache calls ::stat for directories, and this fails on UTF-8 input the same way ::open does. I tried to replace it with ::_wstat, but that function has a different signature.

I think we should take a different approach:

1. Convert all command-line input to UTF-8.
2. Rework FileSystemStatCache and MemoryBuffer to use llvm::sys::fs and never call ::open or ::stat explicitly.

llvm::sys::fs already has a status function, but I'm not sure whether it can be used as a ::stat replacement. Can this module also be used to open files? I couldn't find that anywhere.

2011/9/2 NAKAMURA Takumi <geek4civic at gmail.com>

> Nikola,
>
> Your patchset does not work;
>
> e>bin\clang.exe -S なかむら\たくみ.c
> error: error reading '邵コ・ェ邵コ荵昴・郢ァ蝎らクコ貅假ソ・邵コ・ソ.c'
> 1 error generated.
>
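On the ::_wstat idea in the message above: a wide-character call needs the UTF-8 path converted to UTF-16 first. The following is only a minimal sketch of that conversion step in portable C++. The name utf8ToUtf16 is hypothetical and is not the helper used in llvm::sys::fs; on Windows the resulting code units could, under that assumption, be copied into a wchar_t buffer and handed to a wide API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLVM's API): decode UTF-8 bytes into UTF-16 code
// units. Returns false on malformed input instead of guessing.
bool utf8ToUtf16(const std::string &in, std::u16string &out) {
  // Smallest code point that legitimately needs a sequence of each length;
  // anything smaller is an overlong encoding.
  static const std::uint32_t minCp[] = {0, 0, 0x80, 0x800, 0x10000};
  out.clear();
  for (std::size_t i = 0; i < in.size();) {
    unsigned char lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp;
    std::size_t len;
    if (lead < 0x80) { cp = lead; len = 1; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
    else return false;                      // stray continuation or bad lead
    if (i + len > in.size()) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80) return false; // not a continuation byte
      cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, surrogate code points, and out-of-range values.
    if (cp < minCp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000; // encode as a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return true;
}
```

Returning false on malformed bytes, rather than substituting a replacement character, is a choice made for this sketch: a path that is silently altered would name a different file.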
Nikola Smiljanic
2011-Sep-06 05:52 UTC
[LLVMdev] [cfe-dev] Unicode path handling on Windows
I think I got it this time. I realized that ::open and ::stat work just fine with multibyte paths on Windows, so there's no need to change this code. The only problem is the llvm::sys::fs module, which wrongly assumes that input strings are UTF-8 encoded when they are in fact multibyte strings.

Now I really hope I haven't broken anything, because llvm::sys::fs::exists is called in a number of places, but I'm guessing that none of the paths passed to it are really UTF-8?

I think the entire llvm::sys::fs module should be changed to use MultibyteToUTF16 instead of UTF8ToUTF16 before calling Windows API functions (unless somebody knows that we actually have UTF-8 paths on Windows somewhere in the code)?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PathV2.inc.patch
Type: application/octet-stream
Size: 1469 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110906/065214f9/attachment.obj>
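To illustrate why the assumed encoding matters in the message above, here is a small portable sketch. ISO-8859-1 stands in for the active Windows code page purely as an assumption for illustration (the real multibyte encoding depends on the system locale), and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Stand-in for a legacy single-byte code page: in ISO-8859-1 every byte maps
// to the code point with the same numeric value.
std::u16string latin1ToUtf16(const std::string &in) {
  std::u16string out;
  for (unsigned char b : in)
    out.push_back(static_cast<char16_t>(b));
  return out;
}

// Structural UTF-8 check only: lead byte followed by the right number of
// continuation bytes. It does not reject overlong forms or surrogates.
bool isStructurallyUtf8(const std::string &in) {
  std::size_t i = 0;
  while (i < in.size()) {
    unsigned char lead = static_cast<unsigned char>(in[i]);
    std::size_t len = lead < 0x80           ? 1
                      : (lead & 0xE0) == 0xC0 ? 2
                      : (lead & 0xF0) == 0xE0 ? 3
                      : (lead & 0xF8) == 0xF0 ? 4
                                              : 0;
    if (len == 0 || i + len > in.size())
      return false;
    for (std::size_t k = 1; k < len; ++k)
      if ((static_cast<unsigned char>(in[i + k]) & 0xC0) != 0x80)
        return false;
    i += len;
  }
  return true;
}
```

With these, the code-page bytes "caf\xE9" widen cleanly through latin1ToUtf16 but fail isStructurallyUtf8, so a UTF-8 decoder would reject them. The UTF-8 bytes "caf\xC3\xA9" pass the check, but widening them byte by byte yields two wrong characters in place of one. Whichever converter is used has to match the encoding the caller actually supplies.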
As was mentioned once before, the correct solution is to never use multibyte anywhere. Any Windows functions that currently return multibyte strings should be converted to their wide-string (Unicode) equivalents, with the result converted to UTF-8.

> From: Nikola Smiljanic <popizdeh at gmail.com>
>
> I think I got it this time. I realized that ::open and ::stat work just fine with multibyte paths on windows so there's no need to change this code. The only problem is llvm::sys::fs module which falsely assumes that input strings are UTF8 encoded when they are in fact multibyte strings.
>
> Now I really hope I haven't broken anything because llvm::sys::fs::exists is called in a number of places, but I'm guessing that none of the paths that are passed to it are really UTF8?
>
> I think entire llvm::sys::fs module should be changed to use MultibyteToUTF16 instead of UTF8ToUTF16 before calling windows api functions (unless somebody knows that we actually have UTF8 paths on windows somewhere in the code)?
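The "wide string in, UTF-8 out" direction recommended in the reply above can be sketched as follows. This is a minimal portable illustration, not code from LLVM; utf16ToUtf8 is a hypothetical name, and it assumes the wide API's result has already been copied into a std::u16string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLVM's API): encode UTF-16 code units as UTF-8.
// Returns false if the input contains an unpaired surrogate.
bool utf16ToUtf8(const std::u16string &in, std::string &out) {
  out.clear();
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      if (i + 1 >= in.size()) return false;
      std::uint32_t lo = in[i + 1];
      if (lo < 0xDC00 || lo > 0xDFFF) return false;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
      ++i;
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return false; // low surrogate with no preceding high surrogate
    }
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return true;
}
```

Converting once at the boundary like this would let the rest of the code treat every path as UTF-8, which is the invariant the reply argues for.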