Martin Storsjö via llvm-dev
2021-Oct-14 12:22 UTC
[llvm-dev] RFC: Support for preferring paths with forward slashes on Windows
Hi, When using Clang on Windows as a drop-in replacement for GCC, one issue that crops up fairly soon is that not all callers can tolerate paths spelled out with backslashes. This is an issue when e.g. libtool parses the output of "$CC -v" (where clang passes an absolute path to compiler-rt libraries) and uses parts of that in shell script contexts that don't tolerate backslashes, when some callers call "$CC --print-search-dirs", etc. This is also one of the most important things that MSYS2 patches in their distribution of Clang/LLVM according to their patch tracker [1]. (I've locally worked around this in my distribution without patching, by filtering clang's stdout in a wrapper, when options like "-v" or "--print-search-dirs" are detected, but that's essentially the same as patching.) I've finally taken the plunge and tried to implement this properly. I've got a decent patch set [2] that I could start sending for review, but before doing that, I'd like to discuss the overall design. The main idea is that I add a third alternative to path::Style - in addition to the existing Windows and Posix path styles, I'm adding Windows_forward, which otherwise parses and handles Windows paths like before (i.e. accepting and interpreting both separators), but with a different preferred separator (as returned by get_separator()). This allows any code on any platform to handle paths in all three forms, just like in the existing design, when explicitly giving a path::Style argument. To actually make it have effect, one can make path::Style::native act like Windows_forward instead of plain Windows. I'm not entirely sure what the best strategy is for when to do that - one could do it when LLVM itself was built for a MinGW target (which kind of breaks the assumption that the tools work pretty much the same as long as one passes the right --target options etc), or one could maybe set it up as a configure time CMake option? Or even make it a globally settable option in the process, to allow changing it e.g. depending on the tool's target configuration? I also faintly remember that Reid at some point implied that it could be an option to switch all Windows builds outright to such a behaviour? Most of the code is entirely independent of the policy decision of when/where to enable the behaviour - the decision is centralised to one single spot in LLVMSupport. In any case, with this design and a quite moderate amount of fixups, most of the tests in check-all seem to pass, if switching the preference. There's a couple tests that fail due to checking e.g. the literal paths %s or %t (as output by llvm-lit, with backslashes) against paths that the tools output. There's also a dozen or so of tests in Clang (mainly regarding PCH) that seem to misbehave when the same paths are referred to with varying kinds of slashes, e.g. stored with a forward slash in the PCH but referred to with backslashes in arguments to Clang, where paths are essentially equal but the strings differ. (For actual use with PCH, Clang built this way seems to work - and MSYS2 have been running with tools patched this way for quite some time, and I haven't heard about reports about bugs relating to that patch.) If the design seems sane (have a look at [2] if you want to have a look at my whole series at the moment) I'd start sending the initial patches for review. // Martin [1] https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/README-patches.md [2] https://github.com/llvm/llvm-project/compare/main...mstorsjo:path-separator
Chris Tetreault via llvm-dev
2021-Oct-14 16:43 UTC
[llvm-dev] RFC: Support for preferring paths with forward slashes on Windows
I could be mistaken, but I believe that since the dawn of time, Windows has just secretly supported forward slashes. A quick google search does not turn up any Microsoft docs stating that this is true, but I've heard rumors that it's been this way since DOS. On my Windows 10 machine, Powershell accepts /, cmd.exe accepts /, and Visual Studio accepts /. Whomever takes it upon themselves to work on this should test extensively before committing code. I would probably feel better if somebody could dig up some authoritative source on this. Assuming that this is the case, it would probably be nice if any paths we take in were just immediately canonicalized to use / and all paths just have forward slash. I know we have a ton of tests that have this `{(/|\\)}` regex in them, and it would be nice if we could just not do that. Thanks, Chris Tetreault -----Original Message----- From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Martin Storsjö via llvm-dev Sent: Thursday, October 14, 2021 5:22 AM To: llvm-dev at lists.llvm.org Cc: mati865 at gmail.com; github at jdrake.com Subject: [llvm-dev] RFC: Support for preferring paths with forward slashes on Windows WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros. Hi, When using Clang on Windows as a drop-in replacement for GCC, one issue that crops up fairly soon is that not all callers can tolerate paths spelled out with backslashes. This is an issue when e.g. libtool parses the output of "$CC -v" (where clang passes an absolute path to compiler-rt libraries) and uses parts of that in shell script contexts that don't tolerate backslashes, when some callers call "$CC --print-search-dirs", etc. This is also one of the most important things that MSYS2 patches in their distribution of Clang/LLVM according to their patch tracker [1]. (I've locally worked around this in my distribution without patching, by filtering clang's stdout in a wrapper, when options like "-v" or "--print-search-dirs" are detected, but that's essentially the same as patching.) I've finally taken the plunge and tried to implement this properly. I've got a decent patch set [2] that I could start sending for review, but before doing that, I'd like to discuss the overall design. The main idea is that I add a third alternative to path::Style - in addition to the existing Windows and Posix path styles, I'm adding Windows_forward, which otherwise parses and handles Windows paths like before (i.e. accepting and interpreting both separators), but with a different preferred separator (as returned by get_separator()). This allows any code on any platform to handle paths in all three forms, just like in the existing design, when explicitly giving a path::Style argument. To actually make it have effect, one can make path::Style::native act like Windows_forward instead of plain Windows. I'm not entirely sure what the best strategy is for when to do that - one could do it when LLVM itself was built for a MinGW target (which kind of breaks the assumption that the tools work pretty much the same as long as one passes the right --target options etc), or one could maybe set it up as a configure time CMake option? Or even make it a globally settable option in the process, to allow changing it e.g. depending on the tool's target configuration? I also faintly remember that Reid at some point implied that it could be an option to switch all Windows builds outright to such a behaviour? Most of the code is entirely independent of the policy decision of when/where to enable the behaviour - the decision is centralised to one single spot in LLVMSupport. In any case, with this design and a quite moderate amount of fixups, most of the tests in check-all seem to pass, if switching the preference. There's a couple tests that fail due to checking e.g. the literal paths %s or %t (as output by llvm-lit, with backslashes) against paths that the tools output. There's also a dozen or so of tests in Clang (mainly regarding PCH) that seem to misbehave when the same paths are referred to with varying kinds of slashes, e.g. stored with a forward slash in the PCH but referred to with backslashes in arguments to Clang, where paths are essentially equal but the strings differ. (For actual use with PCH, Clang built this way seems to work - and MSYS2 have been running with tools patched this way for quite some time, and I haven't heard about reports about bugs relating to that patch.) If the design seems sane (have a look at [2] if you want to have a look at my whole series at the moment) I'd start sending the initial patches for review. // Martin [1] https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/README-patches.md [2] https://github.com/llvm/llvm-project/compare/main...mstorsjo:path-separator
Fangrui Song via llvm-dev
2021-Oct-15 04:56 UTC
[llvm-dev] RFC: Support for preferring paths with forward slashes on Windows
On 2021-10-14, Martin Storsjö via llvm-dev wrote:>Hi, > >When using Clang on Windows as a drop-in replacement for GCC, one >issue that crops up fairly soon is that not all callers can tolerate >paths spelled out with backslashes. > >This is an issue when e.g. libtool parses the output of "$CC -v" >(where clang passes an absolute path to compiler-rt libraries) and >uses parts of that in shell script contexts that don't tolerate >backslashes, when some callers call "$CC --print-search-dirs", etc. > >This is also one of the most important things that MSYS2 patches in >their distribution of Clang/LLVM according to their patch tracker [1]. > >(I've locally worked around this in my distribution without patching, >by filtering clang's stdout in a wrapper, when options like "-v" or >"--print-search-dirs" are detected, but that's essentially the same as >patching.) > >I've finally taken the plunge and tried to implement this properly. >I've got a decent patch set [2] that I could start sending for review, >but before doing that, I'd like to discuss the overall design. > > >The main idea is that I add a third alternative to path::Style - in >addition to the existing Windows and Posix path styles, I'm adding >Windows_forward, which otherwise parses and handles Windows paths like >before (i.e. accepting and interpreting both separators), but with a >different preferred separator (as returned by get_separator()). > >This allows any code on any platform to handle paths in all three >forms, just like in the existing design, when explicitly giving a >path::Style argument. > >To actually make it have effect, one can make path::Style::native act >like Windows_forward instead of plain Windows. I'm not entirely sure >what the best strategy is for when to do that - one could do it when >LLVM itself was built for a MinGW target (which kind of breaks the >assumption that the tools work pretty much the same as long as one >passes the right --target options etc), or one could maybe set it up >as a configure time CMake option? Or even make it a globally settable >option in the process, to allow changing it e.g. depending on the >tool's target configuration? > >I also faintly remember that Reid at some point implied that it could >be an option to switch all Windows builds outright to such a >behaviour? > >Most of the code is entirely independent of the policy decision of >when/where to enable the behaviour - the decision is centralised to >one single spot in LLVMSupport. > >In any case, with this design and a quite moderate amount of fixups, >most of the tests in check-all seem to pass, if switching the >preference. > >There's a couple tests that fail due to checking e.g. the literal >paths %s or %t (as output by llvm-lit, with backslashes) against paths >that the tools output. There's also a dozen or so of tests in Clang >(mainly regarding PCH) that seem to misbehave when the same paths are >referred to with varying kinds of slashes, e.g. stored with a forward >slash in the PCH but referred to with backslashes in arguments to >Clang, where paths are essentially equal but the strings differ. (For >actual use with PCH, Clang built this way seems to work - and MSYS2 >have been running with tools patched this way for quite some time, and >I haven't heard about reports about bugs relating to that patch.) > >If the design seems sane (have a look at [2] if you want to have a >look at my whole series at the moment) I'd start sending the initial >patches for review. > >// Martin > >[1] https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/README-patches.md > >[2] https://github.com/llvm/llvm-project/compare/main...mstorsjo:path-separatorBig thanks to you for making investigation in this area! clang/test/Driver tests suffer the most from Windows backslashes. MC and DebugInfo suffer a bit as well. I have seen so many times a new test did not pass on Windows and a fixup follow-up was needed. Sometimes the author may adjust the test and slighly degrade the test quality if they cannot figure out the best way supporting both / and \ (using {{/|\\\\}} multiple times on one line could clutter up).
Michael Kruse via llvm-dev
2021-Oct-15 14:16 UTC
[llvm-dev] RFC: Support for preferring paths with forward slashes on Windows
Thanks for working on this. I also noticed is that the paths printed by clang -v or -### are escaping the backslashes and put them into quotes, i.e. "C:\\path\\to\\clang.exe" -cc1 "..\\special'^`character .c" Interestingly, it still works copy&pasting it to the Windows command line [2], but cmd.exe's escape character is ^ and PowerShell's is the backtick `. What would the correct output be? [1] https://stackoverflow.com/questions/33027024/documented-behavior-for-multiple-backslashes-in-windows-paths Michael Am Do., 14. Okt. 2021 um 07:22 Uhr schrieb Martin Storsjö via llvm-dev <llvm-dev at lists.llvm.org>:> > Hi, > > When using Clang on Windows as a drop-in replacement for GCC, one issue > that crops up fairly soon is that not all callers can tolerate paths > spelled out with backslashes. > > This is an issue when e.g. libtool parses the output of "$CC -v" (where > clang passes an absolute path to compiler-rt libraries) and uses parts of > that in shell script contexts that don't tolerate backslashes, when some > callers call "$CC --print-search-dirs", etc. > > This is also one of the most important things that MSYS2 patches in their > distribution of Clang/LLVM according to their patch tracker [1]. > > (I've locally worked around this in my distribution without patching, by > filtering clang's stdout in a wrapper, when options like "-v" or > "--print-search-dirs" are detected, but that's essentially the same as > patching.) > > I've finally taken the plunge and tried to implement this properly. I've > got a decent patch set [2] that I could start sending for review, but > before doing that, I'd like to discuss the overall design. > > > The main idea is that I add a third alternative to path::Style - in > addition to the existing Windows and Posix path styles, I'm adding > Windows_forward, which otherwise parses and handles Windows paths like > before (i.e. accepting and interpreting both separators), but with a > different preferred separator (as returned by get_separator()). > > This allows any code on any platform to handle paths in all three forms, > just like in the existing design, when explicitly giving a path::Style > argument. > > To actually make it have effect, one can make path::Style::native act like > Windows_forward instead of plain Windows. I'm not entirely sure what the > best strategy is for when to do that - one could do it when LLVM itself > was built for a MinGW target (which kind of breaks the assumption that the > tools work pretty much the same as long as one passes the right --target > options etc), or one could maybe set it up as a configure time CMake > option? Or even make it a globally settable option in the process, to > allow changing it e.g. depending on the tool's target configuration? > > I also faintly remember that Reid at some point implied that it could be > an option to switch all Windows builds outright to such a behaviour? > > Most of the code is entirely independent of the policy decision of > when/where to enable the behaviour - the decision is centralised to one > single spot in LLVMSupport. > > In any case, with this design and a quite moderate amount of fixups, most > of the tests in check-all seem to pass, if switching the preference. > > There's a couple tests that fail due to checking e.g. the literal paths %s > or %t (as output by llvm-lit, with backslashes) against paths that the > tools output. There's also a dozen or so of tests in Clang (mainly > regarding PCH) that seem to misbehave when the same paths are referred to > with varying kinds of slashes, e.g. stored with a forward slash in the PCH > but referred to with backslashes in arguments to Clang, where paths are > essentially equal but the strings differ. (For actual use with PCH, Clang > built this way seems to work - and MSYS2 have been running with tools > patched this way for quite some time, and I haven't heard about reports > about bugs relating to that patch.) > > If the design seems sane (have a look at [2] if you want to have a look at > my whole series at the moment) I'd start sending the initial patches for > review. > > // Martin > > [1] https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/README-patches.md > > [2] https://github.com/llvm/llvm-project/compare/main...mstorsjo:path-separator > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev