On Sat, 28 Apr 2012 11:19:13 -0500
Peter Bergner <bergner at vnet.ibm.com> wrote:

> On Fri, 2012-04-27 at 20:30 -0500, Hal Finkel wrote:
> > Thanks! Do you happen to know where this needs to be changed in
> > clang or LLVM. The code that actually interprets the constraints,
> > generically, is in CodeGen/SelectionDAG/TargetLowering.cpp, is clang
> > relying on that code, or is there some frontend code in clang itself
> > that is failing to initially interpret the string? If it is the
> > code in TargetLowering, then I don't see any support there for '*'
> > or '#'.
>
> Heh, I'm afraid I have no clue as to where clang needs to be changed.
> I'm the team lead for IBM's Linux on POWER GCC development team, so
> I can help you with questions about PPC hardware, PPC ABIs and why
> GCC does things the way it does on PPC, but I'll not be of much
> help with LLVM itself. I'm just a lurker here. :)

That's great, thanks!

>
> That said, I'm curious about the extent of LLVM's support for PPC.
> How robust is it? Does it support generating both 32-bit and 64-bit
> binaries?

LLVM supports generating both 32-bit and 64-bit binaries. I have used
LLVM/clang to compile large and important codes on our Blue Gene
supercomputers (and their POWER frontend nodes), including some that
use the Boost C++ libraries; these codes run well and the performance
is often quite reasonable. I've recently added processor itineraries
for both the 440/450 and A2 embedded cores, and the code generation
for these cores is now really quite good. There are some deficiencies;
here are the ones that come to mind:

 - Support for the 128-bit double-double format used for long doubles
   on Linux (and AIX) is currently broken [I am actively working on
   fixing this].
 - There is no support for generating position-independent code on
   PPC32. (PIC on PPC64 now works well.) Nevertheless, I have
   sometimes run into linking errors when compiling shared libraries
   with C++ on PPC64.
 - There is no support for TLS.
 - Support for inline asm needs improvement (it often works, but
   sometimes I've run across unsupported constructs [as in this
   thread]).
 - The lowering code that generates the update forms of the load and
   store instructions is currently buggy (and is disabled by default)
   [small test cases work, but enabling this on the test suite induces
   runtime failures]. This is currently my top priority for
   performance fixes (I am not sure how important it is on POWER, but
   on the embedded cores it makes a big difference).
 - There is currently no support for generating loops using
   control-registers for branch and increment (I am not sure if this
   matters on POWER, but it does make some difference for small
   trip-count loops on the embedded cores).
 - Register reservations could use some improvement. We currently need
   to reserve an additional register to handle the corner case where a
   condition register needs to be spilled into a large stack frame (one
   register to compute the address, and a second one into which to
   transfer the condition register's contents). I'd like to improve
   this at some point.

So if you stick to static linking and don't use TLS or long doubles,
then it actually works quite well. Dynamic linking on PPC64 works most
of the time. I've tried to keep the PPC 970 hazard detector in working
order, but I've never really done much of a performance study on the
non-embedded cores. Assistance with any of this would, of course, be
greatly appreciated.

>
> I'll note that although I work on GCC, I have no problems seeing LLVM
> supporting PPC. The more the merrier.

Good! :)

 -Hal

>
> Peter

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
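For readers unfamiliar with the double-double long double format mentioned in the message above: the value is stored as a pair of ordinary doubles whose unevaluated sum is the represented number. The following is only a minimal sketch of that idea, not LLVM's or GCC's implementation; the type `dd_t` and the function `two_sum` are hypothetical names chosen for illustration.

```c
/* Hypothetical illustration type: the represented value is hi + lo. */
typedef struct {
    double hi; /* leading part: the rounded double-precision sum */
    double lo; /* trailing part: the rounding error of that sum */
} dd_t;

/* Knuth's two-sum: computes a + b as an exact (hi, lo) pair, so that
 * hi + lo equals a + b without rounding error (barring overflow). */
static dd_t two_sum(double a, double b)
{
    dd_t r;
    double bb;

    r.hi = a + b;        /* rounded sum */
    bb = r.hi - a;       /* the part of b that made it into hi */
    r.lo = (a - (r.hi - bb)) + (b - bb); /* what rounding discarded */
    return r;
}
```

Calling `two_sum(1.0, 1e-20)` keeps the small term in `lo` instead of losing it, which is the extra precision a double-double long double provides over a plain double.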
On Sat, 28 Apr 2012 13:46:02 -0500
Hal Finkel <hfinkel at anl.gov> wrote:

> On Sat, 28 Apr 2012 11:19:13 -0500
> Peter Bergner <bergner at vnet.ibm.com> wrote:
>
> [...]
>
> > That said, I'm curious about the extent of LLVM's support for PPC.
> > How robust is it? Does it support generating both 32-bit and 64-bit
> > binaries?
>
> [...] There are some
> deficiencies, here are some that come to mind:
>
> [...]
> - Register reservations can use some improvement. We currently need
> to reserve an additional register to handle the corner case where a
> condition register need to be spilled into a large stack frame (one
> register to compute the address, and a second one into which to
> transfer the condition register's contents). I'd like to improve
> this at some point.

I forgot to add:
 - Altivec support currently seems broken (there are some tests with
   altivec intrinsics in the test suite, these all fail to compile)
 - There is no VSX support.

 -Hal

>
> So if you stick to static linking and don't use TLS or long doubles,
> then it actually works quite well. Dynamic linking on PPC64 works most
> of the time.
> [...]

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
On Sat, 28 Apr 2012 13:55:13 -0500
Hal Finkel <hfinkel at anl.gov> wrote:

> On Sat, 28 Apr 2012 13:46:02 -0500
> Hal Finkel <hfinkel at anl.gov> wrote:
>
> [...]
>
> > - There is no support for generating position-independent code on
> > PPC32. (PIC on PPC64 now works well). Nevertheless, I have
> > sometimes run into linking errors when compiling shared libraries
> > with C++ on PPC64.
>
> [...]
>
> I forgot to add:
> - Altivec support currently seems broken (there are some tests with
> altivec intrinsics in the test suite, these all fail to compile)
> - There is no VSX support.

Roman pointed out to me that I misspoke. LLVM only generates PIC on
Darwin, not for ELF. What does work on PPC64 is dynamic linking
(meaning that it will correctly put a nop after each call so that the
linker can do its thing). To support dynamic linking on PPC32 we'd
need to explicitly add other things (stubs?), and that is not
implemented.

 -Hal

> > So if you stick to static linking and don't use TLS or long doubles,
> > then it actually works quite well. Dynamic linking on PPC64 works
> > most of the time.
> [...]

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
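As a rough illustration of the "nop after the calls" remark in the message above: under the 64-bit PowerPC ELF ABI of that era, a call that might resolve to another module is typically followed by a `nop`, which the linker can rewrite into a load that restores the TOC pointer in `r2`. The sketch below is an assumption-laden example, not output captured from LLVM; `external_fn` and `caller` are hypothetical names, and the assembly in the comment shows the usual pattern rather than a guaranteed result.

```c
/* Hypothetical stand-in for a function that would live in a shared
 * library; only its declaration is visible at the call site. */
int external_fn(int x);

int caller(int x)
{
    /* For a cross-module call on PPC64 ELF, a compiler typically emits:
     *
     *     bl   external_fn
     *     nop                  # linker may rewrite to: ld r2,40(r1)
     *
     * The rewritten load restores the caller's TOC pointer after the
     * call has gone through a linker-generated stub. */
    return external_fn(x) + 1;
}

/* Defined here only so that this sketch links on its own. */
int external_fn(int x)
{
    return 2 * x;
}
```

Because `external_fn` is defined in the same file here, a compiler is free to treat the call as local and omit the `nop`; the pattern matters when the callee really is in a different shared object.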