On Sat, 28 Apr 2012 13:55:13 -0500 Hal Finkel <hfinkel at anl.gov> wrote:

> On Sat, 28 Apr 2012 13:46:02 -0500
> Hal Finkel <hfinkel at anl.gov> wrote:
>
> > On Sat, 28 Apr 2012 11:19:13 -0500
> > Peter Bergner <bergner at vnet.ibm.com> wrote:
> >
> > > On Fri, 2012-04-27 at 20:30 -0500, Hal Finkel wrote:
> > > > Thanks! Do you happen to know where this needs to be changed in
> > > > clang or LLVM? The code that actually interprets the
> > > > constraints, generically, is in
> > > > CodeGen/SelectionDAG/TargetLowering.cpp. Is clang relying on
> > > > that code, or is there some frontend code in clang itself that
> > > > is failing to initially interpret the string? If it is the code
> > > > in TargetLowering, then I don't see any support there for '*'
> > > > or '#'.
> > >
> > > Heh, I'm afraid I have no clue as to where clang needs to be
> > > changed. I'm the team lead for IBM's Linux on POWER GCC
> > > development team, so I can help you with questions about PPC
> > > hardware, PPC ABIs and why GCC does things the way it does on
> > > PPC, but I'll not be of much help with LLVM itself. I'm just a
> > > lurker here. :)
> >
> > That's great, thanks!
> >
> > > That said, I'm curious about the extent of LLVM's support for PPC.
> > > How robust is it? Does it support generating both 32-bit and
> > > 64-bit binaries?
> >
> > LLVM supports generating both 32-bit and 64-bit binaries. I have
> > used LLVM/clang to compile large and important codes on our Blue
> > Gene supercomputers (and their POWER frontend nodes), including
> > some that use the Boost C++ libraries; these codes run well and the
> > performance is often quite reasonable. I've recently added
> > processor itineraries for both the 440/450 and A2 embedded cores,
> > and the code generation for these cores is now really quite good.
> >
> > There are some deficiencies; here are some that come to mind:
> >
> > - Support for the 128-bit double-double format used for long
> >   doubles on Linux (and AIX) is currently broken [I am actively
> >   working on fixing this].
> > - There is no support for generating position-independent code on
> >   PPC32. (PIC on PPC64 now works well). Nevertheless, I have
> >   sometimes run into linking errors when compiling shared libraries
> >   with C++ on PPC64.
> > - There is no support for TLS.
> > - Support for inline asm needs improvement (it often works, but
> >   sometimes I've run across unsupported constructs [as in this
> >   thread]).
> > - The lowering code that generates the update forms of the load and
> >   store instructions is currently buggy (and is disabled by
> >   default) [small test cases work, but enabling this on the test
> >   suite induces runtime failures]. This is currently my top
> >   priority for performance fixes (I am not sure how important it is
> >   on POWER, but on the embedded cores it makes a big difference).
> > - There is currently no support for generating loops using
> >   control-registers for branch and increment (I am not sure if this
> >   matters on POWER, but it does make some difference for small
> >   trip-count loops on the embedded cores).
> > - Register reservations can use some improvement. We currently need
> >   to reserve an additional register to handle the corner case where
> >   a condition register needs to be spilled into a large stack frame
> >   (one register to compute the address, and a second one into which
> >   to transfer the condition register's contents). I'd like to
> >   improve this at some point.
>
> I forgot to add:
> - Altivec support currently seems broken (there are some tests with
>   altivec intrinsics in the test suite; these all fail to compile).
> - There is no VSX support.

Roman pointed out to me that I misspoke. LLVM only generates PIC on
Darwin, not for ELF. What does work on PPC64 is dynamic linking (meaning
that it will correctly put a nop after the calls so that the linker can
do its thing). To support dynamic linking on PPC32 we'd need to
explicitly add other things (stubs?) and that is not implemented.

-Hal

>
> -Hal
>
> >
> > So if you stick to static linking and don't use TLS or long doubles,
> > then it actually works quite well. Dynamic linking on PPC64 works
> > most of the time. I've tried to keep the PPC 970 hazard detector in
> > working order, but I've never really done much of a performance
> > study on the non-embedded cores. Assistance with any of this would,
> > of course, be greatly appreciated.
> >
> > > I'll note that although I work on GCC, I have no problems seeing
> > > LLVM supporting PPC. The more the merrier.
> >
> > Good! :)
> >
> > -Hal
> >
> > > Peter

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

On Sat, 2012-04-28 at 15:51 -0500, Hal Finkel wrote:

> > > - There is no support for generating position-independent code on
> > >   PPC32. (PIC on PPC64 now works well). Nevertheless, I have
> > >   sometimes run into linking errors when compiling shared libraries
> > >   with C++ on PPC64.

PPC64 is PIC by nature. As for the linking issue, possibly you blew the
TOC with too many entries? It used to be that, even with GCC, we could
not compile doxygen (with or without -mminimal-toc) without filling up
the TOC and hitting the TOC overflow linker error. To fix those types of
problems, we recently added two more code models to GCC/binutils, so
we're no longer limited to 16-bit TOC offsets. We now have
-mcmodel=medium (32-bit TOC offsets) and -mcmodel=large (64-bit TOC
offsets), with -mcmodel=medium being the new GCC default (on PPC64). The
old TOC code is now called -mcmodel=small.

> > > - There is currently no support for generating loops using
> > >   control-registers for branch and increment (I am not sure if this
> > >   matters on POWER, but it does make some difference for small
> > >   trip-count loops on the embedded cores).

It helps on our server class hardware too, so we do make use of it.

> > > - Register reservations can use some improvement. We currently need
> > >   to reserve an additional register to handle the corner case where
> > >   a condition register needs to be spilled into a large stack frame
> > >   (one register to compute the address, and a second one into which
> > >   to transfer the condition register's contents). I'd like to
> > >   improve this at some point.

Reserve as in you don't allow anything to be allocated to it, just for
the uncommon case where you have to spill a condition reg to a stack
slot you cannot write to with a 16-bit offset? Speaking as a person who
has implemented register allocators, that is bad!

> Roman pointed out to me that I misspoke. LLVM only generates PIC on
> Darwin, not for ELF. What does work on PPC64 is dynamic linking
> (meaning that it will correctly put nop after the calls so that the
> linker can do its thing). To support dynamic linking on PPC32 we'd need
> to explicitly add other things (stubs?) and that is not implemented.

If by stubs you mean PLT call stubs, those are created by the linker for
both PPC and PPC64 binaries.

I'm not sure what distro you're running on, but you may be hitting the
new 32-bit secure-plt implementation all new distros are using. The old
32-bit PLT code used to generate a branch/return to the GOT, and the
updated LR value was used to gain addressability to the GOT. The problem
is that the GOT is in the data section, so for that to work, the data
section of your program had to be marked executable. With -msecure-plt
(the new default for all new distros), that is no longer the case. Maybe
the non secure-plt code isn't playing well with the system crt*.o files
and libs?

Are there build directions for building LLVM for ppc/ppc64? I thought I
had read that clang didn't work for ppc/ppc64 and that you had to use
the llvm-gcc thingy. Is that not the case anymore?

Peter

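As a rough illustration of the TOC limits Peter describes, the sketch below estimates how many TOC entries each code model could address. It assumes 8-byte TOC entries and that the whole window reachable by the offset is usable for the TOC; neither assumption is stated in this thread, and the name `toc_entries` is hypothetical.

```python
# Sketch: how many TOC entries fit within reach of an N-bit offset.
# Assumptions (not from the thread): 8-byte entries, and the whole
# 2**N-byte window is available to the TOC.

TOC_ENTRY_BYTES = 8  # assumed size of one PPC64 TOC entry

# Offset widths as given in the message above.
CODE_MODEL_OFFSET_BITS = {
    "small": 16,   # the old TOC code
    "medium": 32,  # new GCC default on PPC64
    "large": 64,
}


def toc_entries(offset_bits: int, entry_bytes: int = TOC_ENTRY_BYTES) -> int:
    """Number of entries addressable with an offset of the given width."""
    return (1 << offset_bits) // entry_bytes


if __name__ == "__main__":
    for model, bits in CODE_MODEL_OFFSET_BITS.items():
        print(f"-mcmodel={model}: {toc_entries(bits)} entries")
```

Under these assumptions the small model tops out at 8192 entries, which gives a feel for how a large program could overflow it.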
On Tue, 01 May 2012 15:10:56 -0500 Peter Bergner <bergner at vnet.ibm.com> wrote:

> On Sat, 2012-04-28 at 15:51 -0500, Hal Finkel wrote:
> > > > - There is no support for generating position-independent code
> > > >   on PPC32. (PIC on PPC64 now works well). Nevertheless, I have
> > > >   sometimes run into linking errors when compiling shared
> > > >   libraries with C++ on PPC64.
>
> PPC64 is PIC by nature. As for the linking issue, possibly you blew
> the TOC with too many entries? It used to be that, even with GCC, we
> could not compile doxygen (with or without -mminimal-toc) without
> filling up the TOC and hitting the TOC overflow linker error. To fix
> those types of problems, we recently added two more code models to
> GCC/binutils, so we're no longer limited to 16-bit TOC offsets. We
> now have -mcmodel=medium (32-bit TOC offsets) and -mcmodel=large
> (64-bit TOC offsets), with -mcmodel=medium being the new GCC default
> (on PPC64). The old TOC code is now called -mcmodel=small.

This is good to know; we should definitely make sure this is supported
in the clang driver. I believe that I've generally been able to compile
shared libraries on PPC64, but when compiling Boost, for example, I've
seen linking errors due to multiply defined constructor and destructor
symbols (I've not yet had a chance to look into this).

> > > > - There is currently no support for generating loops using
> > > >   control-registers for branch and increment (I am not sure if
> > > >   this matters on POWER, but it does make some difference for
> > > >   small trip-count loops on the embedded cores).
>
> It helps on our server class hardware too, so we do make use of it.
>
> > > > - Register reservations can use some improvement. We currently
> > > >   need to reserve an additional register to handle the corner
> > > >   case where a condition register needs to be spilled into a
> > > >   large stack frame (one register to compute the address, and a
> > > >   second one into which to transfer the condition register's
> > > >   contents). I'd like to improve this at some point.
>
> Reserve as in you don't allow anything to be allocated to it, just for
> the uncommon case where you have to spill a condition reg to a stack
> slot you cannot write to with a 16-bit offset? Speaking as a person
> who has implemented register allocators, that is bad!

Yes, this is exactly what now happens, and it needs to be fixed. This is
also my fault: I introduced this behavior to fix a bug [the register
scavenger used by the spilling code only has one emergency spill slot,
and in the case you mentioned, we need two registers].

> > Roman pointed out to me that I misspoke. LLVM only generates PIC on
> > Darwin, not for ELF. What does work on PPC64 is dynamic linking
> > (meaning that it will correctly put nop after the calls so that the
> > linker can do its thing). To support dynamic linking on PPC32 we'd
> > need to explicitly add other things (stubs?) and that is not
> > implemented.
>
> If by stubs you mean PLT call stubs, those are created by the linker
> for both PPC and PPC64 binaries.

Yes, exactly. I knew that the linker created these on PPC64, but I
thought some compiler involvement was necessary for PPC32. If that is
not true, then our job just got easier ;) Unfortunately, I know very
little about this; the extent of my experience is this: when I started
working with the PPC backend, on PPC64, the NOPs were not always placed
after the calls correctly (which predictably caused linking errors when
using dynamic linking); I fixed this and now I can dynamically link
executables on PPC64. If you could look at the asm produced and help us
to figure out what, if anything, is wrong with it, that would be greatly
appreciated.

> I'm not sure what distro you're running on, but you may be hitting
> the new 32-bit secure-plt implementation all new distros are using.
> The old 32-bit PLT code used to generate a branch/return to the GOT,
> and the updated LR value was used to gain addressability to the GOT.
> The problem is that the GOT is in the data section, so for that to
> work, the data section of your program had to be marked executable.
> With -msecure-plt (the new default for all new distros), that is
> no longer the case. Maybe the non secure-plt code isn't playing
> well with the system crt*.o files and libs?
>
> Are there build directions for building LLVM for ppc/ppc64?
> I thought I had read that clang didn't work for ppc/ppc64 and that
> you had to use the llvm-gcc thingy. Is that not the case anymore?

LLVM/clang will now build in the normal way (./configure; make install)
on PPC (you'll need at least the 3.1 release candidate (or trunk)). I
generally build on my PPC64 hosts with:

  make ENABLE_OPTIMIZED=1 OPTIMIZE_OPTION=-O2 EXTRA_OPTIONS=-mminimal-toc

Thanks again,
Hal

>
> Peter

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
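
The condition-register spill case discussed in this thread comes down to whether the stack offset fits in a signed 16-bit displacement. The sketch below only illustrates that check; `fits_simm16` and `scratch_regs_for_cr_spill` are hypothetical names and do not correspond to LLVM functions.

```python
def fits_simm16(offset: int) -> bool:
    """True if offset is encodable as a signed 16-bit displacement."""
    return -(1 << 15) <= offset < (1 << 15)


def scratch_regs_for_cr_spill(frame_offset: int) -> int:
    """Scratch GPRs needed to spill a condition register.

    One register always receives the condition register's contents.
    A second is needed only when the address has to be computed
    separately because the offset does not fit in 16 bits.
    """
    return 1 if fits_simm16(frame_offset) else 2


if __name__ == "__main__":
    for off in (100, 32767, 32768, 1 << 20):
        print(off, scratch_regs_for_cr_spill(off))
```

This is why a single emergency spill slot is enough for small frames but not for the large-frame corner case: only the latter needs two registers at once.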