On Tue, 2012-05-01 at 19:58 -0500, Peter Bergner wrote:
> On Tue, 2012-05-01 at 17:47 -0500, Hal Finkel wrote:
> > By default it should build for
> > whatever the current host is (no special flags required). To
> > specifically build for something else, use:
> > -ccc-host-triple powerpc64-unknown-linux-gnu
> > or
> > -ccc-host-triple powerpc-unknown-linux-gnu
>
> So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> generate both 32-bit and 64-bit binaries?

Sorry for replying to my own message, but...

Oh, -ccc-host-triple is a compiler option and not a configure option.
That does work, though it seems I have to link with gcc, since llvm
still wants to link against the 64-bit crt*.o and libs. Maybe it is
easier to just have two separate builds.

That said, my simple dynamically linked hello world executed fine
(i.e., it was able to call into libc.so just fine), as well as an
old C version of the SPEC97 tomcatv benchmark I have lying around.
So it seems both 32-bit and 64-bit can call into shared libs.

Not to say I haven't seen some code gen warts (using -O3). :)

From hello.s:

main:
        mflr 0
        stw 31, -4(1)
        stw 0, 4(1)
        stwu 1, -16(1)
        lis 3, .Lstr@ha
        mr 31, 1
        la 3, .Lstr@l(3)
        bl puts
        li 3, 0
        addi 1, 1, 16
        lwz 0, 4(1)
        lwz 31, -4(1)
        mtlr 0
        blr

By the strict letter of the 32-bit ABI, the save and restore of
r31 at a negative offset of r1 is verboten. The ABI states that
the stack space below the stack pointer is declared as volatile.
I actually debugged a similar problem way back in my Blue Gene/L
days, where gcc had a bug and was doing the same thing. We ended
up taking a signal between the restore of the stack pointer and
the restore of the nonvolatile reg and the BGL compute node kernel
trashed the stack below the stack pointer.

The second wart is the dead copy to r31...which leads to the
unnecessary save and restore of r31.

For tomcatv, we have to basically save/restore the entire set
of non-volatile integer and fp registers. Looking at how
llvm does that shows:

        ...
        lis 3, 56
        ori 3, 3, 57680
        stwx 16, 31, 3
        lis 3, 56
        ori 3, 3, 57684
        stwx 17, 31, 3
        lis 3, 56
        ori 3, 3, 57688
        stwx 18, 31, 3
        lis 3, 56
        ori 3, 3, 57692
        stwx 19, 31, 3
        lis 3, 56
        ori 3, 3, 57696
        stwx 20, 31, 3
        lis 3, 56
        ori 3, 3, 57700
        stwx 21, 31, 3
        [repeated over and over and ...]

Kind of ugly! :) GCC on the other hand stashes away the old value of
the stack pointer and then uses small negative offsets (legal at this
point since we've already decremented the stack pointer) from that for
all of its saves/restores:

        ...
        lis 0,0xffc7
        mr 12,1
        ori 0,0,7728
        stwux 1,1,0
        mflr 0
        stw 0,4(12)
        stfd 14,-144(12)
        stfd 15,-136(12)
        stfd 16,-128(12)
        stfd 17,-120(12)
        stfd 18,-112(12)
        ...

For things that don't work, do you have a small example program
that shows what's wrong?

Peter
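
A quick worked check of the immediates in the two listings above. This is only a sketch, not anything taken from either compiler, and the helper names lis_ori and fits_d_form are made up for illustration. It shows that each lis/ori pair in the LLVM output rebuilds a 32-bit offset that is far outside the signed 16-bit displacement a plain stw can encode, while the lis/ori pair in the GCC output builds the negative frame size once, after which the saves use small displacements off r12.

```python
def lis_ori(hi: int, lo: int) -> int:
    """Signed 32-bit value left in a register by `lis rD, hi` then `ori rD, rD, lo`."""
    # lis places the 16-bit immediate in the upper halfword and clears the lower one;
    # ori then ORs in the zero-extended 16-bit immediate.
    value = ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
    # Reinterpret the 32-bit pattern as a signed integer.
    return value - (1 << 32) if value & 0x80000000 else value


def fits_d_form(offset: int) -> bool:
    """True if offset fits the signed 16-bit displacement of stw/lwz/stfd."""
    return -32768 <= offset <= 32767


# LLVM listing: offset used for the r16 save, relative to r31.
llvm_r16_offset = lis_ori(56, 57680)

# GCC listing: the value added to r1 by `stwux 1,1,0`.
gcc_sp_delta = lis_ori(0xFFC7, 7728)

print(llvm_r16_offset, fits_d_form(llvm_r16_offset))
print(gcc_sp_delta, fits_d_form(-144))
```

The first pair is why LLVM needs three instructions per register here: the offset from the new stack pointer does not fit in a D-form store, so it falls back to the indexed form (stwx). Addressing from a copy of the old stack pointer keeps every save within the 16-bit range.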
On Tue, 01 May 2012 21:25:29 -0500
Peter Bergner <bergner at vnet.ibm.com> wrote:

> On Tue, 2012-05-01 at 19:58 -0500, Peter Bergner wrote:
> > On Tue, 2012-05-01 at 17:47 -0500, Hal Finkel wrote:
> > > By default it should build for
> > > whatever the current host is (no special flags required). To
> > > specifically build for something else, use:
> > > -ccc-host-triple powerpc64-unknown-linux-gnu
> > > or
> > > -ccc-host-triple powerpc-unknown-linux-gnu
> >
> > So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> > generate both 32-bit and 64-bit binaries?
>
> Sorry for replying to my own message, but...
>
> Oh, -ccc-host-triple is a compiler option and not a configure option.
> That does work, though it seems I have to link with gcc, since llvm
> still wants to link against the 64-bit crt*.o and libs. Maybe it is
> easier to just have two separate builds.

FWIW, you can also use the -gcc-toolchain and -ccc-gcc-name parameters
to switch which gcc install is used for linking [although it should find
the correct libs by itself, assuming things are in vaguely-default
install paths, but perhaps that is not working for you?].

>
> That said, my simple dynamically linked hello world executed fine
> (i.e., it was able to call into libc.so just fine), as well as an
> old C version of the SPEC97 tomcatv benchmark I have lying around.
> So it seems both 32-bit and 64-bit can call into shared libs.
>
> Not to say I haven't seen some code gen warts (using -O3). :)
>
> From hello.s:
>
> main:
>         mflr 0
>         stw 31, -4(1)
>         stw 0, 4(1)
>         stwu 1, -16(1)
>         lis 3, .Lstr@ha
>         mr 31, 1
>         la 3, .Lstr@l(3)
>         bl puts
>         li 3, 0
>         addi 1, 1, 16
>         lwz 0, 4(1)
>         lwz 31, -4(1)
>         mtlr 0
>         blr
>
> By the strict letter of the 32-bit ABI, the save and restore of
> r31 at a negative offset of r1 is verboten. The ABI states that
> the stack space below the stack pointer is declared as volatile.
> I actually debugged a similar problem way back in my Blue Gene/L
> days, where gcc had a bug and was doing the same thing. We ended
> up taking a signal between the restore of the stack pointer and
> the restore of the nonvolatile reg and the BGL compute node kernel
> trashed the stack below the stack pointer.

Interesting, we should definitely fix this. I've been trying to get
things in working order here so that we can use clang/llvm on our BG/P
and Q [as soon as I finish writing regression tests, I have support for
Double Hummer and QPX ready, and I'll contribute that as well].

>
> The second wart is the dead copy to r31...which leads to the
> unnecessary save and restore of r31.

And we should clean this up too ;)

>
> For tomcatv, we have to basically save/restore the entire set
> of non-volatile integer and fp registers. Looking at how
> llvm does that shows:
>
>         ...
>         lis 3, 56
>         ori 3, 3, 57680
>         stwx 16, 31, 3
>         lis 3, 56
>         ori 3, 3, 57684
>         stwx 17, 31, 3
>         lis 3, 56
>         ori 3, 3, 57688
>         stwx 18, 31, 3
>         lis 3, 56
>         ori 3, 3, 57692
>         stwx 19, 31, 3
>         lis 3, 56
>         ori 3, 3, 57696
>         stwx 20, 31, 3
>         lis 3, 56
>         ori 3, 3, 57700
>         stwx 21, 31, 3
>         [repeated over and over and ...]
>
> Kind of ugly! :) GCC on the other hand stashes away the old value of
> the stack pointer and then uses small negative offsets (legal at this
> point since we've already decremented the stack pointer) from that for
> all of its saves/restores:
>
>         ...
>         lis 0,0xffc7
>         mr 12,1
>         ori 0,0,7728
>         stwux 1,1,0
>         mflr 0
>         stw 0,4(12)
>         stfd 14,-144(12)
>         stfd 15,-136(12)
>         stfd 16,-128(12)
>         stfd 17,-120(12)
>         stfd 18,-112(12)
>         ...
>
> For things that don't work, do you have a small example program
> that shows what's wrong?

Roman, can you comment?

Thanks again,
Hal

>
> Peter

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
On Tue, 01 May 2012 21:25:29 -0500
Peter Bergner <bergner at vnet.ibm.com> wrote:

> On Tue, 2012-05-01 at 19:58 -0500, Peter Bergner wrote:
> > On Tue, 2012-05-01 at 17:47 -0500, Hal Finkel wrote:
> > > By default it should build for
> > > whatever the current host is (no special flags required). To
> > > specifically build for something else, use:
> > > -ccc-host-triple powerpc64-unknown-linux-gnu
> > > or
> > > -ccc-host-triple powerpc-unknown-linux-gnu
> >
> > So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> > generate both 32-bit and 64-bit binaries?
>
> Sorry for replying to my own message, but...
>
> Oh, -ccc-host-triple is a compiler option and not a configure option.
> That does work, though it seems I have to link with gcc, since llvm
> still wants to link against the 64-bit crt*.o and libs. Maybe it is
> easier to just have two separate builds.
>
> That said, my simple dynamically linked hello world executed fine
> (i.e., it was able to call into libc.so just fine), as well as an
> old C version of the SPEC97 tomcatv benchmark I have lying around.
> So it seems both 32-bit and 64-bit can call into shared libs.
>
> Not to say I haven't seen some code gen warts (using -O3). :)
>
> From hello.s:
>
> main:
>         mflr 0
>         stw 31, -4(1)
>         stw 0, 4(1)
>         stwu 1, -16(1)
>         lis 3, .Lstr@ha
>         mr 31, 1
>         la 3, .Lstr@l(3)
>         bl puts
>         li 3, 0
>         addi 1, 1, 16
>         lwz 0, 4(1)
>         lwz 31, -4(1)
>         mtlr 0
>         blr
>
> By the strict letter of the 32-bit ABI, the save and restore of
> r31 at a negative offset of r1 is verboten. The ABI states that
> the stack space below the stack pointer is declared as volatile.
> I actually debugged a similar problem way back in my Blue Gene/L
> days, where gcc had a bug and was doing the same thing. We ended
> up taking a signal between the restore of the stack pointer and
> the restore of the nonvolatile reg and the BGL compute node kernel
> trashed the stack below the stack pointer.

Just to confirm, this is an issue specific to the 32-bit ABI, correct?
gcc (4.4.6) seems to do the same thing for PPC64.

Thanks again,
Hal

>
> The second wart is the dead copy to r31...which leads to the
> unnecessary save and restore of r31.
>
> For tomcatv, we have to basically save/restore the entire set
> of non-volatile integer and fp registers. Looking at how
> llvm does that shows:
>
>         ...
>         lis 3, 56
>         ori 3, 3, 57680
>         stwx 16, 31, 3
>         lis 3, 56
>         ori 3, 3, 57684
>         stwx 17, 31, 3
>         lis 3, 56
>         ori 3, 3, 57688
>         stwx 18, 31, 3
>         lis 3, 56
>         ori 3, 3, 57692
>         stwx 19, 31, 3
>         lis 3, 56
>         ori 3, 3, 57696
>         stwx 20, 31, 3
>         lis 3, 56
>         ori 3, 3, 57700
>         stwx 21, 31, 3
>         [repeated over and over and ...]
>
> Kind of ugly! :) GCC on the other hand stashes away the old value of
> the stack pointer and then uses small negative offsets (legal at this
> point since we've already decremented the stack pointer) from that for
> all of its saves/restores:
>
>         ...
>         lis 0,0xffc7
>         mr 12,1
>         ori 0,0,7728
>         stwux 1,1,0
>         mflr 0
>         stw 0,4(12)
>         stfd 14,-144(12)
>         stfd 15,-136(12)
>         stfd 16,-128(12)
>         stfd 17,-120(12)
>         stfd 18,-112(12)
>         ...
>
> For things that don't work, do you have a small example program
> that shows what's wrong?
>
> Peter

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
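
The failure mode described in the quoted text (a signal arriving between the restore of the stack pointer and the reload of r31) can be modeled in a few lines. This is a toy simulation under stated assumptions, not the behavior of any real kernel: the addresses, the 64-byte signal frame, the marker values, and the names run_epilogue and deliver_signal are all hypothetical.

```python
FRAME = 16          # frame size from `stwu 1, -16(1)` in hello.s
SAVED_R31 = 0xCAFE  # hypothetical caller value of r31
CLOBBER = 0xDEAD    # hypothetical junk written by the signal frame


def run_epilogue(load_before_sp_restore: bool, signal_between: bool) -> int:
    """Return the value reloaded into r31 by a modeled epilogue."""
    old_sp = 0x1000
    sp = old_sp - FRAME
    # Prologue effect: `stw 31, -4(1)` stored r31 just below the incoming SP.
    mem = {old_sp - 4: SAVED_R31}

    def deliver_signal(current_sp: int) -> None:
        # Assumption: the kernel builds a 64-byte frame directly below SP.
        for addr in range(current_sp - 64, current_sp, 4):
            mem[addr] = CLOBBER

    if load_before_sp_restore:
        r31 = mem[sp + FRAME - 4]   # lwz 31, 12(1): slot is still inside the frame
        sp += FRAME                 # addi 1, 1, 16
        if signal_between:
            deliver_signal(sp)
    else:
        sp += FRAME                 # addi 1, 1, 16
        if signal_between:
            deliver_signal(sp)      # the slot at -4(1) is now below SP
        r31 = mem[sp - 4]           # lwz 31, -4(1)
    return r31


print(hex(run_epilogue(load_before_sp_restore=False, signal_between=True)))
print(hex(run_epilogue(load_before_sp_restore=True, signal_between=True)))
```

In this model, only the ordering used in hello.s (restore the stack pointer first, then reload from a negative offset) loses the saved value when the signal lands in the gap.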
On Sat, 2012-05-12 at 00:47 -0500, Hal Finkel wrote:
> On Tue, 01 May 2012 21:25:29 -0500
> Peter Bergner <bergner at vnet.ibm.com> wrote:
> > By the strict letter of the 32-bit ABI, the save and restore of
> > r31 at a negative offset of r1 is verboten. The ABI states that
> > the stack space below the stack pointer is declared as volatile.
> > I actually debugged a similar problem way back in my Blue Gene/L
> > days, where gcc had a bug and was doing the same thing. We ended
> > up taking a signal between the restore of the stack pointer and
> > the restore of the nonvolatile reg and the BGL compute node kernel
> > trashed the stack below the stack pointer.
>
> Just to confirm, this is an issue specific to the 32-bit ABI, correct?
> gcc (4.4.6) seems to do the same thing for PPC64.

Correct, this is a specific 32-bit ABI issue. The 64-bit ABI allows
some access below the stack pointer. From the 64-bit ABI:

    The 288 bytes below the stack pointer is available as volatile
    storage which is not preserved across function calls. Interrupt
    handlers and any other functions that might run without an
    explicit call must take care to preserve this region. If a
    function does not need more stack space than is available in
    this area, it does not need to have a stack frame.

Peter
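
The distinction between the two ABIs can be written down as a small check. This is an illustrative sketch only: the 288-byte figure comes from the 64-bit ABI text quoted above, the zero for 32-bit reflects the "strict letter" reading given earlier in the thread, and the table keys and the function name is_safe_below_sp are hypothetical.

```python
# Bytes below the stack pointer that a function may use without a frame.
# "ppc32" here means the 32-bit ABI discussed in this thread.
PROTECTED_BYTES_BELOW_SP = {
    "ppc32": 0,
    "ppc64": 288,
}


def is_safe_below_sp(offset: int, abi: str) -> bool:
    """True if an access at `offset` (relative to r1, offset < 0) stays inside
    the region that asynchronous code must preserve under the given ABI."""
    if offset >= 0:
        raise ValueError("this check is only about accesses below the stack pointer")
    return -offset <= PROTECTED_BYTES_BELOW_SP[abi]


# The `stw 31, -4(1)` / `lwz 31, -4(1)` pair from hello.s:
print(is_safe_below_sp(-4, "ppc32"))
print(is_safe_below_sp(-4, "ppc64"))
```

Under these assumptions the same -4(1) access is rejected for the 32-bit ABI and accepted for the 64-bit one, which matches the observation that gcc emits such code for PPC64.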
On Tue, 01 May 2012 21:25:29 -0500
Peter Bergner <bergner at vnet.ibm.com> wrote:

> On Tue, 2012-05-01 at 19:58 -0500, Peter Bergner wrote:
> > On Tue, 2012-05-01 at 17:47 -0500, Hal Finkel wrote:
> > > By default it should build for
> > > whatever the current host is (no special flags required). To
> > > specifically build for something else, use:
> > > -ccc-host-triple powerpc64-unknown-linux-gnu
> > > or
> > > -ccc-host-triple powerpc-unknown-linux-gnu
> >
> > So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> > generate both 32-bit and 64-bit binaries?
>
> Sorry for replying to my own message, but...
>
> Oh, -ccc-host-triple is a compiler option and not a configure option.
> That does work, though it seems I have to link with gcc, since llvm
> still wants to link against the 64-bit crt*.o and libs. Maybe it is
> easier to just have two separate builds.
>
> That said, my simple dynamically linked hello world executed fine
> (i.e., it was able to call into libc.so just fine), as well as an
> old C version of the SPEC97 tomcatv benchmark I have lying around.
> So it seems both 32-bit and 64-bit can call into shared libs.
>
> Not to say I haven't seen some code gen warts (using -O3). :)
>
> From hello.s:
>
> main:
>         mflr 0
>         stw 31, -4(1)
>         stw 0, 4(1)
>         stwu 1, -16(1)
>         lis 3, .Lstr@ha
>         mr 31, 1
>         la 3, .Lstr@l(3)
>         bl puts
>         li 3, 0
>         addi 1, 1, 16
>         lwz 0, 4(1)
>         lwz 31, -4(1)
>         mtlr 0
>         blr
>
> By the strict letter of the 32-bit ABI, the save and restore of
> r31 at a negative offset of r1 is verboten. The ABI states that
> the stack space below the stack pointer is declared as volatile.
> I actually debugged a similar problem way back in my Blue Gene/L
> days, where gcc had a bug and was doing the same thing. We ended
> up taking a signal between the restore of the stack pointer and
> the restore of the nonvolatile reg and the BGL compute node kernel
> trashed the stack below the stack pointer.
>
> The second wart is the dead copy to r31...which leads to the
> unnecessary save and restore of r31.
>
> For tomcatv, we have to basically save/restore the entire set
> of non-volatile integer and fp registers. Looking at how
> llvm does that shows:
>
>         ...
>         lis 3, 56
>         ori 3, 3, 57680
>         stwx 16, 31, 3
>         lis 3, 56
>         ori 3, 3, 57684
>         stwx 17, 31, 3
>         lis 3, 56
>         ori 3, 3, 57688
>         stwx 18, 31, 3
>         lis 3, 56
>         ori 3, 3, 57692
>         stwx 19, 31, 3
>         lis 3, 56
>         ori 3, 3, 57696
>         stwx 20, 31, 3
>         lis 3, 56
>         ori 3, 3, 57700
>         stwx 21, 31, 3
>         [repeated over and over and ...]
>
> Kind of ugly! :) GCC on the other hand stashes away the old value of
> the stack pointer and then uses small negative offsets (legal at this
> point since we've already decremented the stack pointer) from that for
> all of its saves/restores:
>
>         ...
>         lis 0,0xffc7
>         mr 12,1
>         ori 0,0,7728
>         stwux 1,1,0
>         mflr 0
>         stw 0,4(12)
>         stfd 14,-144(12)
>         stfd 15,-136(12)
>         stfd 16,-128(12)
>         stfd 17,-120(12)
>         stfd 18,-112(12)
>         ...

Peter,

There is a FIXME comment in the current code which reads:

> FIXME This disables some code that aligns the stack to a boundary
> bigger than the default (16 bytes on Darwin) when there is a stack
> local of greater alignment. This does not currently work, because
> the delta between old and new stack pointers is added to offsets that
> reference incoming parameters after the prolog is generated, and the
> code that does that doesn't handle a variable delta. You don't want
> to do that anyway; a better approach is to reserve another register
> that retains to the incoming stack pointer, and reference parameters
> relative to that.
> #define ALIGN_STACK 0

So given that this should also be fixed, presumably also by making an
extra copy of the stack pointer, should we always do this on PPC32? Is
there any difference for PPC64?

Thanks again,
Hal

> For things that don't work, do you have a small example program
> that shows what's wrong?
>
> Peter

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
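
To make the "variable delta" in the FIXME concrete, here is a small arithmetic sketch. It is not the LLVM code: the function name realign and the sample stack pointer values are hypothetical, and it only illustrates why the distance between the incoming and the realigned stack pointer cannot be folded into fixed parameter offsets at compile time.

```python
def realign(incoming_sp: int, frame_size: int, align: int) -> tuple[int, int]:
    """Allocate frame_size bytes below incoming_sp, then round the result down
    to a multiple of align (a power of two). Returns (new_sp, delta)."""
    new_sp = (incoming_sp - frame_size) & ~(align - 1)
    return new_sp, incoming_sp - new_sp


# Same function, same frame size, two different incoming stack pointers
# (both 16-byte aligned, as the default alignment would guarantee).
for sp in (0x1000, 0x1010):
    new_sp, delta = realign(sp, frame_size=64, align=32)
    print(hex(sp), hex(new_sp), delta)
```

With a 32-byte alignment request the delta comes out as 64 in one case and 80 in the other, so offsets to incoming parameters relative to the new stack pointer are not constants. Keeping a copy of the incoming stack pointer in another register, as the FIXME suggests, gives a base from which those offsets are fixed again.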