I missed that the testing case is returning a struct. You are right in VARegSaveSize. For callee: sub sp, sp, #16 push {r11, lr} mov r11, sp sub sp, sp, #8 str r3, [r11, #20] str r2, [r11, #16] str r1, [r11, #12] ldr r1, [r11, #76] The beginning of the input struct @ sp_at_entry - 16 - 8 + 12 = sp_at_entry -12 # of leftover bytes 67-12 = 55 r11+76 is @ sp_at_entry - 24 + 76 = sp_at_entry + 52, this is incorrect, it should be at align(55, 4) = 56. For caller: mov r0, sp ldr r1, .LCPI1_0 str r1, [r0, #56] the 2nd argument is at sp_at_entry + 56, which is correct. On my setup (built from TOT), I got "ldr r1, [r11, #80]" instead of 76. Thanks, Manman On Jun 18, 2013, at 11:31 PM, Rajesh Viswabramana wrote:> Hi all, > > Thanks for all comments, > Filed bug report with details, > http://llvm.org/bugs/show_bug.cgi?id=16368 > > @Manman, please find comments, queries inline. > > Regards, > Rajesh > > ------- Original Message ------- > Sender : Stepan Dyatkovskiy<stpworld at narod.ru> > Date : Jun 19, 2013 05:15 (GMT+09:00) > Title : Re: [LLVMdev] ARM struct byval size > 64 triggers failure > > Hi all, > One more interesting job :-) > I'll look too at this case tomorrow. Today my brain is about to be exploded.. > > -Stepan. > > 18.06.2013, 22:56, "Manman Ren" <mren at apple.com>: >> >> Hi Rajesh, >> >> The callee code looks okay to me >>> Assembly for check114 >>> --------------------------------------------------------------- >>> sub sp, sp, #16 >>> push {r11, lr} >>> mov r11, sp >>> sub sp, sp, #8 >>> str r3, [r11, #20] >>> str r2, [r11, #16] >>> str r1, [r11, #12] >>> ldr r1, [r11, #76] >> VARegSaveSize is 16 because we store the first 16 bytes of struct byval in r0 to r3. >> >> For the test code/assembly I mentioned, only r1-r3 are used for struct byval. r0 used for return val. >> So, NumGPRs = 3, VARegSize = 12, VARegSaveSize =16(4 byte offset), access of arg1 is going wrong. >> For updated test code: >> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) { >> ..... >> } >> Only r1, r2 used for struct byval, NumGPRs = 2, VARegSize = 8, VARegSaveSize =8, this case works, able to access arg1. >> >> Please correct me, if my above understanding is wrong about NumGPRs, VARegSaveSize calculation. >> >> Align in computeRegArea is 8 since ABI says the stack pointer needs to be 8 byte aligned at function entry point. >> But the second argument does not have to be 8 byte aligned, in fact it is 4 byte aligned for i32. >> >> Ok. >> >> r11, #76 is equivalent to sp_at_entry + 52 since r11 = spat_entry - 16 - 8, which is 4-byte aligned after >> storing the leftover (67-16=51) bytes of struct byval. >> >> For test code, 3 reg used for struct byval, left over will be (67- 12 = 55) copied to stack by caller, 55 sets of ldrb,strb in assembly. >> >> Pasting dump again for locating arg1 from sp_at_entry, >> @entry of check114 >> sp 0xbefff808 0xbefff808 --> sp_at_entry >> At arg1 accessing instruction [0xbefff808 + 52] -> [0xbefff83c] -> 0x4071706f >> 0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0 >> 0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433 >> 0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443 >> 0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453 >> 0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463 >> 0xbefff834: 0x6a696867 0x6e6d6c6b 0x4071706f 0x00010861 >> >> Can you also paste the assembly for the caller side and check whether the second argument is stored >> at sp_at_entry+52? >> >> Please find the attached assembly file. >> >> As Renato suggested, please file a bug report. >> >> Filed bug. >> >> Thanks, >> Manman >> >> >> On Jun 18, 2013, at 4:26 AM, Rajesh Viswabramana <rajesh.vis at samsung.com> wrote: >> >>> Hi, >>> >>> Handling of pass by val of struct size >64 bytes case is seems wrong for arm targets. >>> >>> Summary: >>> Incase of struct pass by value for size > 64 along with other function params, failure seen in some corner cases. Access to function params result in wrong stack location access. >>> Stack pointer adjustment done by prologue emitter and offset used to access function params have different logics for calculaton. >>> >>> Test code >>> --------------------------------------------------------------- >>> #include <stdio.h> >>> struct S114 { >>> char a[67]; >>> }a114[5]; >>> struct S114 check114 (struct S114 arg0, struct S114* arg1) { >>> if(&a114[0] != arg1) // arg1 value is wrong >>> printf( "values %p, %p\n", &a114[0], arg1); >>> } >>> int main () { >>> int i= 0, j = 0; >>> for (;j<2; j++) // just filling a114 with some values for identification >>> for(i=0; i<sizeof(struct S114); i++) >>> memset(&a114[j].a[i],(0x11+i+j*30), sizeof(int)); >>> check114 (a114[1], &a114[0]); //=> a114[0] is accessed from wrong location inside check114 function >>> } >>> --------------------------------------------------------------- >>> clang -v >>> clang version 3.3 (tags/RELEASE_33/final) >>> Target: i386-pc-linux-gnu >>> Thread model: posix >>> >>> Output on arm : >>> # ./check114.exe >>> values 0x10861, 0x4071706f >>> which is wrong. >>> >>> Assembly for check114 >>> --------------------------------------------------------------- >>> sub sp, sp, #16 >>> push {r11, lr} >>> mov r11, sp >>> sub sp, sp, #8 >>> str r3, [r11, #20] >>> str r2, [r11, #16] >>> str r1, [r11, #12] >>> ldr r1, [r11, #76] >>> str r1, [sp, #4] >>> .loc 1 7 0 prologue_end >>> ldr r2, .LCPI0_0 >>> cmp r2, r1 >>> beq .LBB0_2 >>> b .LBB0_1 >>> --------------------------------------------------------------- >>> >>> From reg, stack dump: >>> ------------------------------------------------------------------------------------------------------------------------------ >>> @entry of check114 >>> => 0x8398 <check114>: sub sp, sp, #16 >>> 0x839c <check114+4>: push {r11, lr} >>> sp 0xbefff808 0xbefff808 >>> @if condition >>> 0x83b4 <check114+28>: ldr r1, [r11, #76] ; 0x4c <--- wrong value copied to r1, offset #76 should be #80 >>> 0x83b8 <check114+32>: str r1, [sp, #4] >>> 0x83bc <check114+36>: ldr r2, [pc, #44] ; 0x83f0 <check114+88> >>> => 0x83c0 <check114+40>: cmp r2, r1 >>> >>> r11 0xbefff7f0 -1090521104 >>> sp 0xbefff7e8 0xbefff7e8 >>> Stack dump: >>> 0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0 >>> 0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433 >>> 0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443 >>> 0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453 >>> 0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463 >>> 0xbefff834: 0x6a696867 0x6e6d6c6b 0x4071706f 0x00010861 //[R11+4c] -> [0xbefff7f0+4c] -> [0xbefff83c] -> 0x4071706f >>> Correct value is at location {[R11+4c]+4} --> 0x00010861, 4 bytes offset going wrong. >>> ------------------------------------------------------------------------------------------------------------------------------ >>> When i checked from the ARM Lowering part for generation of >>> sub sp, sp, #16 >>> Emitted by, >>> if (VARegSaveSize) >>> emitSPUpdate(isARM, MBB, MBBI, dl, TII, -VARegSaveSize, // --> VARegSaveSize is calculated in computeRegArea >>> MachineInstr::FrameSetup) >>> >>> ARMTargetLowering::computeRegArea(..) { >>> ... >>> VARegSize = NumGPRs * 4; >>> VARegSaveSize = (VARegSize + Align - 1) & ~(Align - 1); // --> 8 byte alignment done here >>> } >>> Stack pointer decremented to NumGPRs*4 + alignment >>> NumGPRs = 3 registers >>> VARegSaveSize = 16 (after considering 8 byte alignment ) >>> >>> When the offset(#76) for the instruction, "ldr r1, [r11, #76] ; 0x4c" is calculated, 4 bytes alignment is considered. >>> In prologue stackpointer calculation 8 byte alignment is considered. >>> Due to this mimatch of alignment, If try to access any parameter after byval which results wrong value. >>> >>> Issue(or offset of 4 bytes) wont occur if even number of register used for byval spilling. >>> ex: >>> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) { // accessing arg1 is fine in this case >>> ..... >>> } >>> >>> Could someone comment on below queries about fixing the problem, >>> 1) Is this 8 byte alignment mandatory ? Is this due to " ARM AAPCS 5.2.1.2 Stack constraints at a public interface" ? Can this be removed? >>> 2) We will leave alignment as it is but in prologue we will adjust SP once again, this is little meaningless. >>> 3) While accessing arg1 we will consider alignment and add extra offset -> looks better. >>> Offset to access arg1 is calculated by selection DAG that will be target independent. But Alignment adjustment should be done by target lowering. Any suggestions on how to fix this ? >>> >>> Regards, >>> Rajesh >>> >>> <201306181656803_BEI0XT4N.gif> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> , >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > <201306191201558_BEI0XT4N.gif> > <check114.s>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/5449b9a6/attachment.html>
On 19 June 2013 18:37, Manman Ren <mren at apple.com> wrote:> On my setup (built from TOT), I got "ldr r1, [r11, #80]" instead of 76. >Mine too, for both ARMv4 and ARMv7. Are you sure you got the latest checkout? cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/c4a920bd/attachment.html>
Stepan Dyatkovskiy
2013-Jun-19 18:08 UTC
[LLVMdev] ARM struct byval size > 64 triggers failure
Hi Rajesh, I'm in some stage of looking what exactly happens. As Manman mentioned r0 is reserved for function return. Though, perhaps somewhere we didn't catch that... Would you tell me, please, your llvm revision number? -Stepan Manman Ren wrote:> > I missed that the testing case is returning a struct. > You are right in VARegSaveSize. > > For callee: > subsp, sp, #16 > push{r11, lr} > movr11, sp > subsp, sp, #8 > strr3, [r11, #20] > strr2, [r11, #16] > strr1, [r11, #12] > ldrr1, [r11, #76] > > The beginning of the input struct @ sp_at_entry - 16 - 8 + 12 > sp_at_entry -12 > # of leftover bytes 67-12 = 55 > r11+76 is @ sp_at_entry - 24 + 76 = sp_at_entry + 52, this is incorrect, > it should be at align(55, 4) = 56. > > For caller: > movr0, sp > ldrr1, .LCPI1_0 > strr1, [r0, #56] > > the 2nd argument is at sp_at_entry + 56, which is correct. > > On my setup (built from TOT), I got "ldr r1, [r11, #80]" instead of 76. > > Thanks, > Manman > > On Jun 18, 2013, at 11:31 PM, Rajesh Viswabramana wrote: > >> Hi all, >> >> >> Thanks for all comments, >> >> Filed bug report with details, >> >> http://llvm.org/bugs/show_bug.cgi?id=16368 >> >> @Manman, please find comments, queries inline. >> >> >> Regards, >> Rajesh >> >> >> -------*Original Message*------- >> >> *Sender*: Stepan Dyatkovskiy<stpworld at narod.ru <mailto:stpworld at narod.ru>> >> >> *Date*: Jun 19, 2013 05:15 (GMT+09:00) >> >> *Title*: Re: [LLVMdev] ARM struct byval size > 64 triggers failure >> >> >> Hi all, >> One more interesting job :-) >> I'll look too at this case tomorrow. Today my brain is about to be >> exploded.. >> -Stepan. >> 18.06.2013, 22:56, "Manman Ren" <mren at apple.com <mailto:mren at apple.com>>: >>> Hi Rajesh, >>> The callee code looks okay to me >>>> >>>> Assembly for check114 >>>> --------------------------------------------------------------- >>>> sub sp, sp, #16 >>>> push {r11, lr} >>>> mov r11, sp >>>> sub sp, sp, #8 >>>> str r3, [r11, #20] >>>> str r2, [r11, #16] >>>> str r1, [r11, #12] >>>> ldr r1, [r11, #76] >>>> >>> VARegSaveSize is 16 because we store the first 16 bytes of struct >>> byval in r0 to r3. >>> For the test code/assembly I mentioned, only r1-r3 are used for >>> struct byval. r0 used for return val. >>> >>> So, NumGPRs = 3, VARegSize = 12, VARegSaveSize =16(4 byte offset), >>> access of arg1 is going wrong. >>> >>> For updated test code: >>> >>> struct S114 check114 (*int a*, struct S114 arg0, struct S114* arg1) { >>> >>> ..... >>> } >>> >>> Only r1, r2 used for struct byval, NumGPRs = 2, VARegSize = 8, >>> VARegSaveSize =8, this case works, able to access arg1. >>> >>> >>> Please correct me, if my above understanding is wrong about NumGPRs, >>> VARegSaveSize calculation. >>> >>> >>> Align in computeRegArea is 8 since ABI says the stack pointer needs >>> to be 8 byte aligned at function entry point. >>> But the second argument does not have to be 8 byte aligned, in fact >>> it is 4 byte aligned for i32. >>> Ok. >>> r11, #76 is equivalent to sp_at_entry + 52 since r11 = spat_entry - >>> 16 - 8, which is 4-byte aligned after >>> storing the leftover (67-16=51) bytes of struct byval. >>> For test code, 3 reg used for struct byval, left over will be (67- 12 >>> = 55) copied to stack by caller, 55 sets of ldrb,strb in assembly. >>> Pasting dump again for locating arg1 from sp_at_entry, >>> @entry of check114 >>> sp 0xbefff808 0xbefff808 --> sp_at_entry >>> >>> At arg1 accessing instruction [0xbefff808 + 52] -> >>> [0xbefff83c] ->*0x4071706f* >>> 0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0 >>> 0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433 >>> 0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443 >>> 0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453 >>> 0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463 >>> 0xbefff834: 0x6a696867 0x6e6d6c6b* 0x4071706f* *0x00010861* >>> >>> Can you also paste the assembly for the caller side and check whether >>> the second argument is stored >>> at sp_at_entry+52? >>> Please find the attached assembly file. >>> As Renato suggested, please file a bug report. >>> Filed bug. >>> Thanks, >>> Manman >>> On Jun 18, 2013, at 4:26 AM, Rajesh Viswabramana >>> <rajesh.vis at samsung.com <mailto:rajesh.vis at samsung.com>> wrote: >>> >>>> Hi, >>>> >>>> Handling of pass by val of struct size >64 bytes case is seems wrong >>>> for arm targets. >>>> >>>> >>>> *Summary:* >>>> >>>> Incase of struct pass by value for size > 64 along with other >>>> function params, failure seen in some corner cases. Access to >>>> function params result in wrong stack location access. >>>> >>>> Stack pointer adjustment done by prologue emitter and offset used to >>>> access function params have different logics for calculaton. >>>> >>>> >>>> *Test code >>>> *--------------------------------------------------------------- >>>> #include <stdio.h> >>>> struct S114 { >>>> char a[67]; >>>> }a114[5]; >>>> >>>> struct S114 check114 (struct S114 arg0, struct S114* arg1) { >>>> if(&a114[0] != arg1) // arg1 value is wrong >>>> printf( "values %p, %p\n", &a114[0], arg1); >>>> } >>>> int main () { >>>> int i= 0, j = 0; >>>> for (;j<2; j++) // just filling >>>> a114 with some values for identification >>>> for(i=0; i<sizeof(struct S114); i++) >>>> memset(&a114[j].a[i],(0x11+i+j*30), sizeof(int)); >>>> >>>> check114 (a114[1], &a114[0]); //=> a114[0] is accessed >>>> from wrong location inside check114 function >>>> } >>>> --------------------------------------------------------------- >>>> clang -v >>>> clang version 3.3 (tags/RELEASE_33/final) >>>> Target: i386-pc-linux-gnu >>>> Thread model: posix >>>> >>>> *Output on arm : >>>> *# ./check114.exe >>>> values 0x10861, 0x4071706f >>>> which is wrong. >>>> >>>> Assembly for check114 >>>> --------------------------------------------------------------- >>>> sub sp, sp, #16 >>>> push {r11, lr} >>>> mov r11, sp >>>> sub sp, sp, #8 >>>> str r3, [r11, #20] >>>> str r2, [r11, #16] >>>> str r1, [r11, #12] >>>> ldr r1, [r11, #76] >>>> str r1, [sp, #4] >>>> .loc 1 7 0 prologue_end >>>> ldr r2, .LCPI0_0 >>>> cmp r2, r1 >>>> beq .LBB0_2 >>>> b .LBB0_1 >>>> --------------------------------------------------------------- >>>> >>>> From reg, stack dump: >>>> ------------------------------------------------------------------------------------------------------------------------------ >>>> @entry of check114 >>>> => 0x8398 <check114>: sub sp, sp, #16 >>>> 0x839c <check114+4>: push {r11, lr} >>>> sp 0xbefff808 0xbefff808 >>>> >>>> @if condition >>>> 0x83b4 <check114+28>: ldr r1, [r11, #76] ; 0x4c <--- >>>> wrong value copied to r1, offset #76 should be #80 >>>> 0x83b8 <check114+32>: str r1, [sp, #4] >>>> 0x83bc <check114+36>: ldr r2, [pc, #44] ; 0x83f0 <check114+88> >>>> => 0x83c0 <check114+40>: cmp r2, r1 >>>> >>>> r11 0xbefff7f0 -1090521104 >>>> sp 0xbefff7e8 0xbefff7e8 >>>> >>>> Stack dump: >>>> 0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0 >>>> 0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433 >>>> 0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443 >>>> 0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453 >>>> 0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463 >>>> 0xbefff834: 0x6a696867 0x6e6d6c6b* 0x4071706f* >>>> *0x00010861* //[R11+4c] -> [0xbefff7f0+4c] -> >>>> [0xbefff83c] -> 0x4071706f >>>> >>>> Correct value is at location {[R11+4c]*+4*} --> 0x00010861, 4 bytes >>>> offset going wrong. >>>> ------------------------------------------------------------------------------------------------------------------------------ >>>> >>>> When i checked from the ARM Lowering part for generation of >>>> sub sp, sp, #16 >>>> >>>> Emitted by, >>>> >>>> if (VARegSaveSize) >>>> emitSPUpdate(isARM, MBB, MBBI, dl, TII, -VARegSaveSize, >>>> // --> VARegSaveSize is calculated in computeRegArea >>>> MachineInstr::FrameSetup) >>>> >>>> ARMTargetLowering::computeRegArea(..) { >>>> ... >>>> VARegSize = NumGPRs * 4; >>>> VARegSaveSize = (VARegSize + Align - 1) & ~(Align - >>>> 1); // --> 8 byte alignment done here >>>> } >>>> >>>> Stack pointer decremented to NumGPRs*4 + alignment >>>> >>>> NumGPRs = 3 registers >>>> >>>> VARegSaveSize = 16 (after considering 8 byte alignment ) >>>> >>>> >>>> When the offset(#76) for the instruction, "ldr r1, [r11, #76] ; >>>> 0x4c" is calculated, 4 bytes alignment is considered. >>>> In prologue stackpointer calculation 8 byte alignment is considered. >>>> >>>> Due to this mimatch of alignment, If try to access any parameter >>>> after byval which results wrong value. >>>> >>>> Issue(or offset of 4 bytes) wont occur if even number of register >>>> used for byval spilling. >>>> ex: >>>> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) { >>>> // accessing arg1 is fine in this case >>>> ..... >>>> } >>>> >>>> Could someone comment on below queries about fixing the problem, >>>> >>>> 1) Is this 8 byte alignment mandatory ? Is this due to " ARM AAPCS >>>> 5.2.1.2 Stack constraints at a public interface" ? Can this be removed? >>>> >>>> 2) We will leave alignment as it is but in prologue we will adjust >>>> SP once again, this is little meaningless. >>>> >>>> 3) While accessing arg1 we will consider alignment and add extra >>>> offset -> looks better. >>>> >>>> Offset to access arg1 is calculated by selection DAG that will be >>>> target independent. But Alignment adjustment should be done by >>>> target lowering. Any suggestions on how to fix this ? >>>> >>>> Regards, >>>> Rajesh >>>> >>>> <201306181656803_BEI0XT4N.gif> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu >>>> <mailto:LLVMdev at cs.uiuc.edu>http://llvm.cs.uiuc.edu >>>> <http://llvm.cs.uiuc.edu/> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> , >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu >>> <mailto:LLVMdev at cs.uiuc.edu>http://llvm.cs.uiuc.edu >>> <http://llvm.cs.uiuc.edu/> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> >> >> >> >> >> >> <201306191201558_BEI0XT4N.gif> >> >> <check114.s> >
Reasonably Related Threads
- [LLVMdev] ARM struct byval size > 64 triggers failure
- [LLVMdev] ARM struct byval size > 64 triggers failure
- [LLVMdev] ARM struct byval size > 64 triggers failure
- [LLVMdev] Properly handling mem-loc arguments when prologue adjusts FP.
- gapped sequence data summary