Thanks Bruce.> On 4 May 2015, at 13:18, Bruce Hoult <bruce at hoult.org> wrote: > > I can confirm that, with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn) > > Very strange!Yes, that’s what I thought. I’ve also checked the binary downloads for OS X from llvm.org <http://llvm.org/> and get the same broken code from both the 3.5.2 and 3.6.0 releases. -arch arm64 wasn’t supported on the 3.4 branch, so wasn’t able to test there, but I imagine this has been there from the start of the arm64 backend.> I don't know if it helps you, but this looks correct:My guess was that as the result elements are returned in registers there might be a bug somewhere around the final element being treated as unused and optimized away, but I’d expect your code to trigger the same issue in that case. My actual code uses a Vector class templated on the size of the vector so your approach so your approach doesn’t really help there. I’m able to work around it easily enough though - as soon as the function does any actual work with the value it seems to work OK. It is quite a worrying bug for me though - there are quite a lot of functions in my codebase that return small (length 2 or 3) Vector types - generally in headers so they can be inlined an optimized well - but I’d definitely like to understand the root cause of this one so I can be on the lookout for any other similar failures. Simon> > typedef struct {float v0, v1;} vec2; > typedef struct {float v0, v1, v2;} vec3; > > vec2 getVec2(); > > vec3 getVec3() > { > vec2 myVec = getVec2(); > > vec3 res; > res.v0 = myVec.v0; > res.v1 = myVec.v1; > res.v2 = 1; > return res; > } > > .section __TEXT,__text,regular,pure_instructions > .globl _getVec3 > .align 2 > _getVec3: ; @getVec3 > ; BB#0: > stp fp, lr, [sp, #-16]! > mov fp, sp > bl _getVec2 > fmov s2, #1.000000e+00 > ldp fp, lr, [sp], #16 > ret > > > On Mon, May 4, 2015 at 1:19 PM, Simon Taylor <simontaylor1 at ntlworld.com <mailto:simontaylor1 at ntlworld.com>> wrote: > Hi all, > > I’ve narrowed down a problem in my code to the following test case: > > - - - - > > typedef struct {float v[2];} vec2; > typedef struct {float v[3];} vec3; > > vec2 getVec2(); > > vec3 getVec3() > { > vec2 myVec = getVec2(); > > vec3 res; > res.v[0] = myVec.v[0]; > res.v[1] = myVec.v[1]; > res.v[2] = 1; > return res; > } > > - - - - > > Compiling this with any level of optimization for arm64 gives incorrect code, unless my test case above is triggering some undefined behaviour that I’m not aware of. Other architectures appear to work OK. > > $ clang -arch arm64 -O1 test.c -S -o - > .section __TEXT,__text,regular,pure_instructions > .ios_version_min 5, 0 > .globl _getVec3 > .align 2 > _getVec3: ; @getVec3 > ; BB#0: > b _getVec2 > > > .subsections_via_symbols > > - - - - > > I’m happy to file a bug for this, but not sure quite where it belongs - clang, LLVM or direct to Apple. Can someone test the top of tree and see if it suffers the same issue? > > I’m currently using the latest Xcode from Apple (clang -v gives “Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)”) but the LLVM 3.5-based compiler in the previous Xcode release also generates the same code. > > > Simon > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150504/4d259b5d/attachment.html>
I’ve gone ahead and created a bug for this, as it seems to be a genuine issue rather than me just overlooking something obvious. https://llvm.org/bugs/show_bug.cgi?id=23408 <https://llvm.org/bugs/show_bug.cgi?id=23408> Simon> On 4 May 2015, at 13:51, Simon Taylor <simontaylor1 at ntlworld.com> wrote: > > Thanks Bruce. > >> On 4 May 2015, at 13:18, Bruce Hoult <bruce at hoult.org <mailto:bruce at hoult.org>> wrote: >> >> I can confirm that, with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn) >> >> Very strange! > > Yes, that’s what I thought. I’ve also checked the binary downloads for OS X from llvm.org <http://llvm.org/> and get the same broken code from both the 3.5.2 and 3.6.0 releases. -arch arm64 wasn’t supported on the 3.4 branch, so wasn’t able to test there, but I imagine this has been there from the start of the arm64 backend. > >> I don't know if it helps you, but this looks correct: > > My guess was that as the result elements are returned in registers there might be a bug somewhere around the final element being treated as unused and optimized away, but I’d expect your code to trigger the same issue in that case. My actual code uses a Vector class templated on the size of the vector so your approach so your approach doesn’t really help there. I’m able to work around it easily enough though - as soon as the function does any actual work with the value it seems to work OK. > > It is quite a worrying bug for me though - there are quite a lot of functions in my codebase that return small (length 2 or 3) Vector types - generally in headers so they can be inlined an optimized well - but I’d definitely like to understand the root cause of this one so I can be on the lookout for any other similar failures. > > Simon > >> >> typedef struct {float v0, v1;} vec2; >> typedef struct {float v0, v1, v2;} vec3; >> >> vec2 getVec2(); >> >> vec3 getVec3() >> { >> vec2 myVec = getVec2(); >> >> vec3 res; >> res.v0 = myVec.v0; >> res.v1 = myVec.v1; >> res.v2 = 1; >> return res; >> } >> >> .section __TEXT,__text,regular,pure_instructions >> .globl _getVec3 >> .align 2 >> _getVec3: ; @getVec3 >> ; BB#0: >> stp fp, lr, [sp, #-16]! >> mov fp, sp >> bl _getVec2 >> fmov s2, #1.000000e+00 >> ldp fp, lr, [sp], #16 >> ret >> >> >> On Mon, May 4, 2015 at 1:19 PM, Simon Taylor <simontaylor1 at ntlworld.com <mailto:simontaylor1 at ntlworld.com>> wrote: >> Hi all, >> >> I’ve narrowed down a problem in my code to the following test case: >> >> - - - - >> >> typedef struct {float v[2];} vec2; >> typedef struct {float v[3];} vec3; >> >> vec2 getVec2(); >> >> vec3 getVec3() >> { >> vec2 myVec = getVec2(); >> >> vec3 res; >> res.v[0] = myVec.v[0]; >> res.v[1] = myVec.v[1]; >> res.v[2] = 1; >> return res; >> } >> >> - - - - >> >> Compiling this with any level of optimization for arm64 gives incorrect code, unless my test case above is triggering some undefined behaviour that I’m not aware of. Other architectures appear to work OK. >> >> $ clang -arch arm64 -O1 test.c -S -o - >> .section __TEXT,__text,regular,pure_instructions >> .ios_version_min 5, 0 >> .globl _getVec3 >> .align 2 >> _getVec3: ; @getVec3 >> ; BB#0: >> b _getVec2 >> >> >> .subsections_via_symbols >> >> - - - - >> >> I’m happy to file a bug for this, but not sure quite where it belongs - clang, LLVM or direct to Apple. Can someone test the top of tree and see if it suffers the same issue? >> >> I’m currently using the latest Xcode from Apple (clang -v gives “Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)”) but the LLVM 3.5-based compiler in the previous Xcode release also generates the same code. >> >> >> Simon >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150504/298ccabb/attachment.html>
On 4 May 2015 at 10:37, Simon Taylor <simontaylor1 at ntlworld.com> wrote:> I’ve gone ahead and created a bug for this, as it seems to be a genuine > issue rather than me just overlooking something obvious. > https://llvm.org/bugs/show_bug.cgi?id=23408Thanks for taking the trouble to report this Simon. I think I've found and fixed the issue now (r236457). It should make its way into Xcode's clang without too much delay. Cheers. Tim.
Reasonably Related Threads
- [LLVMdev] Incorrect code generated for arm64
- [LLVMdev] Incorrect code generated for arm64
- [LLVMdev] NEON intrinsics preventing redundant load optimization?
- [LLVMdev] NEON intrinsics preventing redundant load optimization?
- [LLVMdev] NEON intrinsics preventing redundant load optimization?