Benoit Belley via llvm-dev
2015-Sep-08 14:21 UTC
[llvm-dev] LLVM struct, alloca, SROA and the entry basic block
Hi everyone,

We have noticed that the SROA pass will only eliminate 'alloca' instructions if they are located in the entry basic block of a function.

As a general recommendation, should the LLVM IR emitted by our compiler always place 'alloca' instructions in the entry basic block? (I couldn't find any recommendations concerning this matter.)

In addition, we have noticed that the MemCpy pass will attempt to copy an LLVM struct using moves that are as large as possible. For example, a struct of 3 floats is copied using a 64-bit and a 32-bit move. It is therefore important that such a struct be aligned on an 8-byte boundary, not just 4 bytes! Otherwise, one runs the risk of triggering store-forwarding failures and the resulting pipeline stalls (which we did encounter, really badly, in one of our internal performance benchmarks).

Are there any guidelines for specifying the alignment of LLVM structs allocated by alloca instructions? Is rounding the structure size up to the next power of 2 a good strategy? Will the MemCpy pass issue moves of up to 64 bytes on AVX-512 capable processors?

Cheers,
Benoit
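To make the arithmetic behind the alignment question concrete, here is a minimal C++ sketch. It assumes a 4-byte `float`; `Vec3`, `Vec3Aligned8`, and `nextPowerOf2` are hypothetical names introduced only for illustration and are not LLVM APIs. It shows that a struct of 3 floats is 12 bytes with a natural alignment of 4, and that rounding its size up to the next power of two gives 16. Whether that rounding is a good alignment policy is exactly the open question in the message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an LLVM API): smallest power of two >= n.
// Intended for small sizes; n must be non-zero.
inline std::size_t nextPowerOf2(std::size_t n) {
    assert(n > 0 && "size must be non-zero");
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// A struct of 3 floats, as in the example from the message.
struct Vec3 { float x, y, z; };

// Same fields, over-aligned to 8 bytes so that a leading 64-bit move
// starts on an 8-byte boundary.
struct alignas(8) Vec3Aligned8 { float x, y, z; };

static_assert(sizeof(Vec3) == 12, "assumes a 4-byte float");
static_assert(alignof(Vec3) == 4, "natural alignment follows float");
static_assert(alignof(Vec3Aligned8) == 8, "over-aligned variant");
static_assert(sizeof(Vec3Aligned8) == 16, "size is padded to a multiple of the alignment");
```

Under these assumptions, `nextPowerOf2(sizeof(Vec3))` yields 16, which is stricter than the 8-byte boundary that the 64-bit move in the example would need.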
Philip Reames via llvm-dev
2015-Sep-08 16:50 UTC
[llvm-dev] LLVM struct, alloca, SROA and the entry basic block
On 09/08/2015 07:21 AM, Benoit Belley via llvm-dev wrote:
> Hi everyone,
>
> We have noticed that the SROA pass will only eliminate 'alloca'
> instructions if they are located in the entry basic block of a function.
>
> As a general recommendation, should the LLVM IR emitted by our
> compiler always place 'alloca' instructions in the entry basic block?
> (I couldn't find any recommendations concerning this matter.)

Yes.

> In addition, we have noticed that the MemCpy pass will attempt to copy
> an LLVM struct using moves that are as large as possible. For example, a
> struct of 3 floats is copied using a 64-bit and a 32-bit move. It is
> therefore important that such a struct be aligned on an 8-byte boundary,
> not just 4 bytes! Otherwise, one runs the risk of triggering
> store-forwarding failures and the resulting pipeline stalls (which we
> did encounter, really badly, in one of our internal performance
> benchmarks).

This sounds like a bug to me. We shouldn't be using the large loads/stores without knowing they're aligned or that unaligned access is fast on a particular target. Where this is best fixed (memcpy, store lowering?) I don't know.

> Are there any guidelines for specifying the alignment of LLVM structs
> allocated by alloca instructions? Is rounding the structure size up to
> the next power of 2 a good strategy? Will the MemCpy pass issue moves
> of up to 64 bytes on AVX-512 capable processors?
>
> Cheers,
> Benoit
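The condition stated in the reply (use a wide load/store only when the access is known to be aligned, or when the target handles unaligned access quickly) can be sketched as a toy model in C++. This is not LLVM's actual memcpy lowering; `planCopy`, its parameters, and the default `maxWidth` of 8 bytes are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy model (NOT LLVM's lowering): split a copy of `size` bytes into
// power-of-two chunks of at most `maxWidth` bytes. A chunk wider than the
// known alignment is used only if `unalignedFast` is true.
// `baseAlign` is the known alignment of the start address (a power of two).
inline std::vector<unsigned> planCopy(unsigned size, unsigned baseAlign,
                                      bool unalignedFast,
                                      unsigned maxWidth = 8) {
    assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0);
    assert(maxWidth != 0 && (maxWidth & (maxWidth - 1)) == 0);

    std::vector<unsigned> chunks;
    unsigned offset = 0;
    while (offset < size) {
        unsigned width = maxWidth;
        while (width > 1) {
            const bool fits = width <= size - offset;
            // base + offset is known to be width-aligned only if the base
            // is at least width-aligned and offset is a multiple of width.
            const bool aligned = baseAlign >= width && offset % width == 0;
            if (fits && (aligned || unalignedFast)) break;
            width /= 2;  // fall back to a narrower move
        }
        chunks.push_back(width);
        offset += width;
    }
    return chunks;
}
```

In this model, a 12-byte struct with 4-byte alignment on a target without fast unaligned access is copied as three 4-byte moves, while the same struct with 8-byte alignment gets the 8-byte plus 4-byte split described in the original message.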
Benoit Belley via llvm-dev
2015-Sep-08 17:11 UTC
[llvm-dev] LLVM struct, alloca, SROA and the entry basic block
From: Philip Reames <listmail at philipreames.com>
Date: Tuesday, 8 September 2015 12:50
To: Benoit Belley <benoit.belley at autodesk.com>, "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] LLVM struct, alloca, SROA and the entry basic block

> On 09/08/2015 07:21 AM, Benoit Belley via llvm-dev wrote:
>> Hi everyone,
>>
>> We have noticed that the SROA pass will only eliminate 'alloca'
>> instructions if they are located in the entry basic block of a function.
>>
>> As a general recommendation, should the LLVM IR emitted by our
>> compiler always place 'alloca' instructions in the entry basic block?
>> (I couldn't find any recommendations concerning this matter.)
>
> Yes.

Thanks Phil. Should this be mentioned somewhere in the documentation? As a footnote in the LLVM Language Reference Manual, maybe?

As a note, I have also found out that alloca instructions should be placed before any call instructions, as these can get inlined and then the original alloca can no longer be placed in the entry basic block!

>> In addition, we have noticed that the MemCpy pass will attempt to copy
>> an LLVM struct using moves that are as large as possible. For example, a
>> struct of 3 floats is copied using a 64-bit and a 32-bit move. It is
>> therefore important that such a struct be aligned on an 8-byte boundary,
>> not just 4 bytes! Otherwise, one runs the risk of triggering
>> store-forwarding failures and the resulting pipeline stalls (which we
>> did encounter, really badly, in one of our internal performance
>> benchmarks).
>
> This sounds like a bug to me. We shouldn't be using the large
> loads/stores without knowing they're aligned or that unaligned access is
> fast on a particular target. Where this is best fixed (memcpy, store
> lowering?) I don't know.

I'll send out a test case. Maybe that will help.

>> Are there any guidelines for specifying the alignment of LLVM structs
>> allocated by alloca instructions? Is rounding the structure size up to
>> the next power of 2 a good strategy? Will the MemCpy pass issue moves
>> of up to 64 bytes on AVX-512 capable processors?
>>
>> Cheers,
>> Benoit