Virgile Bello
2014-Dec-31 07:41 UTC
[LLVMdev] First class aggregates of small size: split when used in function call
Hello,

In my LLVM frontend (CLR/MSIL), I am currently using first-class aggregates to represent loaded value types on the "CLR stack".

However, I noticed that when calling an external method that takes such an aggregate by value, it is not passed as I expected:

    %COLORREF = type { i8, i8, i8, i8 }

    declare i32 @SetLayeredWindowAttributes(i8*, %COLORREF, i8, i32)

I call this function with call x86_stdcallcc (it's a Win32 function, loaded with GetProcAddress).

However, checking the assembly code, it seems that the %COLORREF gets split due to the calling convention: the first i8 field goes through %edx, but the next three fields go on the stack. I would like all of it to go through either a single 32-bit register or a single 32-bit stack slot (since the whole structure fits in an i32 and is already packed that way in memory before the call).

I was thinking that using alloca with sret/byval might help, but I am not even sure that it is enough, since clang also seems to use i16 or i32 (and even i32+i16 or i32+i32) to represent such structs <= 8 bytes when passing them to a method (even if they contain many smaller i8 fields).

Does somebody know if alloca with sret/byval alone is enough, or if I also need to pack smaller structs into i32 types myself, like clang does, to be sure they won't be split across registers? Any other hint or idea on how I can achieve this?

Also, I was wondering what the current recommendation is (sret/byval with an alloca for every copy vs. first-class aggregates), considering the current state of LLVM and its supported optimizations. Since clang uses sret/byval, I expect that path to be more optimized and mature, but I might be wrong.

I suppose LLVM will easily understand and optimize all those additional aggregate alloca/memcpy operations I will end up emitting if I switch to an sret/byval approach?

Thanks,
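The claim above that the whole structure fits in an i32 and is already laid out that way in memory can be checked with a minimal C sketch. The `ColorRef` typedef and the two helper names are hypothetical, introduced only for this illustration; the sketch shows the byte-for-byte reinterpretation a frontend could perform before passing a single 32-bit value, and it does not describe anything LLVM does on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of the IR type %COLORREF = type { i8, i8, i8, i8 }. */
typedef struct { uint8_t b0, b1, b2, b3; } ColorRef;

/* Four one-byte fields need no padding, so the struct fills exactly one 32-bit slot. */
static_assert(sizeof(ColorRef) == sizeof(uint32_t), "ColorRef must be 4 bytes");

/* Reinterpret the struct's bytes as one 32-bit integer, without reordering them. */
uint32_t colorref_to_u32(ColorRef c) {
    uint32_t bits;
    memcpy(&bits, &c, sizeof bits);
    return bits;
}

/* Inverse direction: recover the four fields from the 32-bit value. */
ColorRef u32_to_colorref(uint32_t bits) {
    ColorRef c;
    memcpy(&c, &bits, sizeof c);
    return c;
}
```

Because the copy goes through memcpy, the integer holds the same bytes in the same order as the struct, whatever the host byte order is.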
Reid Kleckner
2014-Dec-31 18:05 UTC
[LLVMdev] First class aggregates of small size: split when used in function call
The current recommendation for matching external C ABIs is actually "use Clang": literally link it into your program and try to leverage the methods of clang::CodeGen::CodeGenFunction. This is the only way to get the lowering 100% correct, but it's a lot of work, so depending on your needs, you may want to roll your own lowering from the high-level function prototype to the LLVM function prototype.

If you roll your own, then LLVM generally passes first-class aggregates (FCAs) as though they were split into their constituent elements. I think ARM does something different here, though. :(

Clang sometimes uses integers of appropriate size on non-x86 architectures to try to model the usage of an integer register to pass the whole struct. For a small struct, this is good because element accesses can be transformed into shifts, masks, and truncs.

If you just want to match MSVC's C ABI, byval is probably the way to go. LLVM is not very good at optimizing it, but it will do the right thing. Splitting into i32-sized chunks would also work.
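As a rough illustration of how element accesses on a struct packed into one integer turn into shifts, masks, and truncs, here is a small C sketch. The helper names `packed_get` and `packed_set` are hypothetical, and the sketch assumes field 0 sits in the least significant byte, which matches a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>

/* Read byte field `index` (0 = least significant byte) from a packed
   32-bit value: shift the field down, then truncate to 8 bits. */
uint8_t packed_get(uint32_t packed, unsigned index) {
    assert(index < 4);
    return (uint8_t)(packed >> (8u * index));
}

/* Return `packed` with byte field `index` replaced by `value`:
   mask out the old byte, then OR in the new one at the same position. */
uint32_t packed_set(uint32_t packed, unsigned index, uint8_t value) {
    assert(index < 4);
    uint32_t shift = 8u * index;
    uint32_t mask = (uint32_t)0xFFu << shift;
    return (packed & ~mask) | ((uint32_t)value << shift);
}
```

With this layout, `packed_get(0x44332211u, 0)` yields 0x11 and `packed_set(0x44332211u, 1, 0xAB)` yields 0x4433AB11.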
Reid Kleckner
2014-Dec-31 18:10 UTC
[LLVMdev] First class aggregates of small size: split when used in function call
On Wed, Dec 31, 2014 at 10:05 AM, Reid Kleckner <rnk at google.com> wrote:

> The current recommendation for matching external C ABIs is actually "use
> Clang", as in literally link it into your program and try to leverage the
> methods of clang::CodeGen::CodeGenFunction. This is the only way to get the
> lowering 100% correct, but it's a lot of work, so depending on your needs,
> you may want to roll your own lowering of high-level function prototype to
> LLVM function prototype.

I should mention that this was the topic of a talk by John McCall at the LLVM dev meeting:
http://llvm.org/devmtg/2014-10/Slides/Skip%20the%20FFI.pdf

See the slides starting around #101 for why this is hard, and for the many different ways of lowering this C function prototype into LLVM IR:

    typedef struct { float x, y; } Point2f;
    Point2f flipOverXAxis(Point2f point) {
      // ...
    }
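To make the sret/byval option from this thread concrete for the Point2f prototype, here is a hedged C sketch that writes the indirection out by hand. The function body (negating y) is an assumption, since the original elides it, and `flipOverXAxis_indirect` and `call_indirect` are hypothetical names. The explicit pointers only approximate what sret (a caller-provided return slot) and byval (a caller-made private copy) express in IR; this is not the lowering any particular target actually picks.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Point2f;

/* Direct form. The body is assumed: mirror across the X axis by negating y. */
Point2f flipOverXAxis(Point2f point) {
    point.y = -point.y;
    return point;
}

/* Indirect form: `result` plays the role of an sret return slot and
   `point_copy` the role of a byval argument, i.e. a copy owned by the callee. */
void flipOverXAxis_indirect(Point2f *result, Point2f *point_copy) {
    assert(result != NULL && point_copy != NULL);
    point_copy->y = -point_copy->y; /* safe: the caller handed over a private copy */
    *result = *point_copy;
}

/* Caller side: allocate the return slot and make the copy before the call. */
Point2f call_indirect(Point2f p) {
    Point2f copy = p;   /* the extra copy that byval implies */
    Point2f result;     /* the slot that sret points at */
    flipOverXAxis_indirect(&result, &copy);
    return result;
}
```

Both forms compute the same result; the indirect one makes visible the extra stack slot and copy that the optimizer would have to see through.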
lost
2015-Jan-11 22:30 UTC
[LLVMdev] First class aggregates of small size: split when used in function call
Hi Virgile,

You might be interested in the interop code I wrote for my LLVM.NET binding:
https://bitbucket.org/lost/llvm.net/src/d8014b07723c69571e188a453ab39c764252985c/LLVM/Interop/?at=default

- Victor