thr3ads.net - llvm dev - [LLVMdev] First class aggregates of small size: split when used in function call [Dec 2014]


Virgile Bello

2014-Dec-31 07:41 UTC

[LLVMdev] First class aggregates of small size: split when used in function call

Hello,

In my LLVM frontend (CLR/MSIL), I am currently using first-class aggregates
to represent loaded value types on the "CLR stack".

However, I noticed that when calling an external method that takes one of
those aggregates by value, it is not passed the way I expected:

%COLORREF = type { i8, i8, i8, i8 }

declare i32 @SetLayeredWindowAttributes(i8*, %COLORREF, i8, i32)

I call this function with a "call x86_stdcallcc" instruction (it's a Win32
function, loaded with GetProcAddress).

However, looking at the generated assembly, the %COLORREF seems to get split
by the calling convention: the first i8 field goes through %edx, but the
next three fields go on the stack.
I would like all of it to go either in a single 32-bit register or in a
single 32-bit stack slot (the whole structure fits in an i32, and it is
already packed that way in memory before the call).
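As an illustration of "it already fits in an i32", here is a minimal C sketch showing that the four bytes of such a struct are the same bits as one 32-bit integer. ColorParts, color_as_u32 and color_from_channels are hypothetical names, and the equality between the two functions assumes a little-endian target such as x86. For reference, Win32 itself declares COLORREF as a 32-bit DWORD with the layout 0x00bbggrr.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of %COLORREF = type { i8, i8, i8, i8 }. */
typedef struct {
    uint8_t r, g, b, unused;
} ColorParts;

/* Reinterpret the struct's four bytes as one 32-bit integer.
   memcpy avoids strict-aliasing problems and typically compiles
   to a single 32-bit load. */
static uint32_t color_as_u32(ColorParts c) {
    uint32_t v;
    memcpy(&v, &c, sizeof v);
    return v;
}

/* Build the same value arithmetically in the 0x00bbggrr layout
   that Win32 uses for COLORREF. Matches color_as_u32 only on a
   little-endian target (assumption). */
static uint32_t color_from_channels(uint8_t r, uint8_t g, uint8_t b) {
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16);
}
```

Passing color_as_u32(c) where the callee expects a COLORREF moves the whole struct as a single 32-bit argument, which is the effect the frontend is trying to get at the IR level.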

I was thinking that using an alloca with sret/byval might help, but I am
not sure it is enough, since clang also seems to use i16 or i32 (and even
i32+i16 or i32+i32) to represent such structs of 8 bytes or less when
passing them to a function (even if they contain many smaller i8 fields).

Does somebody know whether an alloca with sret/byval alone is enough, or
whether I also need to pack smaller structs into i32 types myself, as clang
does, to be sure they won't be split across registers?
Any other hints or ideas on how I can achieve this?

Also, I was wondering what the current recommendation is (sret/byval with
an alloca for every copy vs. first-class aggregates), given the current
state of LLVM and the optimizations it supports. Since clang uses
sret/byval, I expect that approach to be more optimized/mature, but I might
be wrong.

I suppose LLVM will easily understand and optimize all the additional
aggregate allocas/memcpys I would end up emitting if I switched to a
sret/byval approach?

Thanks,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141231/2f3b0a87/attachment.html>

Reid Kleckner

2014-Dec-31 18:05 UTC


The current recommendation for matching external C ABIs is actually "use
Clang", as in literally link it into your program and try to leverage the
methods of clang::CodeGen::CodeGenFunction. This is the only way to get the
lowering 100% correct, but it's a lot of work, so depending on your needs,
you may want to roll your own lowering of high-level function prototypes
to LLVM function prototypes.

If you roll your own, then LLVM generally passes FCAs as though they were
split into constituent elements. I think ARM does something different here,
though. :(

Clang sometimes uses integers of appropriate size on non-x86 architectures
to try to model the usage of an integer register to pass the whole struct.
For a small struct, this is good because element accesses can be
transformed into shifts, masks, and truncs.
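A minimal C sketch of what those element accesses look like once a small struct is carried as a single integer. The helper names are hypothetical, and byte field 0 is assumed to be the least significant byte (the little-endian layout).

```c
#include <stdint.h>

/* Read byte field `index` (0..3) out of a struct packed into a uint32_t:
   shift it down, mask off the other bytes, then truncate to 8 bits. */
static uint8_t packed_get_u8(uint32_t packed, unsigned index) {
    return (uint8_t)((packed >> (8u * index)) & 0xFFu);
}

/* Replace byte field `index` (0..3) without touching the others:
   clear its bits with a mask, then OR in the shifted new value. */
static uint32_t packed_set_u8(uint32_t packed, unsigned index, uint8_t value) {
    uint32_t shift = 8u * index;
    uint32_t cleared = packed & ~(0xFFu << shift);
    return cleared | ((uint32_t)value << shift);
}
```

In IR the same operations would be lshr/and/trunc for a read and and/shl/or for a write, which the optimizer handles well.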

If you just want to match MSVC's C ABI, byval is probably the way to go.
LLVM is not very good at optimizing it, but it will do the right thing.
Splitting into i32-sized chunks would also work.
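Splitting into i32-sized chunks can be sketched in C as copying the struct's bytes into consecutive uint32_t values and zero-padding the tail; a frontend would emit the equivalent loads in IR. split_into_u32_chunks is a hypothetical helper, not an LLVM or Clang API. Note that Clang may use a narrower type such as i16 for the last piece (the i32+i16 case mentioned in the question); this sketch always pads to a full 32 bits, so a 6-byte struct becomes two chunks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes starting at `src` into ceil(size / 4) uint32_t chunks.
   Bytes of the last chunk that lie past the end of the struct are zero.
   Returns the number of chunks written; `out` must have room for them. */
static size_t split_into_u32_chunks(const void *src, size_t size, uint32_t *out) {
    size_t count = (size + 3) / 4;
    const unsigned char *bytes = src;
    for (size_t i = 0; i < count; i++) {
        size_t offset = i * 4;
        size_t n = size - offset < 4 ? size - offset : 4;
        uint32_t chunk = 0;
        /* Fills the first n bytes of chunk (the low-order bytes on little-endian). */
        memcpy(&chunk, bytes + offset, n);
        out[i] = chunk;
    }
    return count;
}
```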

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Reid Kleckner

2014-Dec-31 18:10 UTC


On Wed, Dec 31, 2014 at 10:05 AM, Reid Kleckner <rnk at google.com> wrote:
> The current recommendation for matching external C ABIs is actually "use
> Clang", as in literally link it into your program and try to leverage the
> methods of clang::CodeGen::CodeGenFunction. This is the only way to get the
> lowering 100% correct, but it's a lot of work, so depending on your needs,
> you may want to roll your own lowering of high-level function prototype to
> LLVM function prototype.
>
I should mention this was the topic of a talk by John McCall at the LLVM
dev meeting:
http://llvm.org/devmtg/2014-10/Slides/Skip%20the%20FFI.pdf

See slides starting around #101 for why this is hard and the many different
ways of lowering this C function prototype into LLVM IR:

typedef struct {
 float x, y;
} Point2f;
Point2f flipOverXAxis(Point2f point) {
 // ...
}
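To make the Point2f example concrete, here is a self-contained C sketch. The function body is hypothetical (reflecting a point across the x-axis negates y), and flipOverXAxis_u64, point_to_u64 and point_from_u64 are hypothetical helpers illustrating one of the possible lowerings: the whole 8-byte struct carried as a single 64-bit integer, with the element update done as a bit operation. It assumes IEEE 754 floats, no padding in Point2f, and a little-endian target, so y occupies the upper 32 bits.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    float x, y;
} Point2f;

/* Hypothetical body: reflecting across the x-axis negates y. */
static Point2f flipOverXAxis(Point2f point) {
    point.y = -point.y;
    return point;
}

/* The same operation on the struct coerced to one 64-bit integer.
   Negating an IEEE 754 float flips its sign bit, which for y is
   bit 63 on a little-endian target (assumption). */
static uint64_t flipOverXAxis_u64(uint64_t packed) {
    return packed ^ (UINT64_C(1) << 63);
}

/* Bit-preserving conversions between the struct and its integer form. */
static uint64_t point_to_u64(Point2f p) {
    uint64_t v;
    memcpy(&v, &p, sizeof v);
    return v;
}

static Point2f point_from_u64(uint64_t v) {
    Point2f p;
    memcpy(&p, &v, sizeof p);
    return p;
}
```

Both versions compute the same result; which form actually crosses the call boundary (two floats, a vector, a single integer, or a pointer to memory) is exactly what differs between ABIs, as the slides discuss.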

lost

2015-Jan-11 22:30 UTC


Hi Virgile,

You might be interested in the interop code I wrote for my LLVM.NET
binding:
https://bitbucket.org/lost/llvm.net/src/d8014b07723c69571e188a453ab39c764252985c/LLVM/Interop/?at=default

- Victor

