thr3ads.net - llvm dev - [LLVMdev] Passing and returning aggregates (who is responsible for the ABI?) [Nov 2007]

Christophe de Dinechin

2007-Nov-06 00:19 UTC

[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)

Hello,


I'm trying to port the XL compiler (http://xlr.sf.net) to use the  
LLVM back-end. So far, little trouble doing so. But there is one  
aspect of the semantics of the LLVM IR that surprises me. Why are the  
call, declare and define "halfway through" ABI conventions?

I think it's the right thing to have a single high level node for
each call, as opposed to separate instructions for pushing individual
arguments, for example. But that implies that the call semantics
include a good dose of ABI and calling conventions. This is explicit
in the fact that you tell what the calling conventions are (e.g. ccc,
fastcc).

But then, why refuse aggregates as input or output of a call? What is  
the rationale? On x86, I think it does not make any difference. But  
for Itanium, it's clearly broken (e.g. Itanium can return a struct of  
up to 4 ints in registers, and packs input parameters in a "funny"  
way). Languages such as Ada or XL have output parameters, and they  
are similarly difficult to generate code for (you have to make it  
look like C).

I don't think adding aggregate support would break any current IR  
producer, and assuming the aggregates are expanded early on, it  
probably has very localized impact in the code. Are there other good  
reasons not to add this capability, or would a patch adding it stand  
a good chance to be accepted?


Thanks
Christophe

Gordon Henriksen

2007-Nov-06 00:35 UTC

On Nov 5, 2007, at 19:19, Christophe de Dinechin wrote:
> I'm trying to port the XL compiler (http://xlr.sf.net) to use the  
> LLVM back-end. So far, little trouble doing so. But there is one  
> aspect of the semantics of the LLVM IR that surprises me. Why are  
> the call, declare and define "halfway through" ABI conventions?
>
> I think it's the right thing to have a single high level node for
> each call, as opposed to separate instructions for pushing
> individual arguments, for example. But that implies that the call
> semantics include a good dose of ABI and calling conventions. This
> is explicit in the fact that you tell what the calling conventions
> are (e.g. ccc, fastcc).
>
> But then, why refuse aggregates as input or output of a call? What  
> is the rationale?
Probably in good part because, in LLVM, aggregates (or derived types)
exist only in memory, not in registers.
> On x86, I think it does not make any difference. But for Itanium,  
> it's clearly broken (e.g. Itanium can return a struct of up to 4  
> ints in registers, and packs input parameters in a "funny" way).
> Languages such as Ada or XL have output parameters, and they are  
> similarly difficult to generate code for (you have to make it look  
> like C).
>
> I don't think adding aggregate support would break any current IR  
> producer, and assuming the aggregates are expanded early on, it  
> probably has very localized impact in the code. Are there other good  
> reasons not to add this capability, or would a patch adding it stand  
> a good chance to be accepted?
Chris has some notes about how to do this for return values here:

http://www.nondot.org/sabre/LLVMNotes/MultipleReturnValues.txt

— Gordon

Chris Lattner

2007-Nov-06 05:17 UTC

> I'm trying to port the XL compiler (http://xlr.sf.net) to use the
> LLVM back-end. So far, little trouble doing so. But there is one
> aspect of the semantics of the LLVM IR that surprises me. Why are the
> call, declare and define "halfway through" ABI conventions?
Hrm?
> I think it's the right thing to have a single high level node for
> each call, as opposed to separate instructions for pushing individual
> arguments, for example. But that implies that the call semantics
> include a good dose of ABI and calling conventions. This is explicit
> in the fact that you tell what the calling conventions are (e.g. ccc,
> fastcc).
Right.
> But then, why refuse aggregates as input or output of a call? What is
> the rationale?
Because LLVM has no notion of aggregates as "values" that can be  
passed around as atomic units.  This is a very important design point,  
and has many useful values.
> On x86, I think it does not make any difference. But
> for Itanium, it's clearly broken (e.g. Itanium can return a struct of
> up to 4 ints in registers, and packs input parameters in a "funny"
> way). Languages such as Ada or XL have output parameters, and they
> are similarly difficult to generate code for (you have to make it
> look like C).
>
> I don't think adding aggregate support would break any current IR
> producer, and assuming the aggregates are expanded early on, it
> probably has very localized impact in the code. Are there other good
> reasons not to add this capability, or would a patch adding it stand
> a good chance to be accepted?
Unfortunately, this wouldn't solve the problem that you think it
does.  For example, let's assume that LLVM allowed you to pass and
return structs by value.  Even with this, LLVM would not be able to  
directly implement all ABIs "naturally".  For example, some ABIs  
specify that a _Complex double should be returned in two FP registers,  
but that a struct with two doubles in it should be returned in memory.

By the time you lower to LLVM, all you have is {double,double}.  In  
fact, there is no way, in general, to retain all the high level  
information in LLVM without flavoring the LLVM IR with target info.

-Chris
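
Chris's point about _Complex double versus a two-double struct can be
sketched in C. This is only an illustration, not code from the thread:
the names struct pair, make_builtin, make_struct and same_bytes are
made up here. The two types carry the same bytes, yet a given ABI is
free to return one in FP registers and the other in memory, and that
distinction is gone once both have become {double,double}.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-defined type with the same layout as double _Complex. */
struct pair { double re; double im; };

/* Built-in complex: some ABIs return this in two FP registers. */
double _Complex make_builtin(double re, double im) {
    double parts[2] = { re, im };
    double _Complex z;
    /* C99 gives a complex type the representation of a 2-element array. */
    memcpy(&z, parts, sizeof z);
    return z;
}

/* User struct: the same ABI may require returning this through memory. */
struct pair make_struct(double re, double im) {
    struct pair p = { re, im };
    return p;
}

/* Returns 1 when both values hold identical bytes, 0 otherwise. */
int same_bytes(double re, double im) {
    double _Complex z = make_builtin(re, im);
    struct pair p = make_struct(re, im);
    assert(sizeof z == sizeof p);
    return memcmp(&z, &p, sizeof z) == 0;
}
```

Whether the two functions really use different return conventions
depends on the target; the sketch only shows that the data layout alone
cannot tell them apart.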

Christophe de Dinechin

2007-Nov-06 07:07 UTC

On 6 nov. 07, at 06:17, Chris Lattner wrote:
>> But then, why refuse aggregates as input or output of a call? What is
>> the rationale?
>
> Because LLVM has no notion of aggregates as "values" that can be
> passed around as atomic units.  This is a very important design point,
> and has many useful values.
I see. You explained one of them in a message on the XL mailing list,  
which I think is worth repeating here:
> This doesn't fit naturally with the way that LLVM does things:  In
> LLVM, each instruction can produce at most one value.  This means that
> a pointer to the instruction is as good as a pointer to the value,
> which dramatically simplifies the IR and everything that consumes or
> produces it.
An additional constraint you did not mention is that all the values  
must be first-class. But what is "first class" actually depends on  
the hardware and ABI. An i64, for instance, is first class on 64-bit  
CPUs, but not on 32-bit CPUs. Is the following legal on a 32-bit target?

	declare i64 @foo(i128, i256)
>   The "getaggregatevalue" is a localized hack to work
> around this for the few cases that return multiple values.
As a matter of fact, what annoys me the most with the  
getaggregatevalue proposal is precisely that it does not seem too  
localized to me. What about:

    %Agg = call {int, float} %foo()
    %intpart = getaggregatevalue {int, float} %Agg, uint 0
    [insert 200 instructions here]
    %floatpart = getaggregatevalue {int, float} %Agg, uint 1

What about a downstream IR manipulation turning that into:

    %Agg = call {int, float} %foo()
    %intpart = getaggregatevalue {int, float} %Agg, uint 0
    br label %somewhere
somewhere:
    %floatpart = getaggregatevalue {int, float} %Agg, uint 1

I am afraid that the hack would not remain localized for too long ;-)  
i.e. you probably will need to have stuff to keep the call and  
getaggregatevalue close together.

>>
> Unfortunately, this wouldn't solve the problem that you think it
> does.  For example, let's assume that LLVM allowed you to pass and
> return structs by value.  Even with this, LLVM would not be able to
> directly implement all ABIs "naturally".  For example, some ABIs
> specify that a _Complex double should be returned in two FP registers,
> but that a struct with two doubles in it should be returned in memory.
Even today, that must be special cased, i.e. the IR needs to be  
distinct between the two cases. As I understand it, the following is  
already legal, since vectors are first class:

	declare <2 x double> @builtin_complex_add (<2 x double>, <2 x double>)

That would be the built-in complex type. The user-defined complex-in- 
struct type could be one of the following depending on the ABI:

	declare void @user_complex_add (double, double, double, double, {double, double} *)
	declare void @user_complex_add ({double, double} *, double, double, double, double)
	declare void @user_complex_add ({double, double} *, {double, double} *, {double, double} *)

My proposal would not invalidate any of these, but allow the  
following, which would immediately be expanded to the appropriate  
choice of the above depending on the target calling conventions:

	declare {double, double} @user_complex_add({double, double}, {double, double})

It's possible that you want to allow some parameter attributes, i.e.  
be able to distinguish:

	declare sret {double, double} @user_complex_add({double, double}, {double, double})
	declare inreg {double, double} @user_complex_add({double, double}, {double, double})

> By the time you lower to LLVM, all you have is {double,double}.  In
> fact, there is no way, in general, to retain all the high level
> information in LLVM without flavoring the LLVM IR with target info
Agreed.

Anyway, for the moment, I will generate what LLVM accepts as input.


Thanks
Christophe
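
The relationship between the by-value signature and its possible
lowerings can be sketched at the C level. This is a rough illustration
under assumptions: the type name cplx and all three function names are
invented here, and which shape a real target picks is decided by its
calling convention, not by this code.

```c
#include <assert.h>

/* Hypothetical user-defined complex type. */
typedef struct { double re, im; } cplx;

/* Source-level shape: aggregates in, aggregate out. */
cplx add_by_value(cplx a, cplx b) {
    cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* One possible lowering: operands split into scalars, result written
   through a trailing pointer supplied by the caller. */
void add_scalars_out_last(double are, double aim,
                          double bre, double bim, cplx *out) {
    assert(out != 0);
    out->re = are + bre;
    out->im = aim + bim;
}

/* Another possible lowering: everything passed by address, with the
   result pointer first. */
void add_all_pointers(cplx *out, const cplx *a, const cplx *b) {
    assert(out != 0 && a != 0 && b != 0);
    out->re = a->re + b->re;
    out->im = a->im + b->im;
}
```

All three compute the same sum; they differ only in how the caller and
callee agree to move the data, which is the part the proposal would
leave to the target.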

Christophe de Dinechin

2007-Nov-06 07:27 UTC

On 6 nov. 07, at 01:35, Gordon Henriksen wrote:
>> But then, why refuse aggregates as input or output of a call? What
>> is the rationale?
>
> Probably in good part because, in LLVM, aggregates (or derived types)
> exist only in memory, not in registers.
Thanks, that's precisely where I see a problem. On many recent  
architectures (Itanium being the extreme case), small enough  
aggregates are passed and held in registers. Thinking or designing  
"aggregates == memory" is an obsolete approach ;-) I like the "call"
instruction because, at least, it got rid of the "arguments == push
to stack" approach you find in the Java or MSIL bytecodes...

As an aside, why do I care? I wanted XL to be efficient on modern  
architectures, so I got rid of "implicit memory accesses" as much as  
I could, e.g. no "this pointer". At one point, I compiled a simple  
program manipulating complex numbers to draw a Julia set. At the  
lowest level of optimization, the XL version was at least 70% faster  
than the C++ version.

Why? Because the user-defined complex operations in XL were all done  
in registers, whereas at that level of optimization, the C++ compiler  
was not doing the memory aliasing analysis required to perform  
"register field promotion", eliminate the "this pointer", and turn
the C++ complex class into registers. In other words, a complex
addition was 4 loads, two fp adds, and 2 stores for C++, as opposed  
to only the fp adds for XL. Obviously, an IR assuming that aggregates  
are in memory does not help here.

>>
> Chris has some notes about how to do this for return values here:
>
> http://www.nondot.org/sabre/LLVMNotes/MultipleReturnValues.txt
He pointed me to this earlier, thanks.

Thanks,
Christophe
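
The difference described above between complex arithmetic through a
"this pointer" and complex arithmetic held in scalars can be sketched
in C. This is not the XL or C++ code from the benchmark; the names
cplx, cplx_mul, cplx_add, julia_iters_mem and julia_iters_reg are
hypothetical, and the load/store counts in the comments describe
unoptimized code only.

```c
#include <assert.h>

typedef struct { double re, im; } cplx;

/* "this pointer" style: operands are reached through memory. */
static void cplx_mul(cplx *self, const cplx *o) {
    /* Compute into temporaries first so self and o may alias. */
    double re = self->re * o->re - self->im * o->im;
    double im = self->re * o->im + self->im * o->re;
    self->re = re;
    self->im = im;
}

static void cplx_add(cplx *self, const cplx *o) {
    self->re += o->re;  /* naive code: 2 loads, 1 fp add, 1 store */
    self->im += o->im;  /* and again: 2 loads, 1 fp add, 1 store  */
}

/* Iterations of z = z*z + c before |z|^2 exceeds 4, memory style. */
int julia_iters_mem(cplx z, cplx c, int max_iter) {
    int n = 0;
    assert(max_iter >= 0);
    while (n < max_iter && z.re * z.re + z.im * z.im <= 4.0) {
        cplx_mul(&z, &z);
        cplx_add(&z, &c);
        n++;
    }
    return n;
}

/* Same iteration with the components kept in scalar locals. */
int julia_iters_reg(double zre, double zim,
                    double cre, double cim, int max_iter) {
    int n = 0;
    assert(max_iter >= 0);
    while (n < max_iter && zre * zre + zim * zim <= 4.0) {
        double re = zre * zre - zim * zim + cre;
        double im = zre * zim + zim * zre + cim;
        zre = re;
        zim = im;
        n++;
    }
    return n;
}
```

An optimizing compiler will usually turn the first form into the
second; the point made above concerns what happens when that analysis
is not run.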
