Christophe de Dinechin
2007-Nov-06 00:19 UTC
[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)
Hello,

I'm trying to port the XL compiler (http://xlr.sf.net) to use the LLVM back-end. So far, little trouble doing so. But there is one aspect of the semantics of the LLVM IR that surprises me. Why are the call, declare and define "halfway through" ABI conventions?

I think it's the right thing to have a single high level node for each call, as opposed to separate instructions for pushing individual arguments, for example. But that implies that the call semantics include a good dose of ABI and calling conventions. This is explicit in the fact that you tell what the calling conventions are (e.g. ccc, fastcc).

But then, why refuse aggregates as input or output of a call? What is the rationale? On x86, I think it does not make any difference. But for Itanium, it's clearly broken (e.g. Itanium can return a struct of up to 4 ints in registers, and packs input parameters in a "funny" way). Languages such as Ada or XL have output parameters, and they are similarly difficult to generate code for (you have to make it look like C).

I don't think adding aggregate support would break any current IR producer, and assuming the aggregates are expanded early on, it probably has very localized impact in the code. Are there other good reasons not to add this capability, or would a patch adding it stand a good chance of being accepted?

Thanks
Christophe
Gordon Henriksen
2007-Nov-06 00:35 UTC
[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)
On Nov 5, 2007, at 19:19, Christophe de Dinechin wrote:

> I'm trying to port the XL compiler (http://xlr.sf.net) to use the LLVM back-end. So far, little trouble doing so. But there is one aspect of the semantics of the LLVM IR that surprises me. Why are the call, declare and define "halfway through" ABI conventions?
>
> I think it's the right thing to have a single high level node for each call, as opposed to separate instructions for pushing individual arguments, for example. But that implies that the call semantics include a good dose of ABI and calling conventions. This is explicit in the fact that you tell what the calling conventions are (e.g. ccc, fastcc).
>
> But then, why refuse aggregates as input or output of a call? What is the rationale?

Probably in good part because, in LLVM, aggregates (or derived types) exist only in memory, not in registers.

> On x86, I think it does not make any difference. But for Itanium, it's clearly broken (e.g. Itanium can return a struct of up to 4 ints in registers, and packs input parameters in a "funny" way). Languages such as Ada or XL have output parameters, and they are similarly difficult to generate code for (you have to make it look like C).
>
> I don't think adding aggregate support would break any current IR producer, and assuming the aggregates are expanded early on, it probably has very localized impact in the code. Are there other good reasons not to add this capability, or would a patch adding it stand a good chance of being accepted?

Chris has some notes about how to do this for return values here:

http://www.nondot.org/sabre/LLVMNotes/MultipleReturnValues.txt

-- Gordon
Chris Lattner
2007-Nov-06 05:17 UTC
[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)
> I'm trying to port the XL compiler (http://xlr.sf.net) to use the LLVM back-end. So far, little trouble doing so. But there is one aspect of the semantics of the LLVM IR that surprises me. Why are the call, declare and define "halfway through" ABI conventions?

Hrm?

> I think it's the right thing to have a single high level node for each call, as opposed to separate instructions for pushing individual arguments, for example. But that implies that the call semantics include a good dose of ABI and calling conventions. This is explicit in the fact that you tell what the calling conventions are (e.g. ccc, fastcc).

Right.

> But then, why refuse aggregates as input or output of a call? What is the rationale?

Because LLVM has no notion of aggregates as "values" that can be passed around as atomic units. This is a very important design point, and has many useful values.

> On x86, I think it does not make any difference. But for Itanium, it's clearly broken (e.g. Itanium can return a struct of up to 4 ints in registers, and packs input parameters in a "funny" way). Languages such as Ada or XL have output parameters, and they are similarly difficult to generate code for (you have to make it look like C).
>
> I don't think adding aggregate support would break any current IR producer, and assuming the aggregates are expanded early on, it probably has very localized impact in the code. Are there other good reasons not to add this capability, or would a patch adding it stand a good chance of being accepted?

Unfortunately, this wouldn't solve the problem that you think it does. For example, let's assume that LLVM allowed you to pass and return structs by value. Even with this, LLVM would not be able to directly implement all ABIs "naturally". For example, some ABIs specify that a _Complex double should be returned in two FP registers, but that a struct with two doubles in it should be returned in memory. By the time you lower to LLVM, all you have is {double,double}. In fact, there is no way, in general, to retain all the high level information in LLVM without flavoring the LLVM IR with target info.

-Chris
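A minimal C sketch of the distinction described in the message above, not taken from the thread. The names `user_complex`, `make_builtin_complex`, and `same_bits` are hypothetical, and the sketch assumes the common case where a struct of two doubles has no padding. It shows that the built-in complex type and a two-double struct are distinct types in the source language yet indistinguishable byte for byte, which is all that survives once both are lowered to {double,double}.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-defined complex type: a plain struct of two doubles. */
struct user_complex {
    double re;
    double im;
};

/* Build a built-in complex value from its parts. C99 guarantees that a
   complex type is laid out like an array of two reals (real, then imaginary). */
double _Complex make_builtin_complex(double re, double im)
{
    double parts[2] = { re, im };
    double _Complex c;
    memcpy(&c, parts, sizeof c);
    return c;
}

/* Built-in complex addition. Per the message above, some ABIs return this
   type in two FP registers. */
double _Complex builtin_complex_add(double _Complex a, double _Complex b)
{
    return a + b;
}

/* The same arithmetic on the struct type. Per the message above, some ABIs
   return this type in memory instead. */
struct user_complex user_complex_add(struct user_complex a, struct user_complex b)
{
    struct user_complex r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Returns 1 when the two values are byte-for-byte identical. */
int same_bits(double _Complex c, struct user_complex s)
{
    /* Assumption: no padding inside the struct, so the sizes match. */
    assert(sizeof c == sizeof s);
    return memcmp(&c, &s, sizeof s) == 0;
}
```

Calling both add functions with (1, 2) and (3, 4) and passing the results to `same_bits` compares the raw bytes of the two return values. Only the source-level type tells a C compiler which return convention to apply, and that type is what is lost after lowering.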
Christophe de Dinechin
2007-Nov-06 07:07 UTC
[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)
On 6 Nov. 07, at 06:17, Chris Lattner wrote:

>> But then, why refuse aggregates as input or output of a call? What is the rationale?
>
> Because LLVM has no notion of aggregates as "values" that can be passed around as atomic units. This is a very important design point, and has many useful values.

I see. You explained one of them in a message on the XL mailing list, which I think is worth repeating here:

> This doesn't fit naturally with the way that LLVM does things: In LLVM, each instruction can produce at most one value. This means that a pointer to the instruction is as good as a pointer to the value, which dramatically simplifies the IR and everything that consumes or produces it.

An additional constraint you did not mention is that all the values must be first-class. But what is "first class" actually depends on the hardware and ABI. An i64, for instance, is first class on 64-bit CPUs, but not on 32-bit CPUs. Is the following legal on a 32-bit target?

    declare i64 @foo(i128, i256)

> The "getaggregatevalue" is a localized hack to work around this for the few cases that return multiple values.

As a matter of fact, what annoys me the most with the getaggregatevalue proposal is precisely that it does not seem too localized to me. What about:

    %Agg = call {int, float} %foo()
    %intpart = getaggregatevalue {int, float} %Agg, uint 0
    [insert 200 instructions here]
    %floatpart = getaggregatevalue {int, float} %Agg, uint 1

What about a downstream IR manipulation turning that into:

    %Agg = call {int, float} %foo()
    %intpart = getaggregatevalue {int, float} %Agg, uint 0
    br label somewhere
    somewhere:
    %floatpart = getaggregatevalue {int, float} %Agg, uint 1

I am afraid that the hack would not remain localized for too long ;-) i.e. you will probably need something to keep the call and getaggregatevalue close together.

> Unfortunately, this wouldn't solve the problem that you think it does. For example, let's assume that LLVM allowed you to pass and return structs by value. Even with this, LLVM would not be able to directly implement all ABIs "naturally". For example, some ABIs specify that a _Complex double should be returned in two FP registers, but that a struct with two doubles in it should be returned in memory.

Even today, that must be special cased, i.e. the IR needs to be distinct between the two cases. As I understand it, the following is already legal, since vectors are first class:

    declare <2 x double> @builtin_complex_add (<2 x double>, <2 x double>)

That would be the built-in complex type. The user-defined complex-in-struct type could be one of the following depending on the ABI:

    declare void @user_complex_add (double, double, double, double, {double, double} *)
    declare void @user_complex_add ({double, double} *, double, double, double, double)
    declare void @user_complex_add ({double, double} *, {double, double} *, {double, double} *)

My proposal would not invalidate any of these, but allow the following, which would immediately be expanded to the appropriate choice of the above depending on the target calling conventions:

    declare {double, double} @user_complex_add({double, double}, {double, double})

It's possible that you want to allow some parameter attributes, i.e. be able to distinguish:

    declare sret {double, double} @user_complex_add({double, double}, {double, double})
    declare inreg {double, double} @user_complex_add({double, double}, {double, double})

> By the time you lower to LLVM, all you have is {double,double}. In fact, there is no way, in general, to retain all the high level information in LLVM without flavoring the LLVM IR with target info.

Agreed. Anyway, for the moment, I will generate what LLVM accepts as input.

Thanks
Christophe
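As a rough illustration of the expansion discussed in the message above, here is a C sketch (not from the thread) of one by-value signature next to two hand-lowered equivalents that mirror the shapes of the `declare void @user_complex_add` variants. The type `cplx` and all function names are hypothetical, and which lowered form a real target would pick is an ABI decision this sketch does not model.

```c
#include <assert.h>

/* Hypothetical two-double aggregate standing in for {double, double}. */
struct cplx {
    double re;
    double im;
};

/* By-value form: aggregates in, aggregate out. This is the shape the
   proposal would let a front end emit directly. */
struct cplx cplx_add_by_value(struct cplx a, struct cplx b)
{
    struct cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Lowered form A: operands split into scalars, result written through a
   trailing pointer supplied by the caller. */
void cplx_add_scalars_then_out(double a_re, double a_im,
                               double b_re, double b_im,
                               struct cplx *out)
{
    assert(out != 0);
    out->re = a_re + b_re;
    out->im = a_im + b_im;
}

/* Lowered form B: result slot first, every aggregate passed by pointer. */
void cplx_add_all_pointers(struct cplx *out,
                           const struct cplx *a,
                           const struct cplx *b)
{
    assert(out != 0 && a != 0 && b != 0);
    out->re = a->re + b->re;
    out->im = a->im + b->im;
}
```

All three compute the same sum. The difference is only in who allocates the result slot and whether the operands travel as scalars or through memory, which is the choice the front end currently has to make by hand for each target.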
Christophe de Dinechin
2007-Nov-06 07:27 UTC
[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)
On 6 Nov. 07, at 01:35, Gordon Henriksen wrote:

>> But then, why refuse aggregates as input or output of a call? What is the rationale?
>
> Probably in good part because, in LLVM, aggregates (or derived types) exist only in memory, not in registers.

Thanks, that's precisely where I see a problem. On many recent architectures (Itanium being the extreme case), small enough aggregates are passed and held in registers. Thinking or designing "aggregates == memory" is an obsolete approach ;-) I like the "call" instruction because, at least, it got rid of the "arguments == push to stack" approach you find in the Java or MSIL bytecodes...

As an aside, why do I care? I wanted XL to be efficient on modern architectures, so I got rid of "implicit memory accesses" as much as I could, e.g. no "this pointer". At one point, I compiled a simple program manipulating complex numbers to draw a Julia set. At the lowest level of optimization, the XL version was at least 70% faster than the C++ version. Why? Because the user-defined complex operations in XL were all done in registers, whereas at that level of optimization, the C++ compiler was not doing the memory aliasing analysis required to perform "register field promotion", eliminate the "this pointer", and turn the C++ complex class into registers. In other words, a complex addition was 4 loads, 2 fp adds, and 2 stores for C++, as opposed to only the fp adds for XL. Obviously, an IR assuming that aggregates are in memory does not help here.

> Chris has some notes about how to do this for return values here:
>
> http://www.nondot.org/sabre/LLVMNotes/MultipleReturnValues.txt

He pointed me to this earlier, thanks.

Thanks,
Christophe