Matthijs Kooijman
2008-May-30 16:11 UTC
[LLVMdev] Plans considering first class structs and multiple return values
Hi all,
I've been implementing some stuff that uses the new structs-as-firstclass
values code. Apart from some implementation problems, I'm spotting a few
structural problems that seem non-trivial to fix.
In particular, now that structs are a first class values, the old way or
returning multiple values is a bit confusing. The old way had a variable
number of arguments to the return instruction, which could come out at the
caller end as some special aggregrate value that could be indexed using
getresult.
Now that aggregrates are first class, you should be able to simply return a
single struct and access the result in the caller using the new extractvalue
instruction.
However, both approaches define a function with a struct as a return type:
It's not possible to tell the difference between both from looking at the
function type. So, any caller cannot know for sure what to do with the
result...
Also, there is still a lot of code that assumes that having a struct return
type means you're using a multiple return statement, preventing things like
define {i32, i32} @bar()
{
%res1 = insertvalue { i32, i32 } zeroinitializer, i32 1, i32 0
%res2 = insertvalue { i32, i32 } %res1, i32 2, i32 0
ret { i32, i32 } %res2
}
from working (the validator currently barfs over this).
The lack of this distinction also means it is not so trivial to "add"
an extra
argument to a function: If the return type is an aggregrate, should you add an
element to that aggregrate, or create a new struct containing the previous
struct and the new value? And what about functions that return void and want
an extra argument? Messy.
The main cause of this is actually the special case for returning a single
value. Instead of returning a struct with one element, you just return the
element. You could make this more consistent by making a function always
return a struct, which most of the time will just contain a single field.
I'm
not sure that this is really a usable approach (or what the ABI impact is),
but it could be useful. In particular, a function returning a struct would
then be declared as returning {{i32, i32}} (to distinguish it between a
function returning two values, which would be {i32, i32}). This is also
consistent with making a void function return {}. This kind of stuff could
make a lot of code a lot more regular, but might be a bit annoying to
implemented.
Furthermore, as far as I've understood, the intention is to remove the
"multiple return value" support in favour of returning structs. I take
it this
means that at least the getresult instruction will be removed, and possible
the multiple operand return as well. This would partly solve some issues, but
will completely remove the concept of returning multiple values (unless you
adopt the above approach of always returning structs, even for single values).
Additionally, the current form of the ret instruction is still useful, for
making multiple return values readable. In particular, writing
ret i32 1, i32 2
is a lot more readable than the (functionally identical) three line return
statement above. However, to make the first class aggregrates even more
usable, it might be better to remove the multi operand return instruction and
add support for literal aggregrates. Currently, I think these are only
supported as global constants. It would be useful if the following was valid:
ret { i32, i32 } { i32 1, i32 2 }
However, llvm-as rejects this. Interestingly, 'zeroinitializer' is a
valid
operand to ret, which should be similar to '{ i32 1, i32 2 }' in that
they are
both literal aggregates.
Even more, one would also like to be able to build non constant structs in a
similar manner. i.e., writing
ret { i32, i32 } { i32 %a, i32 %b }
would be a lot more useful than the current
ret i32 %a, i32 %b
form, since in the first form the ret instruction still has a single operand
that is easy to work with.
Perhaps if using a non-constant literal struct is not so trivial, a
instruction can be added for that instead. Ie,
%res = buildagg i32 %a, i32 %b
ret %res
This is still a clean way of building a struct (a lot easier to work with than
nested insertvalues IMHO) but also leaves the ret instruction simple.
Anyway, if using a literal struct as an operand would work, then the multiple
return value ret instruction can probably be removed alltogether.
I've attached a patch with some changes I made to get things working a bit,
but it's not really a decent patch yet. The changes to Andersens' pass
are
plain wrong, things can break horribly when you fiddle with structs containing
pointers this way. The others prevent things from asserting and I think they
are consistent with the current state of the code, but will need change
depending on what happens to the return instruction.
Any enlightening thoughs on this issue? I'm I've put down mine in a
slightly
chaotic manner, hopefully things will still get across clearly :-)
Gr.
Matthijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aggregrates.diff
Type: text/x-diff
Size: 2384 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080530/39dbc07f/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080530/39dbc07f/attachment.sig>
Dan Gohman
2008-May-30 17:40 UTC
[LLVMdev] Plans considering first class structs and multiple return values
On May 30, 2008, at 9:11 AM, Matthijs Kooijman wrote:> Hi all, > > I've been implementing some stuff that uses the new structs-as- > firstclass > values code. Apart from some implementation problems, I'm spotting a > few > structural problems that seem non-trivial to fix.Hi, thanks for your interest!> Furthermore, as far as I've understood, the intention is to remove the > "multiple return value" support in favour of returning structs. I > take it this > means that at least the getresult instruction will be removed, and > possible > the multiple operand return as well. This would partly solve some > issues, but > will completely remove the concept of returning multiple values > (unless you > adopt the above approach of always returning structs, even for > single values).Yes, the intention is that getresult will be removed once first-class aggregates are a ready replacement. This won't leave LLVM missing the concept of returning multiple values; a struct can be thought of as a container for multiple values.> Additionally, the current form of the ret instruction is still > useful, for > making multiple return values readable. In particular, writing > ret i32 1, i32 2 > is a lot more readable than the (functionally identical) three line > return > statement above. However, to make the first class aggregrates even > more > usable, it might be better to remove the multi operand return > instruction and > add support for literal aggregrates. Currently, I think these are only > supported as global constants. It would be useful if the following > was valid: > ret { i32, i32 } { i32 1, i32 2 }I think this form should be valid once first-class struct support is more complete. If it's not accepted today it may be a conflict with the current multiple-return-value syntax, or it may be a bug.> Even more, one would also like to be able to build non constant > structs in a > similar manner. i.e., writing > ret { i32, i32 } { i32 %a, i32 %b } > would be a lot more useful than the current > ret i32 %a, i32 %b > form, since in the first form the ret instruction still has a single > operand > that is easy to work with.The current design will have it looking like this: %t0 = insertvalue { i32, i32 } undef, i32 %a, 0 %t1 = insertvalue { i32, i32 } %t0, i32 %b, 1 ret { i32, i32 } %t1 once first-class structs take over from the current multiple-return- value support. It's significantly more syntax, but it's a significantly simpler IR schema.> > > Perhaps if using a non-constant literal struct is not so trivial, a > instruction can be added for that instead. Ie, > %res = buildagg i32 %a, i32 %b > ret %res > > This is still a clean way of building a struct (a lot easier to work > with than > nested insertvalues IMHO) but also leaves the ret instruction simple.This looks nice, and I wouldn't rule out doing it in the future, but it gets a little awkward in the case of nested aggregates. For the moment, I'd like to stick with just one way to build aggregates; insertvalue is sufficient to do everything needed, even if it isn't the prettiest.> Anyway, if using a literal struct as an operand would work, then the > multiple > return value ret instruction can probably be removed alltogether. > > I've attached a patch with some changes I made to get things working > a bit, > but it's not really a decent patch yet. The changes to Andersens' > pass are > plain wrong, things can break horribly when you fiddle with structs > containing > pointers this way. The others prevent things from asserting and I > think they > are consistent with the current state of the code, but will need > change > depending on what happens to the return instruction.Thanks for this patch and the several others! As I mentioned before, insertelement and extractelement are about to be significantly changed, for constant indices. Once that's in, I'll be happy to work with you to revise these patches accordingly and integrate them. Dan
Matthijs Kooijman
2008-Jun-02 15:45 UTC
[LLVMdev] Plans considering first class structs and multiple return values
Hi Dan,> Yes, the intention is that getresult will be removed once first-class > aggregates are a ready replacement. This won't leave LLVM missing the > concept of returning multiple values; a struct can be thought of as > a container for multiple values.I'm not saying we don't have some way of modeling multiple return values, I'm sayin the explicit concept disappears. We can use a struct return type to create a function that effectively returns multiple values, but that is still a function returning a single value: A struct. In particular, it will be impossible to distinguish between a function returning a single struct and a function returning multiple values. I'm not sure this is a big problem, but it makes adding a return value to a function harder. I'm not sure this is really a problem though. Would adding a function attribute returns_multiple or something like that be useful? returns_multiple would mean to interpret the returned struct as multiple return values and in particular forbids to use the resulting value in any way but as an operand to extractvalue. The main goal of this is to make adding/removing an argument easier, because you only need to modify the extractvalues. On the other hand, this limitation sounds a lot like the current getresult approach and might not be all to useful.> > Additionally, the current form of the ret instruction is still > > useful, for > > making multiple return values readable. In particular, writing > > ret i32 1, i32 2 > > is a lot more readable than the (functionally identical) three line > > return > > statement above. However, to make the first class aggregrates even > > more > > usable, it might be better to remove the multi operand return > > instruction and > > add support for literal aggregrates. Currently, I think these are only > > supported as global constants. It would be useful if the following > > was valid: > > ret { i32, i32 } { i32 1, i32 2 } > > I think this form should be valid once first-class struct support is more > complete. If it's not accepted today it may be a conflict with the current > multiple-return-value syntax, or it may be a bug.It doesn't seem to be accepted by llvm-as. I think you might be able to build this in memory, since a ConstantStruct is just a Value*.> > Even more, one would also like to be able to build non constant structs > > in a similar manner. i.e., writing > > ret { i32, i32 } { i32 %a, i32 %b } > > would be a lot more useful than the current > > ret i32 %a, i32 %b > > form, since in the first form the ret instruction still has a single > > operand that is easy to work with. > > The current design will have it looking like this: > > %t0 = insertvalue { i32, i32 } undef, i32 %a, 0 > %t1 = insertvalue { i32, i32 } %t0, i32 %b, 1 > ret { i32, i32 } %t1 > > once first-class structs take over from the current multiple-return- value > support. It's significantly more syntax, but it's a significantly simpler > IR schema.On the other hand, anyone looking to support multiple return values but other (potentially complicated) uses for first class structs would have a harder time trying to find out what these nested insertvalues actually do. The main difference here is that using insertvalue you can do something like: %a = phi { i32, i32 } [ %a.0, %foo ], [ %a.1, %bar ] %b = insertvalue { i32, i32 } %a, i32 0, 0 which you can't do directly using a literal { } or buildagg kind of instruction. OTOH, you can still do things like this using nested structs then, so having a builddag will probably not improve things much. Anyhow, so much for my blabbering of incoherent thoughts. I think that simply using insertvalue for now and not having an explicit multiple return function attribute should work fine. Whenever I want to add a function argument, I will just let it return a struct of two elements (current value and the new value). I'll also add a feature to the sretpromotion pass that flattens out nested struct return types as much as possible, without having to reconstruct structs at the caller end (ie, preserver struct types that are used in any way other than extractvalue in any caller and flatten out all other elements). Would this be the right place for that? Or should sretpromotion really only take care of promotion sret pointer arguments to multiple return values, and have second pass for flattening them out? A problem of that seems to be that it is hard for sretpromotion to simply add return values, since it doesn't know whether to add to the existing struct return type (if any) or create a new struct return type. I guess integrating these two passes makes the most sense, then. I guess the argumentpromotion pass also needs to be adapted to promote first class struct arguments (probably handle them identical to how byval pointer-to-struct are handled now), but I don't currently have need of that. Lastly, I'll modify IPConstProp and DeadArgElim to properly handle multiple return values and do constprop/removal on each of them individually, instead of only working when all of them are constant/dead as is done currently. I'm also thinking of adding a transformation that makes return values dead if they only return an unmodified argument, this can probably go into DeadArgElim. Gr. Matthijs -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080602/963e1959/attachment.sig>
Chris Lattner
2008-Jun-07 22:01 UTC
[LLVMdev] Plans considering first class structs and multiple return values
On May 30, 2008, at 10:40 AM, Dan Gohman wrote:>> Furthermore, as far as I've understood, the intention is to remove >> the >> "multiple return value" support in favour of returning structs.Yep.>> I >> take it this >> means that at least the getresult instruction will be removed, and >> possible >> the multiple operand return as well. This would partly solve some >> issues, but >> will completely remove the concept of returning multiple values >> (unless you >> adopt the above approach of always returning structs, even for >> single values). > > Yes, the intention is that getresult will be removed once first-class > aggregates are a ready replacement. This won't leave LLVM missing the > concept of returning multiple values; a struct can be thought of as > a container for multiple values.Right.> > >> Additionally, the current form of the ret instruction is still >> useful, for >> making multiple return values readable. In particular, writing >> ret i32 1, i32 2 >> is a lot more readable than the (functionally identical) three line >> return >> statement above. However, to make the first class aggregrates even >> more >> usable, it might be better to remove the multi operand return >> instruction and >> add support for literal aggregrates. Currently, I think these are >> only >> supported as global constants. It would be useful if the following >> was valid: >> ret { i32, i32 } { i32 1, i32 2 } > > I think this form should be valid once first-class struct support is > more > complete. If it's not accepted today it may be a conflict with the > current > multiple-return-value syntax, or it may be a bug.Right. Returning a ConstantStruct should be legal, and not a real problem.>> Even more, one would also like to be able to build non constant >> structs in a >> similar manner. i.e., writing >> ret { i32, i32 } { i32 %a, i32 %b } >> would be a lot more useful than the current >> ret i32 %a, i32 %b >> form, since in the first form the ret instruction still has a single >> operand >> that is easy to work with. > > The current design will have it looking like this: > > %t0 = insertvalue { i32, i32 } undef, i32 %a, 0 > %t1 = insertvalue { i32, i32 } %t0, i32 %b, 1 > ret { i32, i32 } %t1 > > once first-class structs take over from the current multiple-return- > value > support. It's significantly more syntax, but it's a significantly > simpler > IR schema.Agreed. Thanks to both of you for working on this! -Chris
Maybe Matching Threads
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values