Matthijs Kooijman
2008-May-30 16:11 UTC
[LLVMdev] Plans considering first class structs and multiple return values
Hi all, I've been implementing some stuff that uses the new structs-as-firstclass values code. Apart from some implementation problems, I'm spotting a few structural problems that seem non-trivial to fix. In particular, now that structs are a first class values, the old way or returning multiple values is a bit confusing. The old way had a variable number of arguments to the return instruction, which could come out at the caller end as some special aggregrate value that could be indexed using getresult. Now that aggregrates are first class, you should be able to simply return a single struct and access the result in the caller using the new extractvalue instruction. However, both approaches define a function with a struct as a return type: It's not possible to tell the difference between both from looking at the function type. So, any caller cannot know for sure what to do with the result... Also, there is still a lot of code that assumes that having a struct return type means you're using a multiple return statement, preventing things like define {i32, i32} @bar() { %res1 = insertvalue { i32, i32 } zeroinitializer, i32 1, i32 0 %res2 = insertvalue { i32, i32 } %res1, i32 2, i32 0 ret { i32, i32 } %res2 } from working (the validator currently barfs over this). The lack of this distinction also means it is not so trivial to "add" an extra argument to a function: If the return type is an aggregrate, should you add an element to that aggregrate, or create a new struct containing the previous struct and the new value? And what about functions that return void and want an extra argument? Messy. The main cause of this is actually the special case for returning a single value. Instead of returning a struct with one element, you just return the element. You could make this more consistent by making a function always return a struct, which most of the time will just contain a single field. I'm not sure that this is really a usable approach (or what the ABI impact is), but it could be useful. In particular, a function returning a struct would then be declared as returning {{i32, i32}} (to distinguish it between a function returning two values, which would be {i32, i32}). This is also consistent with making a void function return {}. This kind of stuff could make a lot of code a lot more regular, but might be a bit annoying to implemented. Furthermore, as far as I've understood, the intention is to remove the "multiple return value" support in favour of returning structs. I take it this means that at least the getresult instruction will be removed, and possible the multiple operand return as well. This would partly solve some issues, but will completely remove the concept of returning multiple values (unless you adopt the above approach of always returning structs, even for single values). Additionally, the current form of the ret instruction is still useful, for making multiple return values readable. In particular, writing ret i32 1, i32 2 is a lot more readable than the (functionally identical) three line return statement above. However, to make the first class aggregrates even more usable, it might be better to remove the multi operand return instruction and add support for literal aggregrates. Currently, I think these are only supported as global constants. It would be useful if the following was valid: ret { i32, i32 } { i32 1, i32 2 } However, llvm-as rejects this. Interestingly, 'zeroinitializer' is a valid operand to ret, which should be similar to '{ i32 1, i32 2 }' in that they are both literal aggregates. Even more, one would also like to be able to build non constant structs in a similar manner. i.e., writing ret { i32, i32 } { i32 %a, i32 %b } would be a lot more useful than the current ret i32 %a, i32 %b form, since in the first form the ret instruction still has a single operand that is easy to work with. Perhaps if using a non-constant literal struct is not so trivial, a instruction can be added for that instead. Ie, %res = buildagg i32 %a, i32 %b ret %res This is still a clean way of building a struct (a lot easier to work with than nested insertvalues IMHO) but also leaves the ret instruction simple. Anyway, if using a literal struct as an operand would work, then the multiple return value ret instruction can probably be removed alltogether. I've attached a patch with some changes I made to get things working a bit, but it's not really a decent patch yet. The changes to Andersens' pass are plain wrong, things can break horribly when you fiddle with structs containing pointers this way. The others prevent things from asserting and I think they are consistent with the current state of the code, but will need change depending on what happens to the return instruction. Any enlightening thoughs on this issue? I'm I've put down mine in a slightly chaotic manner, hopefully things will still get across clearly :-) Gr. Matthijs -------------- next part -------------- A non-text attachment was scrubbed... Name: aggregrates.diff Type: text/x-diff Size: 2384 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080530/39dbc07f/attachment.diff> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080530/39dbc07f/attachment.sig>
Dan Gohman
2008-May-30 17:40 UTC
[LLVMdev] Plans considering first class structs and multiple return values
On May 30, 2008, at 9:11 AM, Matthijs Kooijman wrote:> Hi all, > > I've been implementing some stuff that uses the new structs-as- > firstclass > values code. Apart from some implementation problems, I'm spotting a > few > structural problems that seem non-trivial to fix.Hi, thanks for your interest!> Furthermore, as far as I've understood, the intention is to remove the > "multiple return value" support in favour of returning structs. I > take it this > means that at least the getresult instruction will be removed, and > possible > the multiple operand return as well. This would partly solve some > issues, but > will completely remove the concept of returning multiple values > (unless you > adopt the above approach of always returning structs, even for > single values).Yes, the intention is that getresult will be removed once first-class aggregates are a ready replacement. This won't leave LLVM missing the concept of returning multiple values; a struct can be thought of as a container for multiple values.> Additionally, the current form of the ret instruction is still > useful, for > making multiple return values readable. In particular, writing > ret i32 1, i32 2 > is a lot more readable than the (functionally identical) three line > return > statement above. However, to make the first class aggregrates even > more > usable, it might be better to remove the multi operand return > instruction and > add support for literal aggregrates. Currently, I think these are only > supported as global constants. It would be useful if the following > was valid: > ret { i32, i32 } { i32 1, i32 2 }I think this form should be valid once first-class struct support is more complete. If it's not accepted today it may be a conflict with the current multiple-return-value syntax, or it may be a bug.> Even more, one would also like to be able to build non constant > structs in a > similar manner. i.e., writing > ret { i32, i32 } { i32 %a, i32 %b } > would be a lot more useful than the current > ret i32 %a, i32 %b > form, since in the first form the ret instruction still has a single > operand > that is easy to work with.The current design will have it looking like this: %t0 = insertvalue { i32, i32 } undef, i32 %a, 0 %t1 = insertvalue { i32, i32 } %t0, i32 %b, 1 ret { i32, i32 } %t1 once first-class structs take over from the current multiple-return- value support. It's significantly more syntax, but it's a significantly simpler IR schema.> > > Perhaps if using a non-constant literal struct is not so trivial, a > instruction can be added for that instead. Ie, > %res = buildagg i32 %a, i32 %b > ret %res > > This is still a clean way of building a struct (a lot easier to work > with than > nested insertvalues IMHO) but also leaves the ret instruction simple.This looks nice, and I wouldn't rule out doing it in the future, but it gets a little awkward in the case of nested aggregates. For the moment, I'd like to stick with just one way to build aggregates; insertvalue is sufficient to do everything needed, even if it isn't the prettiest.> Anyway, if using a literal struct as an operand would work, then the > multiple > return value ret instruction can probably be removed alltogether. > > I've attached a patch with some changes I made to get things working > a bit, > but it's not really a decent patch yet. The changes to Andersens' > pass are > plain wrong, things can break horribly when you fiddle with structs > containing > pointers this way. The others prevent things from asserting and I > think they > are consistent with the current state of the code, but will need > change > depending on what happens to the return instruction.Thanks for this patch and the several others! As I mentioned before, insertelement and extractelement are about to be significantly changed, for constant indices. Once that's in, I'll be happy to work with you to revise these patches accordingly and integrate them. Dan
Matthijs Kooijman
2008-Jun-02 15:45 UTC
[LLVMdev] Plans considering first class structs and multiple return values
Hi Dan,> Yes, the intention is that getresult will be removed once first-class > aggregates are a ready replacement. This won't leave LLVM missing the > concept of returning multiple values; a struct can be thought of as > a container for multiple values.I'm not saying we don't have some way of modeling multiple return values, I'm sayin the explicit concept disappears. We can use a struct return type to create a function that effectively returns multiple values, but that is still a function returning a single value: A struct. In particular, it will be impossible to distinguish between a function returning a single struct and a function returning multiple values. I'm not sure this is a big problem, but it makes adding a return value to a function harder. I'm not sure this is really a problem though. Would adding a function attribute returns_multiple or something like that be useful? returns_multiple would mean to interpret the returned struct as multiple return values and in particular forbids to use the resulting value in any way but as an operand to extractvalue. The main goal of this is to make adding/removing an argument easier, because you only need to modify the extractvalues. On the other hand, this limitation sounds a lot like the current getresult approach and might not be all to useful.> > Additionally, the current form of the ret instruction is still > > useful, for > > making multiple return values readable. In particular, writing > > ret i32 1, i32 2 > > is a lot more readable than the (functionally identical) three line > > return > > statement above. However, to make the first class aggregrates even > > more > > usable, it might be better to remove the multi operand return > > instruction and > > add support for literal aggregrates. Currently, I think these are only > > supported as global constants. It would be useful if the following > > was valid: > > ret { i32, i32 } { i32 1, i32 2 } > > I think this form should be valid once first-class struct support is more > complete. If it's not accepted today it may be a conflict with the current > multiple-return-value syntax, or it may be a bug.It doesn't seem to be accepted by llvm-as. I think you might be able to build this in memory, since a ConstantStruct is just a Value*.> > Even more, one would also like to be able to build non constant structs > > in a similar manner. i.e., writing > > ret { i32, i32 } { i32 %a, i32 %b } > > would be a lot more useful than the current > > ret i32 %a, i32 %b > > form, since in the first form the ret instruction still has a single > > operand that is easy to work with. > > The current design will have it looking like this: > > %t0 = insertvalue { i32, i32 } undef, i32 %a, 0 > %t1 = insertvalue { i32, i32 } %t0, i32 %b, 1 > ret { i32, i32 } %t1 > > once first-class structs take over from the current multiple-return- value > support. It's significantly more syntax, but it's a significantly simpler > IR schema.On the other hand, anyone looking to support multiple return values but other (potentially complicated) uses for first class structs would have a harder time trying to find out what these nested insertvalues actually do. The main difference here is that using insertvalue you can do something like: %a = phi { i32, i32 } [ %a.0, %foo ], [ %a.1, %bar ] %b = insertvalue { i32, i32 } %a, i32 0, 0 which you can't do directly using a literal { } or buildagg kind of instruction. OTOH, you can still do things like this using nested structs then, so having a builddag will probably not improve things much. Anyhow, so much for my blabbering of incoherent thoughts. I think that simply using insertvalue for now and not having an explicit multiple return function attribute should work fine. Whenever I want to add a function argument, I will just let it return a struct of two elements (current value and the new value). I'll also add a feature to the sretpromotion pass that flattens out nested struct return types as much as possible, without having to reconstruct structs at the caller end (ie, preserver struct types that are used in any way other than extractvalue in any caller and flatten out all other elements). Would this be the right place for that? Or should sretpromotion really only take care of promotion sret pointer arguments to multiple return values, and have second pass for flattening them out? A problem of that seems to be that it is hard for sretpromotion to simply add return values, since it doesn't know whether to add to the existing struct return type (if any) or create a new struct return type. I guess integrating these two passes makes the most sense, then. I guess the argumentpromotion pass also needs to be adapted to promote first class struct arguments (probably handle them identical to how byval pointer-to-struct are handled now), but I don't currently have need of that. Lastly, I'll modify IPConstProp and DeadArgElim to properly handle multiple return values and do constprop/removal on each of them individually, instead of only working when all of them are constant/dead as is done currently. I'm also thinking of adding a transformation that makes return values dead if they only return an unmodified argument, this can probably go into DeadArgElim. Gr. Matthijs -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080602/963e1959/attachment.sig>
Chris Lattner
2008-Jun-07 22:01 UTC
[LLVMdev] Plans considering first class structs and multiple return values
On May 30, 2008, at 10:40 AM, Dan Gohman wrote:>> Furthermore, as far as I've understood, the intention is to remove >> the >> "multiple return value" support in favour of returning structs.Yep.>> I >> take it this >> means that at least the getresult instruction will be removed, and >> possible >> the multiple operand return as well. This would partly solve some >> issues, but >> will completely remove the concept of returning multiple values >> (unless you >> adopt the above approach of always returning structs, even for >> single values). > > Yes, the intention is that getresult will be removed once first-class > aggregates are a ready replacement. This won't leave LLVM missing the > concept of returning multiple values; a struct can be thought of as > a container for multiple values.Right.> > >> Additionally, the current form of the ret instruction is still >> useful, for >> making multiple return values readable. In particular, writing >> ret i32 1, i32 2 >> is a lot more readable than the (functionally identical) three line >> return >> statement above. However, to make the first class aggregrates even >> more >> usable, it might be better to remove the multi operand return >> instruction and >> add support for literal aggregrates. Currently, I think these are >> only >> supported as global constants. It would be useful if the following >> was valid: >> ret { i32, i32 } { i32 1, i32 2 } > > I think this form should be valid once first-class struct support is > more > complete. If it's not accepted today it may be a conflict with the > current > multiple-return-value syntax, or it may be a bug.Right. Returning a ConstantStruct should be legal, and not a real problem.>> Even more, one would also like to be able to build non constant >> structs in a >> similar manner. i.e., writing >> ret { i32, i32 } { i32 %a, i32 %b } >> would be a lot more useful than the current >> ret i32 %a, i32 %b >> form, since in the first form the ret instruction still has a single >> operand >> that is easy to work with. > > The current design will have it looking like this: > > %t0 = insertvalue { i32, i32 } undef, i32 %a, 0 > %t1 = insertvalue { i32, i32 } %t0, i32 %b, 1 > ret { i32, i32 } %t1 > > once first-class structs take over from the current multiple-return- > value > support. It's significantly more syntax, but it's a significantly > simpler > IR schema.Agreed. Thanks to both of you for working on this! -Chris
Maybe Matching Threads
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values
- [LLVMdev] Plans considering first class structs and multiple return values