Chen Li via llvm-dev
2015-Dec-02 17:47 UTC
[llvm-dev] Support token type in struct for landingpad
> On Dec 1, 2015, at 11:14 PM, David Majnemer <david.majnemer at gmail.com> wrote:
>
> While we support 'opaque' types nested within struct types, they are not exactly battle tested:
>
> $ cat t.ll
> %opaque_ty = type opaque
>
> %struct_ty = type { i32, %opaque_ty }
>
> define %struct_ty @f(%struct_ty* %p) {
>   %load = load %struct_ty, %struct_ty* %p
>   ret %struct_ty %load
> }
>
> $ opt -O2 t.ll -S
> lib/IR/DataLayout.cpp:623: unsigned int llvm::DataLayout::getAlignment(llvm::Type *, bool) const: Assertion `Ty->isSized() && "Cannot getTypeInfo() on a type that is unsized!"' failed.

Thanks David! I actually hacked token type into struct type and ended up with the same failure as above. I will take a look at the catchpad and cleanuppad code, create a patch to add a token landingpad, and send it to you for review.

thanks,
chen

> As a practical matter, I fear nesting 'token' types within struct types will have similar issues. Beyond that, the design philosophy behind 'token' is that it is incredibly opaque and permitting it to nest inside a struct creates scenarios where one might try to GEP to the end of the field right before the token field in an attempt to examine or manipulate the token.
>
> Your other recommendation, having landingpad produce a token, is quite similar to how we've designed catchpad and cleanuppad. I think that direction will be quite nice.
>
> On Tue, Dec 1, 2015 at 8:07 PM, Chen Li <meloli87 at gmail.com> wrote:
> Hi David,
>
> Sorry to bother you, but I would like to get some suggestions on your recent work on the token type.
>
> I'm currently working on changing gc.statepoint to return a token type instead of an i32 type. The reason is that with the current implementation, gc.statepoint could potentially be fed into PHI nodes and break the RewriteStatepointsForGC pass later. Using token type would help us avoid this. I have most of the code working, but hit a problem when gc.statepoint is an InvokeInst and has an unwind path.
>
> Currently, gc.statepoint of InvokeInst works as below (the code snippet is from test/CodeGen/X86/statepoint-invoke.ll):
>
>   %0 = invoke i32 (i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64 0, i32 0, void (i64 addrspace(1)*)* @some_call, i32 1, i32 0, i64 addrspace(1)* %obj, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i64 addrspace(1)* %obj, i64 addrspace(1)* %obj1)
>           to label %invoke_safepoint_normal_dest unwind label %exceptional_return
>
> invoke_safepoint_normal_dest:
>   …
>
> exceptional_return:
>   %landing_pad = landingpad { i8*, i32 }
>           cleanup
>   %relocate_token = extractvalue { i8*, i32 } %landing_pad, 1
>   %obj.relocated1 = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(i32 %relocate_token, i32 13, i32 13)
>   %obj1.relocated1 = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(i32 %relocate_token, i32 14, i32 14)
>   ret i64 addrspace(1)* %obj1.relocated1
>
> Each gc.relocate needs to take its corresponding gc.statepoint as its first argument. However, on the unwind path, there is no way to get gc.statepoint directly because the return value of the InvokeInst is undefined there. In this scenario, we tie gc.relocate to the landingpad, and use the landingpad to find its unique predecessor to get the corresponding gc.statepoint. We pick the selector value from the landingpad to feed into gc.relocate just because it has the same type (i32) as gc.statepoint's return type. The actual value of the selector doesn't really matter because gc.relocate only uses it as a reference to find gc.statepoint and does not consume it during lowering.
>
> However, this will no longer work if we change gc.statepoint's return type to token type. To make it work, I can see two potential approaches:
>
> 1) add support for token type inside struct type so that we can have a landingpad with result type { i8*, token }, or
> 2) add support for a landingpad with a token result type.
>
> Approach 1 seems to be easier since none of the other parts of statepoint handling need to change, and having a selector of token type also seems reasonable (furthermore, we never need to extract the selector value to do exception handling in our code base, so I think only supporting token type in struct should be enough for us). Approach 2 requires modifying how gc.relocate looks up its corresponding gc.statepoint through the landingpad, but that shouldn't be hard either.
>
> Does either of the approaches sound reasonable to you? Other ideas are also welcome :)
>
> Thank you very much!
>
> Best,
> Chen
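For readers following along, the assertion in David's opt run comes down to sizedness: a struct only has a layout if every field does. Below is a minimal toy sketch of that rule in C++. The `Kind`, `Ty`, `isSized` and `requireSized` names are made up for illustration and are not the real llvm::Type API, and treating token as unsized is an assumption based on Chen's report that he hit the same failure.

```cpp
#include <cassert>
#include <vector>

// Toy type model; these are NOT the real llvm::Type classes.
enum class Kind { Integer, Pointer, Opaque, Token, Struct };

struct Ty {
  Kind kind;
  std::vector<const Ty *> fields;  // element types, used only for Kind::Struct
};

// A struct is sized only if every field is sized. Opaque and token types
// are modeled as unsized (an assumption made for this illustration).
bool isSized(const Ty &t) {
  switch (t.kind) {
  case Kind::Integer:
  case Kind::Pointer:
    return true;
  case Kind::Opaque:
  case Kind::Token:
    return false;
  case Kind::Struct:
    for (const Ty *field : t.fields)
      if (!isSized(*field))
        return false;
    return true;
  }
  return false;  // not reached; keeps compilers quiet
}

// Stand-in for a layout query: asking for layout information about an
// unsized type trips an assertion, much like the opt crash quoted above.
void requireSized(const Ty &t) {
  assert(isSized(t) && "cannot query layout of an unsized type");
}
```

Under this model { i8*, i32 } is sized, while { i32, %opaque_ty } and { i8*, token } are not, so any pass that asks for their size or alignment would hit the assertion.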
Chen Li via llvm-dev
2015-Dec-03 21:05 UTC
[llvm-dev] Support token type in struct for landingpad
Hi David and Joseph,

I've just added landingpad with token type locally and changed gc.relocate to work in the following way:

  %0 = invoke token (i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64 0, i32 0, void (i64 addrspace(1)*)* @some_call, i32 1, i32 0, i64 addrspace(1)* %obj, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i64 addrspace(1)* %obj, i64 addrspace(1)* %obj1)
          to label %invoke_safepoint_normal_dest unwind label %exceptional_return

invoke_safepoint_normal_dest:
  ...

exceptional_return:
  %landing_pad = landingpad token
          cleanup
  %obj.relocated1 = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %landing_pad, i32 13, i32 13)
  %obj1.relocated1 = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %landing_pad, i32 14, i32 14)
  ret i64 addrspace(1)* %obj1.relocated1

Now gc.statepoint returns a token instead of an i32, and gc.relocate also takes a token as its first argument. That argument is either the corresponding gc.statepoint itself (for a call statepoint, or for an invoke statepoint on the normal path), or a reference that helps find the corresponding gc.statepoint on the unwind path. Since the landingpad produces a token here as well, it can be passed as that reference.

To make this work, I have changed two parts of the code.

The first is how gc.relocate looks up its corresponding gc.statepoint on the unwind path. It used to use the extracted selector value to find the landingpad and then use the landingpad to find the invoke instruction, which is the gc.statepoint. Now it can use the landingpad directly to find the invoke instruction.

The second is making landingpad work with token type. In LLVM's front end (passes before SelectionDAG), there are no restrictions on what type a landingpad should have (there are test cases in LLVM that have a landingpad of i8 or i32 type). However, SelectionDAGBuilder::visitLandingPad enforces that a landingpad must be two-valued (type { i8*, i32 }) so that it can handle the exception pointer and selector value inside it. As the first step, I'd like to just add a check for whether the landingpad is of token type, and if so return early without creating the DAG nodes for the exception pointer and selector value (same as what happens for SjLj exceptions). This is enough to support the gc.statepoint work, but will not support C++-style exception handling with gc.statepoint.

As follow-up work, I'd like to add some support for extracting the selector value from a token landingpad. I think we could either do it explicitly in IR (maybe add an intrinsic call extract.selector or something similar) or implicitly during SelectionDAG (in visitLandingPad, check if it's token type, and if so add an additional transform to extract the exception pointer and selector value from the token). I don't have a concrete design right now and I am happy to take any other ideas.

My plan is to get the first step checked in and work on the follow-up incrementally. Does that sound like a reasonable approach to you guys?

thanks,
chen
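The lookup described in this message can be sketched with a toy model: gc.relocate's first operand is either the statepoint itself, or a landingpad whose block has exactly one predecessor ending in the invoke statepoint. Everything below (`Op`, `Inst`, `Block`, `findStatepoint`) is a hypothetical stand-in written for this illustration, not the code in the actual patch or any real LLVM class.

```cpp
#include <cassert>
#include <vector>

// Toy IR model; none of these are real LLVM classes.
enum class Op { InvokeStatepoint, CallStatepoint, LandingPad, Other };

struct Block;

struct Inst {
  Op op = Op::Other;
  const Block *parent = nullptr;  // block containing this instruction
};

struct Block {
  std::vector<const Block *> preds;  // CFG predecessors
  const Inst *terminator = nullptr;  // last instruction in the block
};

// Resolve the first operand of a gc.relocate to its statepoint.
// Returns nullptr when the operand cannot be tied to exactly one statepoint.
const Inst *findStatepoint(const Inst *token) {
  if (!token)
    return nullptr;
  // Call statepoint, or invoke statepoint on the normal path: the operand
  // is the statepoint itself.
  if (token->op == Op::InvokeStatepoint || token->op == Op::CallStatepoint)
    return token;
  // Unwind path: the operand is the landingpad, so go through its block.
  if (token->op != Op::LandingPad || !token->parent)
    return nullptr;
  const Block *padBlock = token->parent;
  if (padBlock->preds.size() != 1)
    return nullptr;  // the scheme relies on a unique predecessor
  const Block *pred = padBlock->preds.front();
  const Inst *term = pred->terminator;
  // Sanity check on the toy model: a terminator lives in its own block.
  assert(!term || term->parent == pred);
  if (term && term->op == Op::InvokeStatepoint)
    return term;
  return nullptr;
}
```

The unique-predecessor check is the important part: if two invokes shared one landing pad, the landingpad token alone could not identify which statepoint a gc.relocate belongs to.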
Joseph Tremoulet via llvm-dev
2015-Dec-04 21:27 UTC
[llvm-dev] Support token type in struct for landingpad
> I don't have a concrete design right now and I am happy to take any other ideas

Three ideas come to mind, none of which are perfect:

1) I'm tempted to say that now that we have token type, landingpad should generally produce a token, the pointer should be extracted with the @llvm.eh.exceptionpointer intrinsic instead of an extractvalue, and the selector should likewise be extracted with a new @llvm.eh.exceptionselector intrinsic instead of extractvalue (and personalities that communicate other things via their landingpads would need to add similar intrinsics to extract them, like the @llvm.eh.exceptioncode intrinsic that SEH uses). But that would require updating all the front-ends generating landingpads, and be awkward for any target personality routines that literally do pass a struct to the landing pad (are there any?), and so probably just reflects my bias coming from dealing with a personality using catchpads/cleanuppads instead of landingpads.

2) Since you're not actually using the landingpad's exception selector nor, if I understand "This is enough to support the gc.statepoint work" correctly, its exception pointer, it's possible that TLI.getExceptionPointerRegister and TLI.getExceptionSelectorRegister should be returning NoRegister for your personality. That would require modifying the EHPersonality enum and corresponding string matching in Analysis/EHPersonalities.h to recognize your personality, but I think that would be fine (it highlights a potential scaling issue if we add lots of targets that each need this, but that's a somewhat independent and pre-existing issue, and in reality I doubt you'd be opening a floodgate here).

3) Maybe the default should be switched, so that TLI.getExceptionPointerRegister and TLI.getExceptionSelectorRegister return NoRegister for EHPersonality::Unknown, and only return actual registers for personalities they recognize. This would require any targets using landingpads with exception pointers / exception selectors to update their code and add themselves to Analysis/EHPersonalities.h, similar to how #2 would require adding your personality, so it seems more disruptive if conceptually a touch cleaner than #2.

4) Explicitly checking for token type in visitLandingPad as you suggest sounds okay to me as a pragmatic approach, too.

I'd probably lean toward #2 as being the least disruptive and most explicit/straightforward about the personality's expectations, but I'm curious what others think.

Thanks
-Joseph
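To make the difference between options #2 and #3 concrete, here is a small self-contained sketch. It is only a toy: the `Personality` enum, the `example_gc_personality` name, the placeholder register numbers, and all function names are hypothetical, and `NoRegister` is simply modeled as 0. The real hooks would be TLI.getExceptionPointerRegister and TLI.getExceptionSelectorRegister.

```cpp
#include <cassert>
#include <string>

// Toy model; enum values and names are illustrative, not LLVM's definitions.
enum class Personality { Unknown, GNU_CXX, StatepointOnly };

constexpr unsigned NoRegister = 0;  // sentinel: "no such register"

struct EHRegisters {
  unsigned pointerReg;   // register holding the exception pointer
  unsigned selectorReg;  // register holding the exception selector
};

// String matching on the personality function's name.
// "example_gc_personality" is a made-up name for this sketch.
Personality classify(const std::string &name) {
  if (name == "__gxx_personality_v0")
    return Personality::GNU_CXX;
  if (name == "example_gc_personality")
    return Personality::StatepointOnly;
  return Personality::Unknown;
}

// Option #2: every personality gets real registers except the ones
// explicitly known not to need them. 1 and 2 are placeholder numbers.
EHRegisters registersOptOut(Personality p) {
  if (p == Personality::StatepointOnly)
    return {NoRegister, NoRegister};
  return {1, 2};
}

// Option #3: the default is flipped, so only recognized personalities
// get registers and Unknown gets none.
EHRegisters registersOptIn(Personality p) {
  if (p == Personality::GNU_CXX)
    return {1, 2};
  return {NoRegister, NoRegister};
}

// What a landingpad lowering step could ask before creating the
// exception pointer and selector nodes.
bool needsExceptionValues(const EHRegisters &r) {
  return r.pointerReg != NoRegister || r.selectorReg != NoRegister;
}
```

With the opt-out variant an unrecognized personality keeps today's behavior; with the opt-in variant it silently loses its exception values until someone registers it, which is the extra disruption mentioned under #3.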