Hi Michael, After some head-scratching and discussion with our tame Khronos member, I agree with you. It comes down to the interpretation of the ambiguous spec. It refers to "the barrier", implying there is some sort of equivalence relation over barriers. The question is, what is that equivalent relation? In your example code:> void f(int foo) { > if (foo) > b(); > else > b(); > } > > void b() { > barrier(); > } >After talking with others, I now see that the spirit of the spec is that the barrier() as reached through the True condition is not the same as the barrier as reached through the False condition, even though it is the same statement as written in CL-C. So inlining should not be restricted, as you say. With regards transitivity, I'd say that it is the user's responsibility to ensure that the noduplicate attribute is transitively applied correctly before handing it to the optimizer. Otherwise, we'd need costly checks every time a user asks the question "is this function noduplicate?" So my proposed change is to remove the restrictions on inlining and update the LangRef to say that the attribute should be transitive but it is the IR producer's responsibility to ensure that. Does that sound about right? Cheers, James On Thu, 2012-12-06 at 07:47 +0000, Kuperstein, Michael M wrote:> I'm not sure I agree with the semantics this patch creates. > > The way I see it, there are two options: > 1) Make "noduplicate" non-transitive and forbid inlining when there are multiple callsites. > 2) Allow inlining, but make the attribute transitive. > > Consider the following code, where barrier() is marked noduplicate. > > kernel void k() { > if (x()) > y(); > b(); > if (x()) > z(); > else > w(); > } > > void b() { > barrier(); > } > > Both inlining and threading b() is clearly illegal. The question is - which of the two should be forbidden. The current patch, assuming I understand it correctly, will: > i) allow inlining but forbid threading after inlining. This is clearly fine. > ii) allow threading but forbid inlining after threading. > I am not at all sure option (ii) is desirable. > > On the other hand, I believe we'd like to allow inlining in the following case: > > void f(int foo) { > if (foo) > b(); > else > b(); > } > > void b() { > barrier(); > } > > Basically, the issue here is a bit philosophical - is an instruction reached through code paths that go through different callsites considered the "same" instruction or not. > If it is the same, then we cannot inline multiple call-sites, but don't need to make noduplicate transitive, since if you have a() -> b() -> barrier(), it doesn't matter what you do with calls to a(), it's still the same barrier(). > If it isn't, inlining is always allowed (I think - James, do you have an example that breaks?), but the attribute must be transitive. > > I feel the right thing(TM) is to consider these instructions different, which leads to "transitive, inlining allowed", as opposed to the "non-transitive, no inlining" approach that the patch takes. > > -----Original Message----- > From: James Molloy [mailto:James.Molloy at arm.com] > Sent: Tuesday, December 04, 2012 19:23 > To: Chris Lattner; llvm-commits > Cc: Nadav Rotem; Kuperstein, Michael M; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] [RFC] "noclone" function attribute > > Hi all + llvm-commits, > > After the discussion below, please find attached my patch to add a new "noduplicate" function attribute. > > I've modified CodeMetrics and LoopInfo, which covers most cases, but JumpThreading and InlineCost don't use CodeMetrics yet, so they required changing manually. > > Cheers, > > James > > On Mon, 2012-12-03 at 23:46 +0000, Chris Lattner wrote: > > On Dec 3, 2012, at 9:48 AM, James Molloy <James.Molloy at arm.com> wrote: > > > > > Hi, > > > > > > Thanks for the pointers. My patch now calls the attribute > > > "noduplicate", and updates CodeMetrics to have another field: > > > > > > bool notDuplicatable; > > > > > > Which semantically is "containsIndirectBr || > > > containsNoDuplicateInst". I didn't repurpose containsIndirectBr > > > because I felt what I'm looking for is sufficiently different > > > (indirectbr inhibits inlining, whereas noduplicate does not, if there is one call site). > > > > I'm pretty sure that it's fine to inline indirectbr if there is a single call site and the inlinee is to be deleted. > > > > -Chris > > > > > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies.-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Sounds good to me. I'm not sure the solution for transitivity is optimal, but it's a good compromise. -----Original Message----- From: James Molloy [mailto:James.Molloy at arm.com] Sent: Thursday, December 06, 2012 13:05 To: Kuperstein, Michael M Cc: Chris Lattner; llvm-commits; Nadav Rotem; llvmdev at cs.uiuc.edu Subject: RE: [LLVMdev] [RFC] "noclone" function attribute Hi Michael, After some head-scratching and discussion with our tame Khronos member, I agree with you. It comes down to the interpretation of the ambiguous spec. It refers to "the barrier", implying there is some sort of equivalence relation over barriers. The question is, what is that equivalent relation? In your example code:> void f(int foo) { > if (foo) > b(); > else > b(); > } > > void b() { > barrier(); > } >After talking with others, I now see that the spirit of the spec is that the barrier() as reached through the True condition is not the same as the barrier as reached through the False condition, even though it is the same statement as written in CL-C. So inlining should not be restricted, as you say. With regards transitivity, I'd say that it is the user's responsibility to ensure that the noduplicate attribute is transitively applied correctly before handing it to the optimizer. Otherwise, we'd need costly checks every time a user asks the question "is this function noduplicate?" So my proposed change is to remove the restrictions on inlining and update the LangRef to say that the attribute should be transitive but it is the IR producer's responsibility to ensure that. Does that sound about right? Cheers, James On Thu, 2012-12-06 at 07:47 +0000, Kuperstein, Michael M wrote:> I'm not sure I agree with the semantics this patch creates. > > The way I see it, there are two options: > 1) Make "noduplicate" non-transitive and forbid inlining when there are multiple callsites. > 2) Allow inlining, but make the attribute transitive. > > Consider the following code, where barrier() is marked noduplicate. > > kernel void k() { > if (x()) > y(); > b(); > if (x()) > z(); > else > w(); > } > > void b() { > barrier(); > } > > Both inlining and threading b() is clearly illegal. The question is - which of the two should be forbidden. The current patch, assuming I understand it correctly, will: > i) allow inlining but forbid threading after inlining. This is clearly fine. > ii) allow threading but forbid inlining after threading. > I am not at all sure option (ii) is desirable. > > On the other hand, I believe we'd like to allow inlining in the following case: > > void f(int foo) { > if (foo) > b(); > else > b(); > } > > void b() { > barrier(); > } > > Basically, the issue here is a bit philosophical - is an instruction reached through code paths that go through different callsites considered the "same" instruction or not. > If it is the same, then we cannot inline multiple call-sites, but don't need to make noduplicate transitive, since if you have a() -> b() -> barrier(), it doesn't matter what you do with calls to a(), it's still the same barrier(). > If it isn't, inlining is always allowed (I think - James, do you have an example that breaks?), but the attribute must be transitive. > > I feel the right thing(TM) is to consider these instructions different, which leads to "transitive, inlining allowed", as opposed to the "non-transitive, no inlining" approach that the patch takes. > > -----Original Message----- > From: James Molloy [mailto:James.Molloy at arm.com] > Sent: Tuesday, December 04, 2012 19:23 > To: Chris Lattner; llvm-commits > Cc: Nadav Rotem; Kuperstein, Michael M; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] [RFC] "noclone" function attribute > > Hi all + llvm-commits, > > After the discussion below, please find attached my patch to add a new "noduplicate" function attribute. > > I've modified CodeMetrics and LoopInfo, which covers most cases, but JumpThreading and InlineCost don't use CodeMetrics yet, so they required changing manually. > > Cheers, > > James > > On Mon, 2012-12-03 at 23:46 +0000, Chris Lattner wrote: > > On Dec 3, 2012, at 9:48 AM, James Molloy <James.Molloy at arm.com> wrote: > > > > > Hi, > > > > > > Thanks for the pointers. My patch now calls the attribute > > > "noduplicate", and updates CodeMetrics to have another field: > > > > > > bool notDuplicatable; > > > > > > Which semantically is "containsIndirectBr || > > > containsNoDuplicateInst". I didn't repurpose containsIndirectBr > > > because I felt what I'm looking for is sufficiently different > > > (indirectbr inhibits inlining, whereas noduplicate does not, if there is one call site). > > > > I'm pretty sure that it's fine to inline indirectbr if there is a single call site and the inlinee is to be deleted. > > > > -Chris > > > > > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies.-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
Hi, Changes made: * The inliner is no longer affected whatsoever. * The verifier will now ensure that the transitive closure of the callers of a noduplicate function all are noduplicate too. * LangRef updated. * Verifier test added. Please review! Cheers, James On Fri, 2012-12-07 at 17:40 +0000, Kuperstein, Michael M wrote:> Sounds good to me. > I'm not sure the solution for transitivity is optimal, but it's a good compromise. > > -----Original Message----- > From: James Molloy [mailto:James.Molloy at arm.com] > Sent: Thursday, December 06, 2012 13:05 > To: Kuperstein, Michael M > Cc: Chris Lattner; llvm-commits; Nadav Rotem; llvmdev at cs.uiuc.edu > Subject: RE: [LLVMdev] [RFC] "noclone" function attribute > > Hi Michael, > > After some head-scratching and discussion with our tame Khronos member, I agree with you. > > It comes down to the interpretation of the ambiguous spec. It refers to "the barrier", implying there is some sort of equivalence relation over barriers. The question is, what is that equivalent relation? In your example code: > > > void f(int foo) { > > if (foo) > > b(); > > else > > b(); > > } > > > > void b() { > > barrier(); > > } > > > > After talking with others, I now see that the spirit of the spec is that the barrier() as reached through the True condition is not the same as the barrier as reached through the False condition, even though it is the same statement as written in CL-C. > > So inlining should not be restricted, as you say. > > With regards transitivity, I'd say that it is the user's responsibility to ensure that the noduplicate attribute is transitively applied correctly before handing it to the optimizer. Otherwise, we'd need costly checks every time a user asks the question "is this function noduplicate?" > > So my proposed change is to remove the restrictions on inlining and update the LangRef to say that the attribute should be transitive but it is the IR producer's responsibility to ensure that. > > Does that sound about right? > > Cheers, > > James > > On Thu, 2012-12-06 at 07:47 +0000, Kuperstein, Michael M wrote: > > I'm not sure I agree with the semantics this patch creates. > > > > The way I see it, there are two options: > > 1) Make "noduplicate" non-transitive and forbid inlining when there are multiple callsites. > > 2) Allow inlining, but make the attribute transitive. > > > > Consider the following code, where barrier() is marked noduplicate. > > > > kernel void k() { > > if (x()) > > y(); > > b(); > > if (x()) > > z(); > > else > > w(); > > } > > > > void b() { > > barrier(); > > } > > > > Both inlining and threading b() is clearly illegal. The question is - which of the two should be forbidden. The current patch, assuming I understand it correctly, will: > > i) allow inlining but forbid threading after inlining. This is clearly fine. > > ii) allow threading but forbid inlining after threading. > > I am not at all sure option (ii) is desirable. > > > > On the other hand, I believe we'd like to allow inlining in the following case: > > > > void f(int foo) { > > if (foo) > > b(); > > else > > b(); > > } > > > > void b() { > > barrier(); > > } > > > > Basically, the issue here is a bit philosophical - is an instruction reached through code paths that go through different callsites considered the "same" instruction or not. > > If it is the same, then we cannot inline multiple call-sites, but don't need to make noduplicate transitive, since if you have a() -> b() -> barrier(), it doesn't matter what you do with calls to a(), it's still the same barrier(). > > If it isn't, inlining is always allowed (I think - James, do you have an example that breaks?), but the attribute must be transitive. > > > > I feel the right thing(TM) is to consider these instructions different, which leads to "transitive, inlining allowed", as opposed to the "non-transitive, no inlining" approach that the patch takes. > > > > -----Original Message----- > > From: James Molloy [mailto:James.Molloy at arm.com] > > Sent: Tuesday, December 04, 2012 19:23 > > To: Chris Lattner; llvm-commits > > Cc: Nadav Rotem; Kuperstein, Michael M; llvmdev at cs.uiuc.edu > > Subject: Re: [LLVMdev] [RFC] "noclone" function attribute > > > > Hi all + llvm-commits, > > > > After the discussion below, please find attached my patch to add a new "noduplicate" function attribute. > > > > I've modified CodeMetrics and LoopInfo, which covers most cases, but JumpThreading and InlineCost don't use CodeMetrics yet, so they required changing manually. > > > > Cheers, > > > > James > > > > On Mon, 2012-12-03 at 23:46 +0000, Chris Lattner wrote: > > > On Dec 3, 2012, at 9:48 AM, James Molloy <James.Molloy at arm.com> wrote: > > > > > > > Hi, > > > > > > > > Thanks for the pointers. My patch now calls the attribute > > > > "noduplicate", and updates CodeMetrics to have another field: > > > > > > > > bool notDuplicatable; > > > > > > > > Which semantically is "containsIndirectBr || > > > > containsNoDuplicateInst". I didn't repurpose containsIndirectBr > > > > because I felt what I'm looking for is sufficiently different > > > > (indirectbr inhibits inlining, whereas noduplicate does not, if there is one call site). > > > > > > I'm pretty sure that it's fine to inline indirectbr if there is a single call site and the inlinee is to be deleted. > > > > > > -Chris > > > > > > > > > > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > > --------------------------------------------------------------------- > > Intel Israel (74) Limited > > > > This e-mail and any attachments may contain confidential material for > > the sole use of the intended recipient(s). Any review or distribution > > by others is strictly prohibited. If you are not the intended > > recipient, please contact the sender and delete all copies. > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies.-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. -------------- next part -------------- A non-text attachment was scrubbed... Name: noduplicate.diff Type: text/x-patch Size: 20423 bytes Desc: noduplicate.diff URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121210/1d5012c5/attachment.bin>