Hi Michael,
After some head-scratching and discussion with our tame Khronos member,
I agree with you.
It comes down to the interpretation of the ambiguous spec. It refers to
"the barrier", implying there is some sort of equivalence relation
over
barriers. The question is, what is that equivalent relation? In your
example code:
> void f(int foo) {
> if (foo)
> b();
> else
> b();
> }
>
> void b() {
> barrier();
> }
>
After talking with others, I now see that the spirit of the spec is that
the barrier() as reached through the True condition is not the same as
the barrier as reached through the False condition, even though it is
the same statement as written in CL-C.
So inlining should not be restricted, as you say.
With regards transitivity, I'd say that it is the user's responsibility
to ensure that the noduplicate attribute is transitively applied
correctly before handing it to the optimizer. Otherwise, we'd need
costly checks every time a user asks the question "is this function
noduplicate?"
So my proposed change is to remove the restrictions on inlining and
update the LangRef to say that the attribute should be transitive but it
is the IR producer's responsibility to ensure that.
Does that sound about right?
Cheers,
James
On Thu, 2012-12-06 at 07:47 +0000, Kuperstein, Michael M
wrote:> I'm not sure I agree with the semantics this patch creates.
>
> The way I see it, there are two options:
> 1) Make "noduplicate" non-transitive and forbid inlining when
there are multiple callsites.
> 2) Allow inlining, but make the attribute transitive.
>
> Consider the following code, where barrier() is marked noduplicate.
>
> kernel void k() {
> if (x())
> y();
> b();
> if (x())
> z();
> else
> w();
> }
>
> void b() {
> barrier();
> }
>
> Both inlining and threading b() is clearly illegal. The question is - which
of the two should be forbidden. The current patch, assuming I understand it
correctly, will:
> i) allow inlining but forbid threading after inlining. This is clearly
fine.
> ii) allow threading but forbid inlining after threading.
> I am not at all sure option (ii) is desirable.
>
> On the other hand, I believe we'd like to allow inlining in the
following case:
>
> void f(int foo) {
> if (foo)
> b();
> else
> b();
> }
>
> void b() {
> barrier();
> }
>
> Basically, the issue here is a bit philosophical - is an instruction
reached through code paths that go through different callsites considered the
"same" instruction or not.
> If it is the same, then we cannot inline multiple call-sites, but don't
need to make noduplicate transitive, since if you have a() -> b() ->
barrier(), it doesn't matter what you do with calls to a(), it's still
the same barrier().
> If it isn't, inlining is always allowed (I think - James, do you have
an example that breaks?), but the attribute must be transitive.
>
> I feel the right thing(TM) is to consider these instructions different,
which leads to "transitive, inlining allowed", as opposed to the
"non-transitive, no inlining" approach that the patch takes.
>
> -----Original Message-----
> From: James Molloy [mailto:James.Molloy at arm.com]
> Sent: Tuesday, December 04, 2012 19:23
> To: Chris Lattner; llvm-commits
> Cc: Nadav Rotem; Kuperstein, Michael M; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [RFC] "noclone" function attribute
>
> Hi all + llvm-commits,
>
> After the discussion below, please find attached my patch to add a new
"noduplicate" function attribute.
>
> I've modified CodeMetrics and LoopInfo, which covers most cases, but
JumpThreading and InlineCost don't use CodeMetrics yet, so they required
changing manually.
>
> Cheers,
>
> James
>
> On Mon, 2012-12-03 at 23:46 +0000, Chris Lattner wrote:
> > On Dec 3, 2012, at 9:48 AM, James Molloy <James.Molloy at
arm.com> wrote:
> >
> > > Hi,
> > >
> > > Thanks for the pointers. My patch now calls the attribute
> > > "noduplicate", and updates CodeMetrics to have another
field:
> > >
> > > bool notDuplicatable;
> > >
> > > Which semantically is "containsIndirectBr ||
> > > containsNoDuplicateInst". I didn't repurpose
containsIndirectBr
> > > because I felt what I'm looking for is sufficiently different
> > > (indirectbr inhibits inlining, whereas noduplicate does not, if
there is one call site).
> >
> > I'm pretty sure that it's fine to inline indirectbr if there
is a single call site and the inlinee is to be deleted.
> >
> > -Chris
> >
> >
>
>
> -- IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
-- IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.