> Since the scope is “opaque” and target specific, can you elaborate what
> kind of generic optimization can be performed?

Some optimizations that relate to a single thread can be done without
needing to know the actual memory scope. For example, an atomic acquire
prevents memory operations that follow it from being reordered before it,
but allows memory operations that precede it (other than another atomic
acquire) to be reordered after it, regardless of the memory scope.

Thanks,
-Tony
Hi,

[Sorry for chiming in so late.]

I understand why a straightforward metadata scheme won't work here, but
have you considered an alternate scheme that works in the following way:

 - We add an MD node called !nosynch that lists a set of "domains" a
   certain memory operation does *not* synchronize with.

 - Memory operations with !nosynch synchronize with memory operations
   without any !nosynch metadata (so dropping !nosynch is safe).

This will only work if your frontend knows, ahead of time, what the
possible set of synch-domains is, but it presumably already knows that
(otherwise how do you map domain names to integers)?

The other disadvantage of the scheme above is that memory operations on
the "normal CPU heap" (pardon my GPU n00b-ness here :) ) will synch with
the memory operations with !nosynch metadata. However, we can solve that
by modeling the "normal CPU heap" as "!nosynch !{!special_domain_a,
!special_domain_b, ... all domains except !cpu_heap_domain}".

Thanks,
-- Sanjoy
Hi,

Sanjoy Das wrote:
> I understand why a straightforward metadata scheme won't work here,
> but have you considered an alternate scheme that works in the
> following way:
>
> - We add an MD node called !nosynch that lists a set of "domains" a
>   certain memory operation does *not* synchronize with.
>
> - Memory operations with !nosynch synchronize with memory operations
>   without any !nosynch metadata (so dropping !nosynch is safe).

I missed a spot here ^: !nosynch metadata will also have to have a
sub-node for the kind of synch-domain *it* is in. The synchs-with
relation is then:

  bool SynchsWith(MemOp A, MemOp B) {
    MD_A = A.getMD(MD_nosynch);
    MD_B = B.getMD(MD_nosynch);
    // An op without !nosynch synchronizes with everything.
    if (!MD_A || !MD_B)
      return true;
    // They synchronize unless either op's !nosynch list names the
    // other op's domain.
    return !(MD_B.nosync_list.contains(MD_A.id) ||
             MD_A.nosync_list.contains(MD_B.id));
  }

I'm still not 100% convinced that the above works, but I think there are
advantages to expressing synch scopes as metadata. For instance, the
optimizer already "knows" what to do with the metadata on loads it
speculates.

-- Sanjoy
> On Aug 30, 2016, at 5:53 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> Hi,
>
> [Sorry for chiming in so late.]
>
> I understand why a straightforward metadata scheme won't work here,
> but have you considered an alternate scheme that works in the
> following way:
>
> - We add an MD node called !nosynch that lists a set of "domains" a
>   certain memory operation does *not* synchronize with.
>
> - Memory operations with !nosynch synchronize with memory operations
>   without any !nosynch metadata (so dropping !nosynch is safe).

I'm not sure, but isn't the synchscope id (or "domains", as you seem to
call them) intended to change which instruction is actually codegen'd?
In that case I'm not sure dropping it is ever a good idea: even when it
does not affect correctness, it could dramatically affect performance.

— Mehdi

> This will only work if your frontend knows, ahead of time, what the
> possible set of synch-domains are, but it presumably already knows
> that (otherwise how do you map domain names to integers)?
>
> The other disadvantage with the scheme above is that memory operations
> on the "normal CPU heap" (pardon my GPU n00b-ness here :) ) will synch
> with the memory operations with !nosynch metadata. However, we can
> solve that by modeling the "normal CPU heap" as "!nosynch
> !{!special_domain_a, !special_domain_b, ... all domains except
> !cpu_heap_domain}".
>
> Thanks,
> -- Sanjoy
> Some optimizations that are related to a single thread could be done
> without needing to know the actual memory scope.

Right, it's clear to me that there exist optimizations that you cannot
do if we model these ops as target-specific intrinsics.

But what I think Mehdi and I were trying to get at is: How much of a
problem is this in practice? Are there real-world programs that suffer
because we miss these optimizations? If so, how much?

The reason I'm asking this question is, there's a real cost to adding
complexity to LLVM. Everyone in the project is going to pay that cost,
forever (or at least, until we remove the feature :). So I want to try
to evaluate whether paying that cost is actually worthwhile, as compared
to the simple alternative (i.e., intrinsics). Given the tepid response
to this proposal, I'm sort of thinking that now may not be the time to
start paying this cost. (We can always revisit this in the future.) But
I remain open to being convinced.

As a point of comparison, we have a rule of thumb that we'll add an
optimization that increases compilation time by x% if we have a
benchmark that is sped up by at least x%. Similarly here, I'd want to
weigh the added complexity against the improvements to user code.

-Justin

On Tue, Aug 23, 2016 at 2:28 PM, Tye, Tony via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>> Since the scope is “opaque” and target specific, can you elaborate what
>> kind of generic optimization can be performed?
>
> Some optimizations that are related to a single thread could be done without
> needing to know the actual memory scope. For example, an atomic acquire can
> restrict reordering memory operations after it, but allow reordering of
> memory operations (except another atomic acquire) before it, regardless of
> the memory scope.
>
> Thanks,
> -Tony
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On Wed, Aug 31, 2016 at 12:23:34PM -0700, Justin Lebar via llvm-dev wrote:
> > Some optimizations that are related to a single thread could be done
> > without needing to know the actual memory scope.
>
> Right, it's clear to me that there exist optimizations that you cannot
> do if we model these ops as target-specific intrinsics.
>
> But what I think Mehdi and I were trying to get at is: How much of a
> problem is this in practice? Are there real-world programs that
> suffer because we miss these optimizations? If so, how much?
>
> The reason I'm asking this question is, there's a real cost to adding
> complexity in LLVM. Everyone in the project is going to pay that
> cost, forever (or at least, until we remove the feature :). So I want
> to try to evaluate whether paying that cost is actually worth while,
> as compared to the simple alternative (i.e., intrinsics). Given the
> tepid response to this proposal, I'm sort of thinking that now may not
> be the time to start paying this cost. (We can always revisit this in
> the future.) But I remain open to being convinced.

I think the cost of adding this information to the IR is really low.
There is already a synchronization scope field present on LLVM atomic
instructions, and it is already encoded as 32 bits, so it is possible to
represent the additional scopes using the existing bitcode format.
Optimization passes are already aware of this synchronization scope
field, so they know how to preserve it when transforming the IR.

The primary goal here is to pass synchronization scope information from
the frontend to the backend. We already have a mechanism for doing this,
so why not use it? That seems like the lowest-cost option to me.

-Tom

> As a point of comparison, we have a rule of thumb that we'll add an
> optimization that increases compilation time by x% if we have a
> benchmark that is sped up by at least x%. Similarly here, I'd want to
> weigh the added complexity against the improvements to user code.