thr3ads.net - llvm dev - [llvm-dev] RFC: Adding a !thread.private metadata [Sep 2018]

Philip Reames via llvm-dev

2018-Sep-14 23:13 UTC

[llvm-dev] RFC: Adding a !thread.private metadata

Problem

LLVM's memory model for NotAtomic accesses is generally fairly weak, but 
explicitly disallows inserting stores that didn't occur in the original 
program.  This is required for any potentially shared location, but is 
overkill for any memory location which is provably only accessed by a 
single thread.

My particular motivating example is a single thread private field in our 
implementation, but there are numerous languages which provide thread 
private storage options and right now, LLVM has no good way to represent 
them.

(Just to set expectations appropriately: the example which made me write 
this up is purely a "hey, that's interesting" case at the moment.  It's 
not a major blocking item or anything.  As such, I'm mostly throwing 
this out for discussion because it's interesting.)

Proposed Solution

Add a new metadata type which applies to memory accessing instructions 
(store, load, atomicrmw, etc...) and indicates that the memory location 
accessed is known to be accessed only by a single thread everywhere it 
is dereferenceable.

The framing is very similar to the one we use for !invariant.load and 
for much the same reasons.  If we can prove a location is 
dereferenceable, we want to be able to insert a store along any 
dereferenceable path through the function without worrying whether the 
original store was known to execute or not.  At the moment, the main 
transform to leverage this would be load/store promotion in LICM, which 
would be taught that inserting a store at a loop exit is legal, even if 
the store didn't execute within the dynamic execution of the loop, if 
the metadata is present.
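
As an illustration of the LICM case described above, here is a minimal 
sketch in plain C++ rather than LLVM IR.  The function names and the 
counter example are made up for this illustration and are not taken from 
any patch; the assert merely stands in for "the location is known 
dereferenceable".  The second function has the shape promotion would 
produce: it stores at the loop exit even when no iteration stored.

```cpp
#include <cassert>
#include <cstddef>

// Original loop: the store to *counter only happens on iterations where
// data[i] is negative, so it may never execute at all.
void countNegativesOriginal(const int *data, std::size_t n, int *counter) {
    assert(counter != nullptr); // stand-in for "known dereferenceable"
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0)
            *counter = *counter + 1;
    }
}

// Promoted form: the value is kept in a local for the duration of the loop
// and written back unconditionally afterwards. The final store is one the
// original program might never have performed.
void countNegativesPromoted(const int *data, std::size_t n, int *counter) {
    assert(counter != nullptr); // stand-in for "known dereferenceable"
    int promoted = *counter;    // load hoisted in front of the loop
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0)
            promoted = promoted + 1;
    }
    *counter = promoted;        // store inserted at the loop exit
}
```

When the input contains no negative values, the original never writes 
*counter while the promoted form writes back the unchanged value.  On a 
single thread the two are indistinguishable; with a concurrent writer 
the extra store could overwrite another thread's update, which is why 
the transform needs something like the proposed metadata.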

Alternatives and Discussion

LLVM IR has existing support for thread local storage, but this doesn't 
solve our problem.  There's nothing that prevents one thread from 
capturing the address of its thread local copy and publishing that 
address in a location visible to other threads. Given a thread local 
variable and a nocapture result (that is, proof that its address never 
escapes), we can conclude the location is thread private.  (Same for 
allocas, mallocs, etc.)

As just noted, there are places where we can infer that an access is 
thread private.  I think it makes sense to expose this as an analysis 
utility or pass.  We have bits of this already existing in LICM which 
could be pulled out, renamed, and reused.  There are various other 
transforms we could implement for thread private locations (e.g. replace 
an atomicrmw on a thread private with a load, op, store sequence), but 
I'm not sure these are actually worth implementing at the moment.
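
To make the atomicrmw example concrete, here is a minimal sketch using 
C++ atomics as an analogue of the IR.  The function names are 
hypothetical, and std::atomic::fetch_add is only standing in for 
`atomicrmw add`; nothing here is LLVM API.  It shows the plain load, op, 
store sequence that would be equivalent if the location were known to 
be thread private.

```cpp
#include <atomic>
#include <cassert>

// What the program asked for: an atomic read-modify-write.
// Returns the value held before the addition.
int bumpAtomic(std::atomic<int> &slot, int delta) {
    return slot.fetch_add(delta, std::memory_order_seq_cst);
}

// Candidate replacement when the location is thread private:
// a plain load, the operation, and a plain store.
int bumpPrivate(int &slot, int delta) {
    int old = slot;      // load
    slot = old + delta;  // op + store
    return old;
}

// With only one thread involved, both forms return the same old value
// and leave the same final contents behind.
bool formsAgree(int start, int delta) {
    std::atomic<int> shared{start};
    int priv = start;
    return bumpAtomic(shared, delta) == bumpPrivate(priv, delta) &&
           shared.load() == priv;
}
```

The equivalence only holds because no second thread can interleave 
between the load and the store, which is the property the metadata 
would assert.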

We could extend the memory model with a weaker access type.  I think our 
current NotAtomic is a good default, but we could consider adding a 
ThreadPrivate specifier which is weaker than the existing NotAtomic in 
exactly the same way that the metadata implies.  This is a reasonable 
implementation strategy, but might be a bit more work than I can 
practically commit to at the moment.

Hal recently brought up the idea of a nosync function attribute. If I 
understand the intended semantics properly, such functions aren't 
guaranteed to access strictly thread private locations. They're simply 
required not to synchronize; that is, they are allowed to access shared 
variables in a racy manner.

Philip

JF Bastien via llvm-dev

2018-Sep-14 23:22 UTC

That sounds fine to me. I agree it seems interesting, and kinda low-gain.

It’s an attribute and we’d be able to drop it without losing correctness. We can
drop this functionality if it doesn’t pan out which would be harder if we went
with a new memory order.

I think this should be exposable as a clang attribute in C++ as well. I’m not
saying it’s a good idea, but if you do implement the optimization I’d like to
see what it looks like for users to opt-in to this.

Won’t you be putting this on most allocas because most don’t escape?

Is there a problem with link-once ODR functions using this info differently?

One downside with an attribute: can we annotate “epochs” where a value sometimes
is single-thread, and other times is shared? I don’t think so, but it might be
fine.

Philip Reames via llvm-dev

2018-Sep-14 23:33 UTC

On 09/14/2018 04:22 PM, JF Bastien wrote:
> That sounds fine to me. I agree it seems interesting, and kinda low-gain.
>
> It’s an attribute and we’d be able to drop it without losing correctness.
> We can drop this functionality if it doesn’t pan out which would be harder
> if we went with a new memory order.

Just to check, you meant to say "metadata" not "attribute", right?

> I think this should be exposable as a clang attribute in C++ as well. I’m
> not saying it’s a good idea, but if you do implement the optimization I’d
> like to see what it looks like for users to opt-in to this.

I'll leave that part to you.  :)

> Won’t you be putting this on most allocas because most don’t escape?

I wasn't planning on adding the metadata based on analysis.  I was 
thinking more of a utility function along the lines of the following:

bool isKnownThreadPrivateAccess(Instruction *I, ...analysis info...)

where the implementation would end up using capture tracking for things 
like allocas, but have a fast path return if the instruction itself had 
the metadata.
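
The shape of such a utility can be sketched with toy types.  Everything 
below is an assumption made for illustration: ToyAccess, BaseKind, and 
the baseMayBeCaptured flag are invented stand-ins, not LLVM classes, and 
a real implementation would query LLVM's capture tracking rather than 
read a precomputed boolean.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-ins for IR concepts; these are NOT LLVM's classes.
enum class BaseKind { Alloca, Global, Unknown };

struct ToyAccess {
    std::set<std::string> metadata; // metadata kinds attached to the access
    BaseKind base;                  // what the accessed pointer derives from
    bool baseMayBeCaptured;         // what a capture analysis would report
};

bool isKnownThreadPrivateAccess(const ToyAccess &access) {
    // Fast path: the instruction itself carries the metadata.
    if (access.metadata.count("thread.private"))
        return true;
    // Analysis path: an alloca whose address is never captured cannot be
    // reached by any other thread.
    if (access.base == BaseKind::Alloca && !access.baseMayBeCaptured)
        return true;
    // Anything else is conservatively treated as possibly shared.
    return false;
}
```

Note that the metadata is never added by the analysis in this sketch; it 
is only consulted, which matches the intent stated above.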
>
> Is there a problem with link-once ODR functions using this info
> differently?

Probably.  Derefinement is a real pain, but also an entirely separate 
issue.  :)

> One downside with an attribute: can we annotate “epochs” where a value
> sometimes is single-thread, and other times is shared? I don’t think so,
> but it might be fine.

I can't come up with a good model for this attribute/metadata wise. At 
least, not one which gives me anything useful from an optimization 
standpoint.

I imagine we would end up with an isKnownThreadPrivateBefore(Instruction 
*I, ...analysis...) variant though.  (Similar to what we have for 
pointer capturing.)

However, relying on such a result is generally really dangerous because 
we don't have a good way to model a publication fence at the moment.  
That's definitely a separate issue, so let's separate that if you don't 
mind.

Nicolai Hähnle via llvm-dev

2018-Sep-16 12:03 UTC

On 15.09.2018 01:13, Philip Reames via llvm-dev wrote:
> Add a new metadata type which applies to memory accessing instructions 
> (store, load, atomicrmw, etc...) and indicates that the memory location 
> accessed is known to be accessed only by a single thread everywhere it 
> is dereferenceable.

What are the implications for how this interacts with fences and atomics?

For example, if there is a store to a !thread.private memory location 
before an atomic with release semantics, are we allowed to move the 
store to after the atomic?

It sounds like perhaps this would be allowed, and this could be useful 
for modeling non-coherent memory accesses in GLSL and SPIR-V.

Cheers,
Nicolai

-- 
Learn how the world really is,
but never forget how it ought to be.

Philip Reames via llvm-dev

2018-Sep-17 17:58 UTC

On 09/16/2018 05:03 AM, Nicolai Hähnle via llvm-dev wrote:
> On 15.09.2018 01:13, Philip Reames via llvm-dev wrote:
>> Add a new metadata type which applies to memory accessing 
>> instructions (store, load, atomicrmw, etc...) and indicates that the 
>> memory location accessed is known to be accessed only by a single 
>> thread everywhere it is dereferenceable.
>
> What are the implications for how this interacts with fences and atomics?

Great question, hadn't thought about it honestly.

> For example, if there is a store to a !thread.private memory location 
> before an atomic with release semantics, are we allowed to move the 
> store to after the atomic?

I think this should be allowed.  Because it's known to only be read by a 
single thread, it *can't* be part of any synchronization between threads 
and thus should be freely reorderable.  The slightly tricky one is a 
singlethread fence.  I think that variant may need to prevent the 
reordering.  Having singlethread have *stronger* reordering semantics 
than the cross thread variant feels really ugly though.

The initial implementation would treat them as strongly ordered. (Just 
for simplicity.)

> It sounds like perhaps this would be allowed, and this could be useful 
> for modeling non-coherent memory accesses in GLSL and SPIR-V.

Can you spell out the semantics you're looking for?  If you have a form 
of memory access which is shared, just unordered, I suspect NotAtomic or 
Unordered are probably a better fit, but I'd need more information to 
really tell.

> Cheers,
> Nicolai

Philip Reames via llvm-dev

2018-Sep-17 21:31 UTC

Posted review for this here: https://reviews.llvm.org/D52192

Philip
