thr3ads.net - llvm dev - [LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux [Jan 2015]

Sahasrabuddhe, Sameer

2015-Jan-07 04:06 UTC

[LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux

On 1/7/2015 8:59 AM, Chandler Carruth wrote:
> Essentially, I think target-independent optimizations are still 
> attractive, but we might want to just force them to go through an 
> actual target-implemented API to interpret the scopes rather than 
> making the interpretation work from first principles. I just worry 
> that the targets are going to be too different and we may fail to 
> accurately predict future targets' needs.
If we have a target-implemented API, then just opaque numbers should 
also be sufficient, right? For the API, all we care about is queries 
that interesting optimizations will want answered from the target. This 
could be at the instruction level: "is it okay to remove this atomic 
store with scope n1 that is immediately followed by atomic store with 
scope n2?". Or it could be at the scope level: "does scope n2 include 
scope n1"?
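As a rough illustration of the two query levels described above, here is a minimal C++ sketch. Everything in it is hypothetical: `ScopeID`, `TargetScopeInfo`, `includes`, `canRemoveOverwrittenStore` and `LinearScopes` are names invented for this example and are not part of LLVM. The default store-removal policy is only an assumption made for the sake of the example, not a statement about any memory model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opaque scope identifier; only the target assigns it a meaning.
using ScopeID = std::uint32_t;

// Hypothetical query interface that each target would implement.
class TargetScopeInfo {
public:
  virtual ~TargetScopeInfo() = default;

  // Scope-level query: does scope Outer include scope Inner?
  virtual bool includes(ScopeID Outer, ScopeID Inner) const = 0;

  // Instruction-level query: may an atomic store with scope First be removed
  // when it is immediately followed by an atomic store with scope Second?
  // ASSUMPTION: the default below is illustrative only; a real target would
  // override it according to its own memory model.
  virtual bool canRemoveOverwrittenStore(ScopeID First, ScopeID Second) const {
    return includes(Second, First);
  }
};

// Hypothetical target whose scopes nest linearly, so that a plain numerical
// comparison answers the inclusion query.
class LinearScopes final : public TargetScopeInfo {
public:
  bool includes(ScopeID Outer, ScopeID Inner) const override {
    return Outer >= Inner;
  }
};
```

A target-independent pass would hold a `const TargetScopeInfo &` and never compare the numbers itself.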
> I think the "strings" can be made relatively clean.
>
> What I'm imagining is something very much like the target-specific 
> attributes which are just strings and left to the target to interpret, 
> but are cleanly factored so that the strings are wrapped up in a nice 
> opaque attribute that is used as the sigil everywhere in the IR. We 
> could do this with metadata, and technically this fits the model of 
> metadata if we make the interpretation of the absence of metadata be 
> "system". However, I'm quite hesitant to rely on metadata here as it
> hasn't always ended up working so well for us. ;]
Metadata was the first thing to be considered internally at AMD. But it 
was quickly shot down because the Research guys were unwilling to accept 
the possibility of scope being lost and replaced by a default "system"
scope. Current models are useful only when all atomic accesses for a 
given location use the same scope throughout the application, i.e., all 
threads running on all agents. So it is not okay for the compiler to 
"promote" the scope in just one kernel unless it has access to the 
entire application; the result is undefined. This is true for OpenCL 
source as well as HSAIL target. This may change in the near future:

HRF-Relaxed: Adapting HRF to the complexities of industrial 
heterogeneous memory models
http://benedictgaster.org/?page_id=278

But even then, it will be difficult to say if the same models can be 
applied to heterogeneous systems that don't resemble OpenCL or HSAIL.
> I'd be interested in your thoughts and others' thoughts on how we 
> might encode an opaque string-based scope effectively. If we can find 
> a reasonably clean way of doing it, it seems like the best approach at 
> this point:
>
> - It ensures we have no bitcode stability problems.
> - It makes it easy to define a small number of IR-specified values 
> like system/crossthread/allthreads/whatever and singlethread, and 
> doing so isn't ever awkward due to any kind of baked-in ordering.
> - In practice in the real world, every target is probably going to 
> just take this and map it to an enum that clearly spells out the rank 
> for their target, so I suspect it won't actually increase the 
> complexity of things much.
I seem to be missing something here about the need for strings. If they 
are opaque anyway, and they are represented by sigils, then the sigils 
themselves are all that matter, right? Then the encoding is just a number...
>     But while the topic is wide open, here's another possibly whacky
>     approach: we let the scopes be integers, and add a "scope layout"
>     string similar to data-layout. The string encodes the ordering of
>     the integers. If it is empty, then simple numerical comparisons
>     are sufficient. Else the string spells out the exact ordering to
>     be used. Any known current target will be happy with the first
>     option. If some target inserts an intermediate scope in the
>     future, then that version switches from empty to a fully specified
>     string. The best part is that we don't even need to do this right
>     now, and only come up with a "scope layout" spec when we really
>     hit the problem for some future target.
>
>
> This isn't a bad approach, but it seems even more complex. I think I'd
> rather go with the fairly boring one where the IR just encodes enough 
> data for the target to answer queries about the relationship between 
> scopes.
I am not really championing scope layout strings over a 
target-implemented API, but it seems less work to me rather than more. 
The relationship between scopes is just an SWO, and it can be 
represented as a graph. A practical target will have a very small number 
of scopes, say not more than 16. It should be possible to encode this 
into a graphviz-style string. Then instead of having every target 
implement an API, they just have to specify the relationship as a string.
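To make the "relationship as a string" idea concrete, here is a minimal C++ sketch. The syntax is an assumption invented for this example (semicolon-separated `inner->outer` edges with no whitespace), as are the class name `ScopeLayout` and the scope names used in the usage note; nothing here was specified in the thread.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical "scope layout": edges of the form "inner->outer", separated
// by ';'. An edge states that the outer scope includes the inner scope.
class ScopeLayout {
  // For each scope, the scopes that directly include it.
  std::map<std::string, std::set<std::string>> Outer;

public:
  explicit ScopeLayout(const std::string &Spec) {
    std::istringstream In(Spec);
    std::string Edge;
    while (std::getline(In, Edge, ';')) {
      std::string::size_type Arrow = Edge.find("->");
      if (Arrow == std::string::npos)
        continue; // skip malformed or empty entries
      Outer[Edge.substr(0, Arrow)].insert(Edge.substr(Arrow + 2));
    }
  }

  // True if Wide is Narrow itself or is reachable from Narrow by following
  // "included in" edges.
  bool includes(const std::string &Wide, const std::string &Narrow) const {
    std::set<std::string> Seen;
    return reach(Narrow, Wide, Seen);
  }

private:
  bool reach(const std::string &From, const std::string &To,
             std::set<std::string> &Seen) const {
    if (From == To)
      return true;
    if (!Seen.insert(From).second)
      return false; // already visited; guards against cycles
    auto It = Outer.find(From);
    if (It == Outer.end())
      return false;
    for (const std::string &Next : It->second)
      if (reach(Next, To, Seen))
        return true;
    return false;
  }
};
```

With a layout such as `"wavefront->workgroup;workgroup->agent;agent->system"`, the call `includes("system", "wavefront")` holds while `includes("workgroup", "agent")` does not. Note that the parser and the traversal together are already a small API, which is part of what the replies below debate.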

Sameer.


Chandler Carruth

2015-Jan-07 04:12 UTC


On Tue, Jan 6, 2015 at 8:06 PM, Sahasrabuddhe, Sameer <
sameer.sahasrabuddhe at amd.com> wrote:
>
> On 1/7/2015 8:59 AM, Chandler Carruth wrote:
>
>
> Essentially, I think target-independent optimizations are still
> attractive, but we might want to just force them to go through an actual
> target-implemented API to interpret the scopes rather than making the
> interpretation work from first principles. I just worry that the targets
> are going to be too different and we may fail to accurately predict future
> targets' needs.
>
>
> If we have a target-implemented API, then just opaque numbers should also
> be sufficient, right? For the API, all we care about is queries that
> interesting optimizations will want answered from the target. This could be
> at the instruction level: "is it okay to remove this atomic store with
> scope n1 that is immediately followed by atomic store with scope n2?". Or
> it could be at the scope level: "does scope n2 include scope n1"?
>
I think it is significantly more friendly (and easier to debug mistakes) if
the textual IR uses human readable names. We already have a hard time due
to the totally opaque nature of address spaces -- there are magical address
spaces for segment stuff in x86.

The strings are only opaque to the target-independent optimizer. While
integers and strings are equally friendly to the code in the target,
strings are significantly more friendly to humans reading the IR.


The other advantage is that it makes it much harder to accidentally write
code that relies on the particular values for the integers. =]

>
>
>   I think the "strings" can be made relatively clean.
>
>  What I'm imagining is something very much like the target-specific
> attributes which are just strings and left to the target to interpret, but
> are cleanly factored so that the strings are wrapped up in a nice opaque
> attribute that is used as the sigil everywhere in the IR. We could do this
> with metadata, and technically this fits the model of metadata if we make
> the interpretation of the absence of metadata be "system". However, I'm
> quite hesitant to rely on metadata here as it hasn't always ended up
> working so well for us. ;]
>
>
> Metadata was the first thing to be considered internally at AMD. But it
> was quickly shot down because the Research guys were unwilling to accept
> the possibility of scope being lost and replaced by a default "system"
> scope. Current models are useful only when all atomic accesses for a given
> location use the same scope throughout the application, i.e., all threads
> running on all agents. So it is not okay for the compiler to "promote" the
> scope in just one kernel unless it has access to the entire application;
> the result is undefined. This is true for OpenCL source as well as HSAIL
> target. This may change in the near future:
>
> HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous
> memory models
> http://benedictgaster.org/?page_id=278
>
> But even then, it will be difficult to say if the same models can be
> applied to heterogeneous systems that don't resemble OpenCL or HSAIL.
>
Yea, I'm not really surprised by this.

>
>   I'd be interested in your thoughts and others' thoughts on how we might
> encode an opaque string-based scope effectively. If we can find a
> reasonably clean way of doing it, it seems like the best approach at this
> point:
>
>  - It ensures we have no bitcode stability problems.
> - It makes it easy to define a small number of IR-specified values like
> system/crossthread/allthreads/whatever and singlethread, and doing so isn't
> ever awkward due to any kind of baked-in ordering.
> - In practice in the real world, every target is probably going to just
> take this and map it to an enum that clearly spells out the rank for their
> target, so I suspect it won't actually increase the complexity of things
> much.
>
>
> I seem to be missing something here about the need for strings. If they
> are opaque anyway, and they are represented by sigils, then the sigils
> themselves are all that matter, right? Then the encoding is just a number...
>
See above for why I'd prefer not to use a raw number in the IR.

>
>
>
>
>> But while the topic is wide open, here's another possibly whacky
>> approach: we let the scopes be integers, and add a "scope layout" string
>> similar to data-layout. The string encodes the ordering of the integers. If
>> it is empty, then simple numerical comparisons are sufficient. Else the
>> string spells out the exact ordering to be used. Any known current target
>> will be happy with the first option. If some target inserts an intermediate
>> scope in the future, then that version switches from empty to a fully
>> specified string. The best part is that we don't even need to do this right
>> now, and only come up with a "scope layout" spec when we really hit the
>> problem for some future target.
>
>
> This isn't a bad approach, but it seems even more complex. I think I'd
> rather go with the fairly boring one where the IR just encodes enough data
> for the target to answer queries about the relationship between scopes.
>
>
> I am not really championing scope layout strings over a target-implemented
> API, but it seems less work to me rather than more. The relationship
> between scopes is just an SWO, and it can be represented as a graph. A
> practical target will have a very small number of scopes, say not more than
> 16. It should be possible to encode this into a graphviz-style string. Then
> instead of having every target implement an API, they just have to specify
> the relationship as a string.
>
I see where you're going here, and it sounds feasible, but it honestly
seems much *more* work and certainly more complex for the IR. We can always
add such a representation to communicate the relationships if it becomes
important, but I'd rather communicate via a boring target API to start with
I think.

Mehdi Amini

2015-Jan-07 04:17 UTC


> On Jan 6, 2015, at 8:06 PM, Sahasrabuddhe, Sameer <sameer.sahasrabuddhe at amd.com> wrote:
> 
> 
> On 1/7/2015 8:59 AM, Chandler Carruth wrote:
>> 
>> Essentially, I think target-independent optimizations are still
>> attractive, but we might want to just force them to go through an actual
>> target-implemented API to interpret the scopes rather than making the
>> interpretation work from first principles. I just worry that the targets are
>> going to be too different and we may fail to accurately predict future
>> targets' needs.
> 
> If we have a target-implemented API, then just opaque numbers should also
> be sufficient, right? For the API, all we care about is queries that interesting
> optimizations will want answered from the target. This could be at the
> instruction level: "is it okay to remove this atomic store with scope n1
> that is immediately followed by atomic store with scope n2?". Or it could
> be at the scope level: "does scope n2 include scope n1"?
> 
>> I think the "strings" can be made relatively clean.
>> 
>> What I'm imagining is something very much like the target-specific
>> attributes which are just strings and left to the target to interpret, but are
>> cleanly factored so that the strings are wrapped up in a nice opaque attribute
>> that is used as the sigil everywhere in the IR. We could do this with metadata,
>> and technically this fits the model of metadata if we make the interpretation of
>> the absence of metadata be "system". However, I'm quite hesitant
>> to rely on metadata here as it hasn't always ended up working so well for
>> us. ;]
> 
> Metadata was the first thing to be considered internally at AMD. But it was
> quickly shot down because the Research guys were unwilling to accept the
> possibility of scope being lost and replaced by a default "system"
> scope. Current models are useful only when all atomic accesses for a given
> location use the same scope throughout the application, i.e., all threads
> running on all agents. So it is not okay for the compiler to "promote"
> the scope in just one kernel unless it has access to the entire application; the
> result is undefined. This is true for OpenCL source as well as HSAIL target.
> This may change in the near future:
> 
> HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous
> memory models
> http://benedictgaster.org/?page_id=278
> 
> But even then, it will be difficult to say if the same models can be
> applied to heterogeneous systems that don't resemble OpenCL or HSAIL.
> 
>> I'd be interested in your thoughts and others' thoughts on how
>> we might encode an opaque string-based scope effectively. If we can find a
>> reasonably clean way of doing it, it seems like the best approach at this point:
>> 
>> - It ensures we have no bitcode stability problems.
>> - It makes it easy to define a small number of IR-specified values like
>> system/crossthread/allthreads/whatever and singlethread, and doing so isn't
>> ever awkward due to any kind of baked-in ordering.
>> - In practice in the real world, every target is probably going to just
>> take this and map it to an enum that clearly spells out the rank for their
>> target, so I suspect it won't actually increase the complexity of things
>> much.
> 
> I seem to be missing something here about the need for strings. If they are
> opaque anyway, and they are represented by sigils, then the sigils themselves
> are all that matter, right? Then the encoding is just a number…
Don’t the strings answer your previous concern: 
> . But now I see another potential problem with future bitcode if we require
> an ordering on the scopes. What happens when a backend later introduces a new
> scope that goes into the middle of the order?
Note: the backend can just convert the string into an integer once. The strings
are really useful only for serialization IIUC.
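A minimal C++ sketch of that one-time conversion, under stated assumptions: the enum `ExampleScope`, the functions `parseScope` and `includes`, and the scope names "workgroup" and "agent" are all hypothetical and chosen only for illustration. Only "singlethread" and "system" are names that appear in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical target-private ranks; a larger value means a wider scope.
// Inserting a new intermediate scope later only renumbers this enum; the
// names stored in the IR stay the same.
enum class ExampleScope : unsigned {
  SingleThread = 0,
  WorkGroup = 1,
  Agent = 2,
  System = 3
};

// Convert the name found in the IR once. Returns false for a name this
// hypothetical target does not know, leaving Out untouched.
inline bool parseScope(const std::string &Name, ExampleScope &Out) {
  if (Name == "singlethread") { Out = ExampleScope::SingleThread; return true; }
  if (Name == "workgroup")    { Out = ExampleScope::WorkGroup;    return true; }
  if (Name == "agent")        { Out = ExampleScope::Agent;        return true; }
  if (Name == "system")       { Out = ExampleScope::System;       return true; }
  return false;
}

// After conversion, inclusion is a comparison of target-private ranks.
inline bool includes(ExampleScope Wide, ExampleScope Narrow) {
  return static_cast<unsigned>(Wide) >= static_cast<unsigned>(Narrow);
}
```

The ordering lives entirely in the target's own enum, so the serialized form never has to commit to it.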


> 
>>  
>> But while the topic is wide open, here's another possibly whacky
>> approach: we let the scopes be integers, and add a "scope layout"
>> string similar to data-layout. The string encodes the ordering of the integers.
>> If it is empty, then simple numerical comparisons are sufficient. Else the
>> string spells out the exact ordering to be used. Any known current target will
>> be happy with the first option. If some target inserts an intermediate scope in
>> the future, then that version switches from empty to a fully specified string.
>> The best part is that we don't even need to do this right now, and only come
>> up with a "scope layout" spec when we really hit the problem for some
>> future target.
>> 
>> This isn't a bad approach, but it seems even more complex. I think
>> I'd rather go with the fairly boring one where the IR just encodes enough
>> data for the target to answer queries about the relationship between scopes.
> 
> I am not really championing scope layout strings over a target-implemented
> API, but it seems less work to me rather than more. The relationship between
> scopes is just an SWO, and it can be represented as a graph. A practical target
> will have a very small number of scopes, say not more than 16. It should be
> possible to encode this into a graphviz-style string. Then instead of having
> every target implement an API, they just have to specify the relationship as a
> string.
So basically you are replacing an API by a custom language in a string. Isn’t
such a string carrying an API by itself?

— 
Mehdi


> 
> Sameer.
> 

Sahasrabuddhe, Sameer

2015-Jan-08 04:03 UTC


On 1/7/2015 9:42 AM, Chandler Carruth wrote:
> I think it is significantly more friendly (and easier to debug 
> mistakes) if the textual IR uses human readable names. We already have 
> a hard time due to the totally opaque nature of address spaces -- 
> there are magical address spaces for segment stuff in x86.
>
> The strings are only opaque to the target-independent optimizer. While 
> integers and strings are equally friendly to the code in the target, 
> strings are significantly more friendly to humans reading the IR.
>
> The other advantage is that it makes it much harder to accidentally 
> write code that relies on the particular values for the integers. =]
Here's what this looks like to me:

 1. LLVM text format will use string symbols for memory scopes, and not
    numbers. The set of strings is target defined, but "singlethread"
    and "system" are reserved and have a well-known meaning.

 2. "The keyword informally known as system" represents the set of all
    threads that could possibly synchronize on the location being
    accessed by the current atomic instruction. These threads could be
    local, remote, executing on different agents, or whatever else is
    admissible on that particular platform. We still need to agree on
    the keyword to be used.

 3. The bitcode will store memory scopes as unsigned integers, since
    that is the easiest way to maintain compatibility. The values 0 and
    1 are special. All other values are meaningful only within that bc
    file. The file will also provide a map from unsigned integers to
    string symbols which should be used to interpret all the
    non-standard integers.
     1. The map must not include 0 and 1, since the reader will
        internally map them to "singlethread" and "system"
        respectively.
     2. If the map is empty or non-existent, then all non-zero values
        will be mapped to "system", which is the current behaviour.

 4. The in-memory structure for an atomic instruction will represent
    memory scope as a reference to a uniqued string. This eliminates
    any notion of performing arithmetic on the scope indicator, or of
    writing code that is sensitive to its numerical value.

 5. Behaviour is undefined if a symbolic scope used in the IR is not
    supported by the target. This is true for "singlethread" and
    "system" also, since some targets may not have those scopes.
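Points 3 and 4 above can be sketched in C++ as follows. This is a sketch under stated assumptions, not the actual bitcode reader: `ScopePool`, `decodeScope` and `FileMap` are hypothetical names, and the handling of a value that is missing from a non-empty map is a choice made here, since the list above only covers the empty or absent map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Pool of uniqued scope names: equal names share one stable address, so an
// instruction can hold a pointer and compare scopes by identity.
class ScopePool {
  std::set<std::string> Names; // std::set never relocates its elements

public:
  const std::string *intern(const std::string &Name) {
    return &*Names.insert(Name).first;
  }
};

// Hypothetical reader step: turn the integer stored in the file into a
// uniqued name. FileMap stands in for the per-file integer-to-symbol table.
inline const std::string *
decodeScope(std::uint32_t Value,
            const std::map<std::uint32_t, std::string> &FileMap,
            ScopePool &Pool) {
  if (Value == 0)
    return Pool.intern("singlethread"); // reserved value
  if (Value == 1)
    return Pool.intern("system"); // reserved value
  auto It = FileMap.find(Value);
  // An empty or absent map sends every non-zero value to "system", as in
  // point 3.2. ASSUMPTION: a value missing from a non-empty map is treated
  // the same way here; the proposal does not say.
  if (It == FileMap.end())
    return Pool.intern("system");
  return Pool.intern(It->second);
}
```

Because the in-memory form is a pointer to a name rather than a number, there is nothing for a pass to do arithmetic on; any ordering question has to go back to the target.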

Is this correct? But how does this work in the SelectionDAG? Also, what 
will this look like in TableGen files?

Sameer.

