thr3ads.net - llvm dev - [LLVMdev] Predicated Vector Operations [May 2013]

Nadav Rotem

2013-May-08 23:09 UTC

[LLVMdev] Predicated Vector Operations

On May 8, 2013, at 4:00 PM, Eric Christopher <echristo at gmail.com>
wrote:
> 
> Thinking that a masked store is conservatively a store of the full
> width of the store right?
It depends on the optimization. Consider this example:

masked_store(Val, Ptr , M)
X = masked_load(Ptr, M2)

If you assume that your store actually overwrites everything in that memory
location then you don't need to load that memory location again. You can
simply use the stored value. However, in our example X != Val.
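
As a rough illustration of the example above, the following Python sketch models memory as a list and the two operations as plain functions. The function bodies, the concrete values, and the choice to fill disabled load lanes with a `passthru` value of 0 are assumptions made for illustration; the thread does not define them.

```python
def masked_store(memory, base, value, mask):
    """Write value[i] to memory[base + i] only for lanes whose mask bit is set."""
    for lane, (v, enabled) in enumerate(zip(value, mask)):
        if enabled:
            memory[base + lane] = v


def masked_load(memory, base, mask, passthru=0):
    """Read enabled lanes from memory; disabled lanes get passthru (an assumption)."""
    return [memory[base + lane] if enabled else passthru
            for lane, enabled in enumerate(mask)]


memory = [1, 2, 3, 4]              # old contents at Ptr
val = [10, 20, 30, 40]             # Val
m = [True, False, True, False]     # store mask M
m2 = [True, True, False, False]    # load mask M2

masked_store(memory, 0, val, m)
x = masked_load(memory, 0, m2)

print(memory)    # [10, 2, 30, 4]
print(x)         # [10, 2, 0, 0]
print(x == val)  # False: forwarding Val to X would be wrong
```

Lane 1 of X comes from the old memory contents, because M did not enable that lane in the store. An optimization that treats the store as a full-width write would replace X with Val and get lane 1 wrong.
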
> But Jim pointed out that anything merging loads would then need to
> merge the masks otherwise even if selection would work otherwise, any
> pass that merges loads would need to learn how to deal with masks. Not
> likely a deal killer since I don't think there are a lot of them, but
> it does explain why it's more work than having them pass through.
I actually think that masks disrupt *everything* from alias analysis to SROA. If
you ignore the mask, bad things will happen. You can't be conservative because
you never know which way is 'conservative'. Should you assume that the
mask is all zeros or all ones?
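
One way to see why no single assumption is 'conservative': a hypothetical analysis can ask either which lanes a store *may* write or which lanes it *must* write, and the safe answer for an unknown mask points in opposite directions. The helper names below are made up for this sketch and are not LLVM APIs; an unknown mask is represented as `None`.

```python
def may_write_lanes(mask, width):
    """Lanes the store might touch. Unknown mask: safe to assume all ones."""
    if mask is None:
        return set(range(width))
    return {lane for lane, bit in enumerate(mask) if bit}


def must_write_lanes(mask, width):
    """Lanes the store certainly overwrites. Unknown mask: safe to assume all zeros."""
    if mask is None:
        return set()
    return {lane for lane, bit in enumerate(mask) if bit}


def later_store_kills_earlier(earlier_mask, later_mask, width):
    """Dead-store check: the earlier store is dead only if every lane it may
    write is certainly overwritten by the later store."""
    return may_write_lanes(earlier_mask, width) <= must_write_lanes(later_mask, width)


# Aliasing-style query: an unknown mask may clobber every lane.
print(may_write_lanes(None, 2))                                  # {0, 1}
# Dead-store-style query: an unknown mask is guaranteed to overwrite nothing.
print(later_store_kills_earlier([True, False], None, 2))         # False
print(later_store_kills_earlier([True, False], [True, True], 2)) # True
```

A pass that ignores the mask effectively picks one of the two assumptions for every query, and is wrong for whichever kind of query needs the other.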

Eric Christopher

2013-May-08 23:23 UTC

On Wed, May 8, 2013 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On May 8, 2013, at 4:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>
> Thinking that a masked store is conservatively a store of the full
> width of the store right?
>
>
> It depends on the optimization. Consider this example:
>
> masked_store(Val, Ptr , M)
> X = masked_load(Ptr, M2)
>
> If you assume that your store actually overwrites everything in that memory
> location then you don't need to load that memory location again. You can
> simply use the stored value. However, in our example X != Val.
>
>
> But Jim pointed out that anything merging loads would then need to
> merge the masks otherwise even if selection would work otherwise, any
> pass that merges loads would need to learn how to deal with masks. Not
> likely a deal killer since I don't think there are a lot of them, but
> it does explain why it's more work than having them pass through.
>
>
> I actually think that masks disrupt *everything* from alias analysis to
> SROA. If you ignore the mask, bad things will happen. You can't be
> conservative because you never know which way is 'conservative'. Should you
> assume that the mask is all zeros or all ones?
I was thinking that they'd not be optimized, but you are correct that
it'll be invasive - examples are illuminating. :)

I'm less convinced it's everything, but it's definitely enough that
care should be taken before adding a mask field to load/stores. It
will involve quite a bit of work and the tradeoffs should be well
known; i.e. we should know that we're going to get something out of it
more so than we would if we added it as a "target independent
intrinsic" or some such. Maybe in the long run the ability to treat
vector converts+masked loads/stores as shuffles could be useful, I
don't know either.

-eric

Chandler Carruth

2013-May-09 01:16 UTC

On Thu, May 9, 2013 at 1:09 AM, Nadav Rotem <nrotem at apple.com> wrote:
> On May 8, 2013, at 4:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>
> Thinking that a masked store is conservatively a store of the full
> width of the store right?
>
>
> It depends on the optimization. Consider this example:
>
> masked_store(Val, Ptr , M)
> X = masked_load(Ptr, M2)
>
> If you assume that your store actually overwrites everything in that
> memory location then you don't need to load that memory location again. You
> can simply use the stored value. However, in our example X != Val.
>
I'm not sure I understand the full impact of this example, and I would like
to.

What are the desired memory model semantics for a masked store?
Specifically, let me suppose a simplified vector model of <2 x i64> on an
i64-word-size platform.

  masked_store(<42, 42>, Ptr, <true, false>)

Does this write to the entire <2 x i64> object stored at Ptr or not? Put
another way, consider:


  thread A:
    ...
    masked_store(<42, 42>, Ptr, <true, false>)
    ...

  thread B:
    ...
    masked_store(<42, 42>, Ptr, <false, true>)
    ...


Assuming there is no specific synchronization relevant to Ptr between these
two threads and their masked stores, does this form a data race or not?
From a memory model perspective, if this does *not* form a data race, that
makes this tremendously more complex to implement, analyze, and optimize...
I'm somewhat hopeful that the desired semantics are for this to form a
data race (and thus require synchronization when occurring in different
threads like this).
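
To make the question concrete, here is a single-threaded Python simulation of one hypothetical interleaving of the two stores under two candidate semantics: a store that touches only the enabled elements, and a store implemented as a full-width read, blend, and write. It uses no real threads and takes no position on which semantics a given target has; the function names and the initial memory contents are assumptions for illustration.

```python
STORES = [
    ([42, 42], [True, False]),   # thread A
    ([42, 42], [False, True]),   # thread B
]


def blend(old, new, mask):
    """Pick new[i] where mask[i] is set, otherwise keep old[i]."""
    return [n if bit else o for o, n, bit in zip(old, new, mask)]


def run_per_element(initial):
    """Each store writes only its enabled lanes, so the threads touch disjoint words."""
    mem = list(initial)
    for value, mask in STORES:
        for lane, bit in enumerate(mask):
            if bit:
                mem[lane] = value[lane]
    return mem


def run_full_width_rmw(initial):
    """Each store reads the whole vector, blends, and writes the whole vector back.
    Interleaving: both threads read before either one writes."""
    mem = list(initial)
    seen_by_a = list(mem)
    seen_by_b = list(mem)
    mem[:] = blend(seen_by_a, *STORES[0])   # A writes both lanes
    mem[:] = blend(seen_by_b, *STORES[1])   # B writes both lanes, lane 0 is stale
    return mem


print(run_per_element([0, 0]))     # [42, 42]
print(run_full_width_rmw([0, 0]))  # [0, 42]: thread A's update is lost
```

Under the per-element model the two stores never touch the same word. Under the full-width model they conflict on every word of the vector, which is the case where calling it a data race matches what can actually go wrong.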

Nadav Rotem

2013-May-09 04:16 UTC

> I'm not sure I understand the full impact of this example, and I would
> like to.
>
> What are the desired memory model semantics for a masked store?
> Specifically, let me suppose a simplified vector model of <2 x i64> on an
> i64-word-size platform.
>
Hi Chandler, 

I brought up the example in this email thread to show that the optimizations that
we currently have won't work on masked load/store operations because they
don't take the mask into consideration. The memory model is an interesting
question, but I am not sure how it is related. In our example you can see the
problem with a single thread. Both MIC and AVX[1] have masked store operations,
and they have different memory models.

Thanks,
Nadav

[1] http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_maskstore_pd.htm


dag at cray.com

2013-May-09 14:47 UTC

Chandler Carruth <chandlerc at google.com> writes:
> What are the desired memory model semantics for a masked store?
> Specifically, let me suppose a simplified vector model of <2 x i64> on
> an i64-word-size platform.
>
> masked_store(<42, 42>, Ptr, <true, false>)
>
> Does this write to the entire <2 x i64> object stored at Ptr or not?
No.  It writes one element.
> Put another way, consider:
>
> thread A:
> ...
> masked_store(<42, 42>, Ptr, <true, false>)
> ...
>
> thread B:
> ...
> masked_store(<42, 42>, Ptr, <false, true>)
> ...
>
> Assuming there is no specific synchronization relevant to Ptr between
> these two threads and their masked stores, does this form a data race
> or not?
It entirely depends on the hardware implementation.  In most cases I
would say yes due to cache coherence issues.  On a purely theoretical
machine that doesn't have false sharing, there would be no data race.

Of course this assumes that thread B won't access the element stored by
thread A and vice versa.
> From a memory model perspective, if this does *not* form a data race,
> that makes this tremendously more complex to implement, analyze, and
> optimize... I'm somewhat hopeful that the desired semantics are for
> this to form a data race (and thus require synchronization when
> occurring in different threads like this).
Most of the time the compiler will not know the mask value and will have
to be conservative.  As Nadav has pointed out, what constitutes
"conservative" is entirely context-dependent.

But I don't understand why defining this as not being a data race would
complicate things.  I'm assuming the mask values are statically known.
Can you explain a bit more?

                                 -David
