On May 8, 2013, at 4:00 PM, Eric Christopher <echristo at gmail.com> wrote:

> Thinking that a masked store is conservatively a store of the full
> width of the store right?

It depends on the optimization. Consider this example:

  masked_store(Val, Ptr, M)
  X = masked_load(Ptr, M2)

If you assume that your store actually overwrites everything in that memory location then you don't need to load that memory location again. You can simply use the stored value. However, in our example X != Val.

> But Jim pointed out that anything merging loads would then need to
> merge the masks otherwise even if selection would work otherwise, any
> pass that merges loads would need to learn how to deal with masks. Not
> likely a deal killer since I don't think there are a lot of them, but
> it does explain why it's more work than having them pass through.

I actually think that masks disrupt *everything* from alias analysis to SROA. If you ignore the mask, bad things will happen. You can't be conservative because you never know which way is 'conservative'. Should you assume that the mask is all zero or all one?
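A minimal sketch of the store/load example above, modeling memory as a Python list of elements. The helper names mirror the pseudocode in the message; the `passthru` value for disabled load lanes and the concrete masks are assumptions made for illustration, not semantics stated in this thread.

```python
def masked_store(memory, base, values, mask):
    """Write values[i] to memory[base + i] only in lanes where mask[i] is true."""
    for lane, (value, enabled) in enumerate(zip(values, mask)):
        if enabled:
            memory[base + lane] = value


def masked_load(memory, base, mask, passthru=0):
    """Read memory[base + i] in enabled lanes; disabled lanes get passthru (assumed)."""
    return [memory[base + lane] if enabled else passthru
            for lane, enabled in enumerate(mask)]


memory = [1, 2, 3, 4]            # contents at Ptr before the store
val = [10, 20, 30, 40]           # Val
m = [True, False, True, False]   # store mask M (assumed values)
m2 = [True, True, True, True]    # load mask M2 (assumed values)

masked_store(memory, 0, val, m)
x = masked_load(memory, 0, m2)

# Store-to-load forwarding that ignores the masks would substitute Val for X.
forwarded = val
```

Here the load returns the old contents in the lanes the store skipped, so replacing `x` with `forwarded` would change the program's result.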
On Wed, May 8, 2013 at 4:09 PM, Nadav Rotem <nrotem at apple.com> wrote:

> On May 8, 2013, at 4:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> Thinking that a masked store is conservatively a store of the full
> width of the store right?
>
> It depends on the optimization. Consider this example:
>
>   masked_store(Val, Ptr, M)
>   X = masked_load(Ptr, M2)
>
> If you assume that your store actually overwrites everything in that memory
> location then you don't need to load that memory location again. You can
> simply use the stored value. However, in our example X != Val.
>
> But Jim pointed out that anything merging loads would then need to
> merge the masks otherwise even if selection would work otherwise, any
> pass that merges loads would need to learn how to deal with masks. Not
> likely a deal killer since I don't think there are a lot of them, but
> it does explain why it's more work than having them pass through.
>
> I actually think that masks disrupt *everything* from alias analysis to
> SROA. If you ignore the mask bad thing will happen. You can't be
> conservative because you never know which way is 'conservative'. Should you
> assume that the mask is all zero or all one ?

I was thinking that they'd not be optimized, but you are correct that it'll be invasive - examples are illuminating. :)

I'm less convinced it's everything, but it's definitely enough that care should be taken before adding a mask field to loads/stores. It will involve quite a bit of work and the tradeoffs should be well known; i.e. we should know that we're going to get something out of it more so than we would if we added it as a "target independent intrinsic" or some such.

Maybe in the long run the ability to treat vector converts + masked loads/stores as shuffles could be useful, I don't know either.

-eric
On Thu, May 9, 2013 at 1:09 AM, Nadav Rotem <nrotem at apple.com> wrote:

> On May 8, 2013, at 4:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> Thinking that a masked store is conservatively a store of the full
> width of the store right?
>
> It depends on the optimization. Consider this example:
>
>   masked_store(Val, Ptr, M)
>   X = masked_load(Ptr, M2)
>
> If you assume that your store actually overwrites everything in that
> memory location then you don't need to load that memory location again. You
> can simply use the stored value. However, in our example X != Val.

I'm not sure I understand the full impact of this example, and I would like to.

What are the desired memory model semantics for a masked store? Specifically, let me suppose a simplified vector model of <2 x i64> on an i64-word-size platform.

  masked_store(<42, 42>, Ptr, <true, false>)

Does this write to the entire <2 x i64> object stored at Ptr or not?

Put another way, consider:

  thread A:
    ...
    masked_store(<42, 42>, Ptr, <true, false>)
    ...

  thread B:
    ...
    masked_store(<42, 42>, Ptr, <false, true>)
    ...

Assuming there is no specific synchronization relevant to Ptr between these two threads and their masked stores, does this form a data race or not?

From a memory model perspective, if this does *not* form a data race, that makes this tremendously more complex to implement, analyze, and optimize... I'm somewhat hopeful that the desired semantics are for this to form a data race (and thus require synchronization when occurring in different threads like this).
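The two candidate answers to the question above can be sketched by computing which byte addresses each store is considered to write. This is only an illustration: `footprint` and `conflicts` are hypothetical helpers, and overlapping write footprints are used as a stand-in for "data race", which in a real memory model also depends on ordering between the threads.

```python
ELEM_BYTES = 8  # one i64 element


def footprint(base, mask, per_element):
    """Return the set of byte addresses the masked store is considered to write."""
    if not per_element:
        # Full-width semantics: the store touches the whole vector object.
        return set(range(base, base + len(mask) * ELEM_BYTES))
    written = set()
    for lane, enabled in enumerate(mask):
        if enabled:
            start = base + lane * ELEM_BYTES
            written.update(range(start, start + ELEM_BYTES))
    return written


def conflicts(mask_a, mask_b, per_element, base=0):
    """True if two unsynchronized stores to the same base write a common byte."""
    return bool(footprint(base, mask_a, per_element)
                & footprint(base, mask_b, per_element))


thread_a = [True, False]   # mask used by thread A
thread_b = [False, True]   # mask used by thread B

full_width_race = conflicts(thread_a, thread_b, per_element=False)
per_element_race = conflicts(thread_a, thread_b, per_element=True)
```

Under the full-width reading the two stores overlap, so they would need synchronization; under the per-element reading their footprints are disjoint.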
> I'm not sure I understand the full impact of this example, and I would like to.
>
> What are the desired memory model semantics for a masked store? Specifically, let me suppose a simplified vector model of <2 x i64> on an i64-word-size platform.

Hi Chandler,

I brought the example in this email thread to show that the optimizations that we currently have won't work on masked load/store operations because they don't take the mask into consideration. The memory model is an interesting question, but I am not sure how it is related. In our example you can see the problem with a single thread. Both MIC and AVX[1] have masked store operations and they have different memory models.

Thanks,
Nadav

[1] http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_maskstore_pd.htm
Chandler Carruth <chandlerc at google.com> writes:

> What are the desired memory model semantics for a masked store?
> Specifically, let me suppose a simplified vector model of <2 x i64> on
> an i64-word-size platform.
>
>   masked_store(<42, 42>, Ptr, <true, false>)
>
> Does this write to the entier <2 x i64> object stored at Ptr or not?

No. It writes one element.

> Put another way, consider:
>
>   thread A:
>     ...
>     masked_store(<42, 42>, Ptr, <true, false>)
>     ...
>
>   thread B:
>     ...
>     masked_store(<42, 42>, Ptr, <false, true>)
>     ...
>
> Assuming there is no specific synchronization relevant to Ptr between
> these two threads and their masked stores, does this form a data race
> or not?

It entirely depends on the hardware implementation. In most cases I would say yes due to cache coherence issues. On a purely theoretical machine that doesn't have false sharing, there would be no data race.

Of course this assumes that thread B won't access the element stored by thread A and vice versa.

> From a memory model perspective, if this does *not* form a data race,
> that makes this tremendously more complex to implement, analyze, and
> optimize... I'm somewhat hopeful that the desired semantics are for
> this to form a datarace (and thus require synchronization when
> occurring in different threads like this).

Most of the time the compiler will not know the mask value and will have to be conservative. As Nadav has pointed out, what constitutes "conservative" is entirely context-dependent.

But I don't understand why defining this as not being a data race would complicate things. I'm assuming the mask values are statically known. Can you explain a bit more?

-David
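The point that "conservative" is context-dependent can be sketched with masks whose lanes may be unknown at compile time. Everything here is an assumed encoding for illustration (`None` for an unknown lane, and the helper names); it is not how any LLVM pass represents masks.

```python
def may_write(mask):
    """Lanes that might be active: an unknown lane (None) is treated as all-one."""
    return [lane is None or bool(lane) for lane in mask]


def must_write(mask):
    """Lanes that are definitely active: an unknown lane is treated as all-zero."""
    return [lane is True for lane in mask]


def can_forward(store_mask, load_mask):
    """Forwarding is safe only if every lane the load might read is surely stored."""
    return all(stored or not read
               for stored, read in zip(must_write(store_mask), may_write(load_mask)))


def may_clobber(store_mask, load_mask):
    """The store may change the loaded value if any possibly-active lanes coincide."""
    return any(stored and read
               for stored, read in zip(may_write(store_mask), may_write(load_mask)))


partly_unknown = [True, None]   # second lane not known at compile time
all_on = [True, True]

forward_ok = can_forward(partly_unknown, all_on)      # needs the all-zero guess
clobber_possible = may_clobber(partly_unknown, all_on)  # needs the all-one guess
```

For the same unknown lane, the forwarding query has to assume the lane is off while the clobber query has to assume it is on, so no single default mask is safe for both.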