thr3ads.net - llvm dev - [LLVMdev] Adding masked vector load and store intrinsics [Oct 2014]

dag at cray.com

2014-Oct-24 17:22 UTC

[LLVMdev] Adding masked vector load and store intrinsics

"Das, Dibyendu" <Dibyendu.Das at amd.com> writes:
> This looks to be a reasonable proposal. However, native instructions
> that support such masked ld/st may have a high latency? Also, it
> would be good to state some workloads where this will have a positive
> impact.
Any significant vector workload will see a giant gain from this.

The masked operations really shouldn't have any more latency.  The time
of the memory operation itself dominates.

                            -David

Das, Dibyendu

2014-Oct-24 18:44 UTC

[LLVMdev] Adding masked vector load and store intrinsics

Is there an example of such a workload (let's say from the SPEC CPU 2006 harness
or similar) that you have in mind, and the amount of gain expected?
- dibyendu

dag at cray.com

2014-Oct-24 19:50 UTC

[LLVMdev] Adding masked vector load and store intrinsics

"Das, Dibyendu" <Dibyendu.Das at amd.com> writes:
> Is there an example of such a workload (let's say from the SPEC CPU
> 2006 harness or similar) that you have in mind, and the amount of gain
> expected?
Nearly every code that has significant vector work in it.
Even if there is no control flow in the loop, masking allows the
compiler to more aggressively vectorize and rely on the masks to prevent
unsafe execution.
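
As a rough illustration of the point above, here is a minimal sketch in
plain C of how a conditional loop can be processed a vector at a time when
loads and stores take a mask. It is a scalar model only, not the proposed
intrinsics: the names masked_load, masked_store and conditional_increment,
the pass-through argument, and the width LANES = 4 are all assumptions made
for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* illustrative vector width (an assumption) */

/* Hypothetical scalar model of a masked vector load: disabled lanes
   never read memory and take the pass-through value instead. */
static void masked_load(int32_t dst[LANES], const int32_t *src,
                        const uint8_t *mask, const int32_t passthru[LANES]) {
    for (size_t l = 0; l < LANES; ++l)
        dst[l] = mask[l] ? src[l] : passthru[l];
}

/* Hypothetical scalar model of a masked vector store: disabled lanes
   leave the destination memory untouched. */
static void masked_store(int32_t *dst, const int32_t val[LANES],
                         const uint8_t *mask) {
    for (size_t l = 0; l < LANES; ++l)
        if (mask[l])
            dst[l] = val[l];
}

/* Vector-at-a-time form of the scalar loop
       for (i = 0; i < n; ++i) if (cond[i]) out[i] = in[i] + 1;
   The condition becomes the mask, so the loop body has no branch. */
static void conditional_increment(int32_t *out, const int32_t *in,
                                  const uint8_t *cond, size_t n) {
    const int32_t zeros[LANES] = {0};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        int32_t v[LANES];
        masked_load(v, in + i, cond + i, zeros);
        for (size_t l = 0; l < LANES; ++l)
            v[l] += 1;                   /* computed on every lane */
        masked_store(out + i, v, cond + i);
    }
    for (; i < n; ++i)                   /* scalar remainder */
        if (cond[i])
            out[i] = in[i] + 1;
}
```

The add runs on every lane, including disabled ones; only the memory
accesses are guarded. That is what makes the transformation safe even when
in[i] or out[i] must not be touched for elements where cond[i] is false.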

The amount of gain is highly code-dependent but my guess is that Elena's
example of 2x speedup is typical, maybe even on the lower end.

The capability of the vectorizer is the biggest factor.  Without masks,
the vectorizer cannot be as aggressive.  With masks, the vectorizer
still has to be written to be aggressive.  Ph.D. dissertations have been
written on the topic.  It's non-trivial work.

Masking is an enabling technology, not an end goal.

                         -David
