thr3ads.net - llvm dev - [llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore [Sep 2016]

Demikhovsky, Elena via llvm-dev

2016-Sep-19 06:37 UTC

[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore

Hi all,

       AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND in
order to allow vectorization of the following loops with two specific types of
cross-iteration dependencies:

       Compress:
       for (int i=0; i<N; ++i)
         if (t[i])
           *A++ = expr;

       Expand:
       for (int i=0; i<N; ++i)
         if (t[i])
            X[i] = *A++;
         else
            X[i] = PassThruV[i];

       The "compress" and "expand" patterns are depicted on this poster:
http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf

       The RFC proposes to support this functionality by introducing two
intrinsics to LLVM IR:
       llvm.masked.expandload.*
       llvm.masked.compressstore.*

       The syntax of these two intrinsics is similar to the syntax of
llvm.masked.load.* and masked.store.*, respectively, but the semantics are
different, matching the above patterns.

       %res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32
(float* %ptr, <16 x i1> %mask, <16 x float> %passthru)
       call void @llvm.masked.compressstore.v16f32.p0f32
(<16 x float> %value, float* %ptr, <16 x i1> %mask)

       The arguments %mask, %value and %passthru all have the same vector
length.
       The pointee type of %ptr is the scalar element type of the vector
value.
       (This is only a brief summary; the full syntax description will be
provided in the subsequent documentation.)
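
As a rough illustration of the semantics described above, here is a minimal scalar sketch in C++. It is a reference model only, not the proposed lowering. The function names expand_load and compress_store, the bool-array mask and the returned element count are assumptions made for this example.

```cpp
#include <cstddef>

// Scalar reference model (an assumption for illustration, not LLVM code).
// expand_load: consecutive elements read from ptr are placed into the lanes
// whose mask bit is set; the remaining lanes take the passthru value.
// Returns how many elements were consumed from ptr.
template <typename T>
std::size_t expand_load(const T *ptr, const bool *mask, const T *passthru,
                        T *res, std::size_t lanes) {
    std::size_t next = 0;
    for (std::size_t i = 0; i < lanes; ++i)
        res[i] = mask[i] ? ptr[next++] : passthru[i];
    return next;
}

// compress_store: the lanes of value whose mask bit is set are written to
// consecutive locations starting at ptr. Memory past the last written
// element is left untouched. Returns how many elements were written.
template <typename T>
std::size_t compress_store(const T *value, T *ptr, const bool *mask,
                           std::size_t lanes) {
    std::size_t next = 0;
    for (std::size_t i = 0; i < lanes; ++i)
        if (mask[i])
            ptr[next++] = value[i];
    return next;
}
```

Unlike masked.load and masked.store, lane i here does not access ptr[i]: the memory side is always contiguous, and only the register side is sparse.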

       The intrinsics are planned to be target independent, similar to
masked.load/store/gather/scatter. They will be lowered efficiently on AVX-512
and scalarized on other targets, also akin to the other masked.* intrinsics.
       The loop vectorizer will query TTI about whether the target supports
these intrinsics efficiently, and if so it will be able to handle loops with
such cross-iteration dependences.
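
To show how the cross-iteration dependence on A could be handled per vector iteration, here is a hedged C++ sketch of the compress loop processed in chunks of 16 lanes. It assumes that "expr" is simply src[i], uses a scalar stand-in for the compress store, and all names (VL, compress_store, compress_loop) are hypothetical. This is not what the vectorizer would actually emit.

```cpp
#include <cstddef>

constexpr std::size_t VL = 16;  // lanes per vector, as in the <16 x float> example

// Scalar stand-in for a compress store (assumption, not the real lowering).
// Writes the enabled lanes contiguously and returns how many were written.
static std::size_t compress_store(const float *value, float *ptr,
                                  const bool *mask, std::size_t lanes) {
    std::size_t next = 0;
    for (std::size_t i = 0; i < lanes; ++i)
        if (mask[i])
            ptr[next++] = value[i];
    return next;
}

// Chunked form of:  for (i = 0; i < n; ++i) if (t[i]) *A++ = src[i];
// The only value carried between chunks is A, which advances by the number
// of set mask bits in the chunk. Returns the final value of A.
float *compress_loop(const float *src, const bool *t, std::size_t n, float *A) {
    for (std::size_t i = 0; i < n; i += VL) {
        std::size_t lanes = (n - i < VL) ? (n - i) : VL;  // remainder chunk
        A += compress_store(src + i, A, t + i, lanes);
    }
    return A;
}
```

On a target with hardware support, the inner helper would correspond to one compress store plus a population count of the mask; elsewhere it stays a scalar loop, which is the scalarized fallback mentioned above.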

       The first step will include the full documentation and the
implementation of the CodeGen part.

       Additional information about expand load
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512)
and compress store
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512)
can be found in the Intel Intrinsics Guide.

-       Elena



---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Hal Finkel via llvm-dev

2016-Sep-24 16:49 UTC

[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore

Hi Elena,

Technically speaking, this seems straightforward.

I wonder, however, how target-independent this is in a practical sense; will
there be an efficient lowering when targeting any other ISA? I don't want to
get into the territory where, because the vectorizer is supposed to be
architecture independent, we need to add target-independent intrinsics for all
potentially-side-effect-carrying idioms (or just complicated idioms) we want the
vectorizer to support on any target. Is there a way we can design the vectorizer
so that the targets can plug in their own idiom recognition for these kinds of
things, and then, via that interface, let the vectorizer produce the relevant
target-dependent intrinsics?

Thanks again,
Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Demikhovsky, Elena via llvm-dev

2016-Sep-25 18:28 UTC

[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore

|
  |Hi Elena,
  |
  |Technically speaking, this seems straightforward.
  |
  |I wonder, however, how target-independent this is in a practical
  |sense; will there be an efficient lowering when targeting any other
  |ISA? I don't want to get into the territory where, because the
  |vectorizer is supposed to be architecture independent, we need to
  |add target-independent intrinsics for all potentially-side-effect-
  |carrying idioms (or just complicated idioms) we want the vectorizer to
  |support on any target. Is there a way we can design the vectorizer so
  |that the targets can plug in their own idiom recognition for these
  |kinds of things, and then, via that interface, let the vectorizer produce
  |the relevant target-dependent intrinsics?

Adding a target-specific plug-in to the vectorizer may be a good idea. We would
need target-specific pattern recognition and a target-specific implementation of
“vectorizeMemoryInstruction” (more functionality may be needed in the future):
TTI->checkAdditionalVectorizationOpportunities() - detects target-specific
patterns; X86 will find compress/expand and maybe others
TTI->vectorizeMemoryInstruction() - handles only exotic target-specific cases
Pros:
It will allow us to implement all X86-specific solutions.
The expandload and compressstore intrinsics may then be x86-specific and
polymorphic:
llvm.x86.masked.expandload()
llvm.x86.masked.compressstore()

Cons:

TTI will need to deal with LoopInfo, SCEVs and other loop analysis info that it
does not have today. (I do not like this approach.)
Alternatively, we'll need to introduce TLV (Target Loop Vectorizer), a new class
that handles all target-specific cases. This solution seems more reasonable, but
too heavy just for compress/expand.
Do you see any other target plug-in solution?

-Elena
