Displaying 7 results from an estimated 7 matches for "vhaddps".
2020 Aug 19
2
Question about llvm vectors
...nder why some advanced vector operations are
specific to some CPU targets?
Let me take an example:
/// Horizontally adds the adjacent pairs of values contained in two
/// 128-bit vectors of [4 x float].
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c> VHADDPS </c> instruction.
///
/// \param __a
/// A 128-bit vector of [4 x float] containing one of the source operands.
/// The horizontal sums of the values are stored in the lower bits of the
/// destination.
/// \param __b
/// A 128-bit vector of [4 x float] containing one of the sourc...
2018 Mar 01
9
[RFC] llvm-mca: a static performance analysis tool
...- - -
Resource pressure by instruction:
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Instructions:
- - - - - 1.00 - - - -
vmulps %xmm0, %xmm1, %xmm2
- - - - 1.00 - - - - -
vhaddps %xmm2, %xmm2, %xmm3
- - - - 1.00 - - - - -
vhaddps %xmm3, %xmm3, %xmm4
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects
[1] [2] [3] [4] [5] [6] Instructions:
1 2...
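As a reading aid for the three-instruction sequence in this report (one vmulps followed by two vhaddps), here is a minimal scalar sketch in C of what the sequence computes for 4-float vectors. The helper names model_mulps, model_haddps and model_dot4 are made up for illustration and are not part of llvm-mca or any LLVM API; the sketch models only the arithmetic, not the timing that the tool reports.

```c
#include <stddef.h>

/* Hypothetical scalar model of vmulps: element-wise product. */
static void model_mulps(const float a[4], const float b[4], float out[4]) {
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];
}

/* Hypothetical scalar model of 128-bit vhaddps: sums of adjacent pairs
   of a go to the low half, sums of adjacent pairs of b to the high half. */
static void model_haddps(const float a[4], const float b[4], float out[4]) {
    const float r[4] = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    for (size_t i = 0; i < 4; ++i)
        out[i] = r[i];
}

/* Mirrors the sequence: vmulps, then vhaddps twice with both sources
   set to the previous result. */
static float model_dot4(const float x[4], const float y[4]) {
    float p[4], s[4], t[4];
    model_mulps(x, y, p);    /* p = { x0*y0, x1*y1, x2*y2, x3*y3 }   */
    model_haddps(p, p, s);   /* s = { p0+p1, p2+p3, p0+p1, p2+p3 }   */
    model_haddps(s, s, t);   /* every lane of t holds p0+p1+p2+p3    */
    return t[0];
}
```

Each vhaddps consumes the result of the instruction before it, so the three instructions form one dependency chain within an iteration.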
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...instruction:
> [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
> Instructions:
> - - - - - 1.00 - - - -
> vmulps %xmm0, %xmm1, %xmm2
> - - - - 1.00 - - - - -
> vhaddps %xmm2, %xmm2, %xmm3
> - - - - 1.00 - - - - -
> vhaddps %xmm3, %xmm3, %xmm4
>
>
> Instruction Info:
> [1]: #uOps
> [2]: Latency
> [3]: RThroughput
> [4]: MayLoad
> [5]: MayStore
> [6]: HasSideEffects
>
> [1] ...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...e pressure by instruction:
> [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
> Instructions:
> - - - - - 1.00 - - - -
> vmulps %xmm0, %xmm1, %xmm2
> - - - - 1.00 - - - - -
> vhaddps %xmm2, %xmm2, %xmm3
> - - - - 1.00 - - - - -
> vhaddps %xmm3, %xmm3, %xmm4
>
>
> Instruction Info:
> [1]: #uOps
> [2]: Latency
> [3]: RThroughput
> [4]: MayLoad
> [5]: MayStore
> [6]: HasSideEffects
>
> [1] [...
2020 Aug 20
2
Question about llvm vectors
...horizontal add should sum all the elements to a single scalar value. With
> different implementation choices like that it's hard to say it should be a
> generic operation when the behavior might only make sense for one target's
> instruction set.
>
> The behavior of the 256-bit vhaddps instruction on X86 is also weird since
> it treats the upper and lower 128-bits of the sources and destination
> independently. That quirk wouldn't make sense in a generic operation.
>
> You can emulate __builtin_ia32_haddps generically using
> __builtin_shufflevector and the + o...
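The shuffle-plus-add idea mentioned in this reply can be sketched in plain C, under the assumption that a two-source shuffle indexes the concatenation of its operands (0..3 pick from the first source, 4..7 from the second). With Clang vector types the 128-bit case would presumably be written as __builtin_shufflevector(a, b, 0, 2, 4, 6) + __builtin_shufflevector(a, b, 1, 3, 5, 7); the version below uses ordinary arrays so it builds with any C compiler. The names shuffle2x4, hadd128 and hadd256 are hypothetical.

```c
#include <stddef.h>

/* Pick four elements from the 8-element concatenation (a, b):
   indices 0..3 select from a, indices 4..7 select from b. */
static void shuffle2x4(const float a[4], const float b[4],
                       const int idx[4], float out[4]) {
    for (size_t i = 0; i < 4; ++i)
        out[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
}

/* 128-bit horizontal add: gather the even-indexed and the odd-indexed
   elements, then add the two gathered vectors element-wise. */
static void hadd128(const float a[4], const float b[4], float out[4]) {
    static const int even[4] = { 0, 2, 4, 6 };
    static const int odd[4]  = { 1, 3, 5, 7 };
    float e[4], o[4];
    shuffle2x4(a, b, even, e);   /* e = { a0, a2, b0, b2 } */
    shuffle2x4(a, b, odd, o);    /* o = { a1, a3, b1, b3 } */
    for (size_t i = 0; i < 4; ++i)
        out[i] = e[i] + o[i];
}

/* 256-bit form as described in the reply: the lower and upper
   128-bit halves are processed independently of each other. */
static void hadd256(const float a[8], const float b[8], float out[8]) {
    hadd128(a, b, out);               /* lower half */
    hadd128(a + 4, b + 4, out + 4);   /* upper half */
}
```

In the 256-bit result the sums from b land in positions 2, 3, 6 and 7, interleaved with those from a in each half. That is the per-half behavior the reply calls out as a poor fit for a generic operation.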
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...Below is the
> timeline view for the dot-product example from the previous section.
>
> ///////////////
> Timeline view:
> 012345
> Index 0123456789
>
> [0,0] DeeER. . . vmulps %xmm0, %xmm1, %xmm2
> [0,1] D==eeeER . . vhaddps %xmm2, %xmm2, %xmm3
> [0,2] .D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
>
> [1,0] .DeeE-----R . vmulps %xmm0, %xmm1, %xmm2
> [1,1] . D=eeeE---R . vhaddps %xmm2, %xmm2, %xmm3
> [1,2] . D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
>
>...
2018 Mar 02
5
[RFC] llvm-mca: a static performance analysis tool
...". Below is the
> timeline view for the dot-product example from the previous section.
>
> ///////////////
> Timeline view:
> 012345
> Index 0123456789
>
> [0,0] DeeER. . . vmulps %xmm0, %xmm1, %xmm2
> [0,1] D==eeeER . . vhaddps %xmm2, %xmm2, %xmm3
> [0,2] .D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
>
> [1,0] .DeeE-----R . vmulps %xmm0, %xmm1, %xmm2
> [1,1] . D=eeeE---R . vhaddps %xmm2, %xmm2, %xmm3
> [1,2] . D====eeeER . vhaddps %xmm3, %xmm3, %xmm4
>
> [2...