search for: vhaddp

Displaying 7 results from an estimated 7 matches for "vhaddp".

2020 Aug 19
2
Question about llvm vectors
...nder why some advanced vector operations are specific to some CPU targets? Let me take an example: /// Horizontally adds the adjacent pairs of values contained in two /// 128-bit vectors of [4 x float]. /// /// \headerfile <x86intrin.h> /// /// This intrinsic corresponds to the <c> VHADDPS </c> instruction. /// /// \param __a /// A 128-bit vector of [4 x float] containing one of the source operands. /// The horizontal sums of the values are stored in the lower bits of the /// destination. /// \param __b /// A 128-bit vector of [4 x float] containing one of the sour...
2018 Mar 01
9
[RFC] llvm-mca: a static performance analysis tool
...- - - Resource pressure by instruction: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions: - - - - - 1.00 - - - - vmulps %xmm0, %xmm1, %xmm2 - - - - 1.00 - - - - - vhaddps %xmm2, %xmm2, %xmm3 - - - - 1.00 - - - - - vhaddps %xmm3, %xmm3, %xmm4 Instruction Info: [1]: #uOps [2]: Latency [3]: RThroughput [4]: MayLoad [5]: MayStore [6]: HasSideEffects [1] [2] [3] [4] [5] [6] Instructions: 1 2...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...instruction: > [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8] [9]        > Instructions: >  -      -      -      -      -     1.00    -      -      - -         > vmulps    %xmm0, %xmm1, %xmm2 >  -      -      -      -     1.00    -      -      -      - -         > vhaddps    %xmm2, %xmm2, %xmm3 >  -      -      -      -     1.00    -      -      -      - -         > vhaddps    %xmm3, %xmm3, %xmm4 > > > Instruction Info: > [1]: #uOps > [2]: Latency > [3]: RThroughput > [4]: MayLoad > [5]: MayStore > [6]: HasSideEffects > > [1]...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...e pressure by instruction: > [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] > Instructions: > - - - - - 1.00 - - - - > vmulps %xmm0, %xmm1, %xmm2 > - - - - 1.00 - - - - - > vhaddps %xmm2, %xmm2, %xmm3 > - - - - 1.00 - - - - - > vhaddps %xmm3, %xmm3, %xmm4 > > > Instruction Info: > [1]: #uOps > [2]: Latency > [3]: RThroughput > [4]: MayLoad > [5]: MayStore > [6]: HasSideEffects > > [1]...
2020 Aug 20
2
Question about llvm vectors
...> horizontal add should sum all the elements to a single scalar value. With > different implementation choices like that it's hard to say it should be a > generic operation when the behavior might only make sense for one target's > instruction set. > > The behavior of the 256-bit vhaddps instruction on X86 is also weird since > it treats the upper and lower 128-bits of the sources and destination > independently. That quirk wouldn't make sense in a generic operation. > > You can emulate __builtin_ia32_haddps generically using > __builtin_shufflevector and the +...
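The horizontal-add behavior described in the two "Question about llvm vectors" snippets can be modelled in plain scalar C. This is only a sketch of the instruction's semantics, not the shuffle-based emulation the message goes on to describe. The function names `hadd_ps128` and `hadd_ps256` are hypothetical, and arrays of `float` stand in for the vector registers.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of 128-bit HADDPS on [4 x float] (hypothetical helper name).
 * Pair sums of `a` land in the low half of the result, pair sums of `b`
 * in the high half: out = { a0+a1, a2+a3, b0+b1, b2+b3 }. */
void hadd_ps128(const float a[4], const float b[4], float out[4])
{
    assert(a != NULL && b != NULL && out != NULL);

    /* Compute into a temporary so `out` may alias `a` or `b`. */
    float r[4];
    r[0] = a[0] + a[1];
    r[1] = a[2] + a[3];
    r[2] = b[0] + b[1];
    r[3] = b[2] + b[3];

    for (size_t i = 0; i < 4; ++i)
        out[i] = r[i];
}

/* Scalar model of 256-bit VHADDPS on [8 x float] (hypothetical helper name).
 * The lower and upper 128-bit halves are processed independently, so the
 * result is two separate 128-bit horizontal adds placed side by side. */
void hadd_ps256(const float a[8], const float b[8], float out[8])
{
    hadd_ps128(a, b, out);             /* lower 128 bits */
    hadd_ps128(a + 4, b + 4, out + 4); /* upper 128 bits */
}
```

Calling `hadd_ps256` with `a = {1, ..., 8}` shows the per-lane quirk: the result interleaves sums from `a` and `b` within each half rather than placing all sums from `a` first.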
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...Below is the > timeline view for the dot-product example from the previous section. > > /////////////// > Timeline view: > 012345 > Index 0123456789 > > [0,0] DeeER. . . vmulps %xmm0, %xmm1, %xmm2 > [0,1] D==eeeER . . vhaddps %xmm2, %xmm2, %xmm3 > [0,2] .D====eeeER . vhaddps %xmm3, %xmm3, %xmm4 > > [1,0] .DeeE-----R . vmulps %xmm0, %xmm1, %xmm2 > [1,1] . D=eeeE---R . vhaddps %xmm2, %xmm2, %xmm3 > [1,2] . D====eeeER . vhaddps %xmm3, %xmm3, %xmm4 > >...
2018 Mar 02
5
[RFC] llvm-mca: a static performance analysis tool
...". Below is the > timeline view for the dot-product example from the previous section. > > /////////////// > Timeline view: > 012345 > Index 0123456789 > > [0,0] DeeER. . . vmulps %xmm0, %xmm1, %xmm2 > [0,1] D==eeeER . . vhaddps %xmm2, %xmm2, %xmm3 > [0,2] .D====eeeER . vhaddps %xmm3, %xmm3, %xmm4 > > [1,0] .DeeE-----R . vmulps %xmm0, %xmm1, %xmm2 > [1,1] . D=eeeE---R . vhaddps %xmm2, %xmm2, %xmm3 > [1,2] . D====eeeER . vhaddps %xmm3, %xmm3, %xmm4 > > [...
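The dot-product sequence in the llvm-mca snippets (one `vmulps` followed by two `vhaddps`) can be read as the scalar C sketch below. It only models what the three instructions compute and says nothing about their timing. The names `hadd4` and `dot4` are hypothetical, and plain `float` arrays stand in for the `%xmm` registers.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of 128-bit horizontal add (hypothetical helper name):
 * out = { a0+a1, a2+a3, b0+b1, b2+b3 }. `out` must not alias `a` or `b`. */
void hadd4(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

/* Dot product of two 4-element vectors, mirroring the instruction sequence
 *   vmulps  %xmm0, %xmm1, %xmm2
 *   vhaddps %xmm2, %xmm2, %xmm3
 *   vhaddps %xmm3, %xmm3, %xmm4
 * (hypothetical function name). */
float dot4(const float x[4], const float y[4])
{
    assert(x != NULL && y != NULL);

    float prod[4], h1[4], h2[4];

    /* vmulps: element-wise products. */
    for (size_t i = 0; i < 4; ++i)
        prod[i] = x[i] * y[i];

    /* First vhaddps: h1 = { p0+p1, p2+p3, p0+p1, p2+p3 }. */
    hadd4(prod, prod, h1);

    /* Second vhaddps: every element of h2 holds p0+p1+p2+p3. */
    hadd4(h1, h1, h2);

    return h2[0];
}
```

Because each step consumes the previous step's result, the three instructions form a serial dependency chain, which matches the growing runs of `=` shown for the `vhaddps` rows in the timeline view.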