thr3ads.net - search: "vhaddp"

Question about llvm vectors

2020 Aug 19

2

...nder why some advanced vector operations are specific to some CPU targets? Let me take an example:

/// Horizontally adds the adjacent pairs of values contained in two
/// 128-bit vectors of [4 x float].
///
/// \headerfile <x86intrin.h>
///
/// This intrinsic corresponds to the <c> VHADDPS </c> instruction.
///
/// \param __a
///    A 128-bit vector of [4 x float] containing one of the source operands.
///    The horizontal sums of the values are stored in the lower bits of the
///    destination.
/// \param __b
///    A 128-bit vector of [4 x float] containing one of the sour...
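The doc comment pins down the lane layout of the 128-bit form. Per Intel's definition of HADDPS, the pairwise sums of __b fill the upper half of the destination just as those of __a fill the lower half. A minimal scalar sketch of that layout, assuming plain float[4] arrays stand in for the [4 x float] vectors; haddps_ref is a hypothetical name, not part of <x86intrin.h>:

```c
/* Hypothetical scalar reference model of the 128-bit HADDPS lane layout.
   Plain float[4] arrays stand in for the [4 x float] operands. */
static void haddps_ref(const float a[4], const float b[4], float dst[4]) {
    dst[0] = a[0] + a[1]; /* adjacent pairs of __a go to the lower half */
    dst[1] = a[2] + a[3];
    dst[2] = b[0] + b[1]; /* adjacent pairs of __b go to the upper half */
    dst[3] = b[2] + b[3];
}
```

Passing the same vector x as both operands, as the llvm-mca dot-product example does with vhaddps %xmm2, %xmm2, %xmm3, yields {x0+x1, x2+x3, x0+x1, x2+x3}; a second such step leaves the full four-element sum in every lane.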

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 01

9

...- - -

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    Instructions:
 -      -      -      -      -     1.00    -      -      -      -     vmulps  %xmm0, %xmm1, %xmm2
 -      -      -      -     1.00    -      -      -      -      -     vhaddps %xmm2, %xmm2, %xmm3
 -      -      -      -     1.00    -      -      -      -      -     vhaddps %xmm3, %xmm3, %xmm4

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      2...
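In the resource-pressure table, each row lists how many cycles per iteration an instruction keeps each processor resource [0]..[9] busy. Summing a column gives that resource's total pressure, and the busiest resource puts a lower bound on cycles per iteration: both vhaddps instructions use resource [4], so that unit alone needs 2 cycles per iteration, while vmulps only adds 1 cycle to resource [5]. A small C sketch of that arithmetic, with the values copied from the snippet ('-' written as 0) and hypothetical names:

```c
#define NUM_RESOURCES 10
#define NUM_INSTRS 3

/* Per-iteration resource cycles from the "Resource pressure by instruction"
   table: rows are the three instructions, columns are resources [0]..[9]. */
static const double pressure[NUM_INSTRS][NUM_RESOURCES] = {
    {0, 0, 0, 0, 0, 1.0, 0, 0, 0, 0}, /* vmulps  %xmm0, %xmm1, %xmm2 */
    {0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0}, /* vhaddps %xmm2, %xmm2, %xmm3 */
    {0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0}, /* vhaddps %xmm3, %xmm3, %xmm4 */
};

/* Fills totals[] with each column's sum and returns the index of the
   busiest resource; its total is a lower bound on cycles per iteration. */
static int busiest_resource(double totals[NUM_RESOURCES]) {
    int best = 0;
    for (int r = 0; r < NUM_RESOURCES; ++r) {
        totals[r] = 0.0;
        for (int i = 0; i < NUM_INSTRS; ++i)
            totals[r] += pressure[i][r];
        if (totals[r] > totals[best]) /* totals[best] is already computed */
            best = r;
    }
    return best;
}
```

This bound only reflects resource contention; dependency chains such as the two back-to-back vhaddps can still make the real per-iteration cost higher.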

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

0

...instruction:
> [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]
> Instructions:
>  -      -      -      -      -     1.00    -      -      -      -
> vmulps %xmm0, %xmm1, %xmm2
>  -      -      -      -     1.00    -      -      -      -      -
> vhaddps %xmm2, %xmm2, %xmm3
>  -      -      -      -     1.00    -      -      -      -      -
> vhaddps %xmm3, %xmm3, %xmm4
>
>
> Instruction Info:
> [1]: #uOps
> [2]: Latency
> [3]: RThroughput
> [4]: MayLoad
> [5]: MayStore
> [6]: HasSideEffects
>
> [1]...

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

0

...e pressure by instruction:
> [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]
> Instructions:
>  -      -      -      -      -     1.00    -      -      -      -
> vmulps %xmm0, %xmm1, %xmm2
>  -      -      -      -     1.00    -      -      -      -      -
> vhaddps %xmm2, %xmm2, %xmm3
>  -      -      -      -     1.00    -      -      -      -      -
> vhaddps %xmm3, %xmm3, %xmm4
>
>
> Instruction Info:
> [1]: #uOps
> [2]: Latency
> [3]: RThroughput
> [4]: MayLoad
> [5]: MayStore
> [6]: HasSideEffects
>
> [1]...

Question about llvm vectors

2020 Aug 20

2

> ...horizontal add should sum all the elements to a single scalar value. With
> different implementation choices like that it's hard to say it should be a
> generic operation when the behavior might only make sense for one target's
> instruction set.
>
> The behavior of the 256-bit vhaddps instruction on X86 is also weird since
> it treats the upper and lower 128 bits of the sources and destination
> independently. That quirk wouldn't make sense in a generic operation.
>
> You can emulate __builtin_ia32_haddps generically using
> __builtin_shufflevector and the +...
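The reply suggests emulating __builtin_ia32_haddps with __builtin_shufflevector and +. A minimal sketch of that idea, assuming the GCC/Clang vector_size extension is available; hadd128 and hadd256 are hypothetical names. __builtin_shufflevector is a Clang builtin that GCC also accepts from version 12, so older GCC falls back to building the same vectors lane by lane. The second function models the 256-bit quirk the reply describes, where each 128-bit half is computed on its own.

```c
/* GCC/Clang vector extensions: 4 and 8 floats packed into one value. */
typedef float v4f __attribute__((vector_size(16)));
typedef float v8f __attribute__((vector_size(32)));

/* Generic emulation of the 128-bit haddps: gather the even and odd lanes
   of the concatenation a:b, then add the two vectors element-wise. */
static v4f hadd128(v4f a, v4f b) {
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 12)
    /* Shuffle indices 0..3 select from a, 4..7 select from b. */
    v4f even = __builtin_shufflevector(a, b, 0, 2, 4, 6);
    v4f odd  = __builtin_shufflevector(a, b, 1, 3, 5, 7);
#else
    /* Older GCC: build the same shuffled vectors by subscripting. */
    v4f even = {a[0], a[2], b[0], b[2]};
    v4f odd  = {a[1], a[3], b[1], b[3]};
#endif
    return even + odd;
}

/* Model of the 256-bit vhaddps: each 128-bit half behaves like hadd128,
   so a-sums and b-sums alternate per half instead of a-then-b overall. */
static v8f hadd256(v8f a, v8f b) {
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 12)
    /* Indices 0..7 select from a, 8..15 select from b. */
    v8f even = __builtin_shufflevector(a, b, 0, 2, 8, 10, 4, 6, 12, 14);
    v8f odd  = __builtin_shufflevector(a, b, 1, 3, 9, 11, 5, 7, 13, 15);
#else
    v8f even = {a[0], a[2], b[0], b[2], a[4], a[6], b[4], b[6]};
    v8f odd  = {a[1], a[3], b[1], b[3], a[5], a[7], b[5], b[7]};
#endif
    return even + odd;
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, hadd128 yields {3, 7, 30, 70}. For the 256-bit version, the sums from the upper halves of a and b land in lanes 4..7, after the lower-half b-sums, rather than all a-sums coming first.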

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

0

...Below is the
> timeline view for the dot-product example from the previous section.
>
> ///////////////
> Timeline view:
>                     012345
> Index     0123456789
>
> [0,0]     DeeER.    .    .   vmulps  %xmm0, %xmm1, %xmm2
> [0,1]     D==eeeER  .    .   vhaddps %xmm2, %xmm2, %xmm3
> [0,2]     .D====eeeER    .   vhaddps %xmm3, %xmm3, %xmm4
>
> [1,0]     .DeeE-----R    .   vmulps  %xmm0, %xmm1, %xmm2
> [1,1]     . D=eeeE---R   .   vhaddps %xmm2, %xmm2, %xmm3
> [1,2]     . D====eeeER   .   vhaddps %xmm3, %xmm3, %xmm4
>
>...
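Each character column of the timeline is one cycle. Per the llvm-mca documentation, 'D' marks dispatch, 'e' cycles spent executing, 'E' the cycle the instruction is executed, 'R' retirement, '=' cycles spent dispatched but waiting to execute, and '-' cycles spent executed but waiting to retire; '.' and spaces are empty cycles. A small C sketch that decodes one row under that legend, assuming the row keeps its original one-character-per-cycle spacing; struct row_stats and decode_row are hypothetical names:

```c
#include <string.h>

/* Cycle accounting for one llvm-mca timeline row, e.g. "D==eeeER". */
struct row_stats {
    int dispatch;    /* cycle of 'D' */
    int queue_wait;  /* '=' cycles: dispatched, waiting to execute */
    int executing;   /* 'e' cycles */
    int retire_wait; /* '-' cycles: executed, waiting to retire */
    int retire;      /* cycle of 'R' */
};

/* Returns 0 on success, -1 if the row has no 'D' or no 'R'. */
static int decode_row(const char *row, struct row_stats *s) {
    memset(s, 0, sizeof *s);
    s->dispatch = -1;
    s->retire = -1;
    for (int cycle = 0; row[cycle] != '\0'; ++cycle) {
        switch (row[cycle]) {
        case 'D': s->dispatch = cycle; break;
        case '=': s->queue_wait++;     break;
        case 'e': s->executing++;      break;
        case '-': s->retire_wait++;    break;
        case 'R': s->retire = cycle;   break;
        default:  break; /* 'E', '.', ' ' are not counted here */
        }
    }
    return (s->dispatch < 0 || s->retire < 0) ? -1 : 0;
}
```

For row [0,2] (".D====eeeER") this reports dispatch at cycle 1, four waiting cycles, and retirement at cycle 10; those waiting cycles most likely reflect the dependency on %xmm3, which row [0,1] produces.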

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

5

...". Below is the
> timeline view for the dot-product example from the previous section.
>
> ///////////////
> Timeline view:
>                     012345
> Index     0123456789
>
> [0,0]     DeeER.    .    .   vmulps  %xmm0, %xmm1, %xmm2
> [0,1]     D==eeeER  .    .   vhaddps %xmm2, %xmm2, %xmm3
> [0,2]     .D====eeeER    .   vhaddps %xmm3, %xmm3, %xmm4
>
> [1,0]     .DeeE-----R    .   vmulps  %xmm0, %xmm1, %xmm2
> [1,1]     . D=eeeE---R   .   vhaddps %xmm2, %xmm2, %xmm3
> [1,2]     . D====eeeER   .   vhaddps %xmm3, %xmm3, %xmm4
>
> [...
