thr3ads.net - search: "__builtin

Displaying 9 results from an estimated 9 matches for "__builtin_shufflevector".

2020 Aug 20

Question about llvm vectors

...would like also to discuss reduce_add, there might be multiple ways of doing it right but is there one that is faster? Is the same approach always the best or it depends on the CPU? I believe that those questions are best answered by the compiler. Then some side-notes regarding clang documentation __builtin_shufflevector is not referenced there https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors Best regards, Alexandre Bique On Wed, Aug 19, 2020 at 8:34 PM Craig Topper <craig.topper at gmail.com> wrote: > I'm not sure everyone would agree that the behavior of a > __b...
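One of the several possible ways to write the reduce_add mentioned above can be sketched with two shuffle-and-add steps. This is only an illustration, not the approach proposed in the thread: the names `float4`, `HAVE_SHUFFLEVECTOR` and `reduce_add_f4` are hypothetical, and the code assumes a compiler with GCC/Clang vector extensions. Whether this form is the fastest on a given CPU is exactly the open question raised in the message.

```c
/* Four packed floats via the GCC/Clang vector extension (hypothetical name). */
typedef float float4 __attribute__((vector_size(16)));

/* Use __builtin_shufflevector only when the compiler reports it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

/* Sum the four lanes of v (hypothetical helper, not an LLVM API). */
static float reduce_add_f4(float4 v)
{
#ifdef HAVE_SHUFFLEVECTOR
    /* Step 1: swap the two halves and add.
       Lanes become (v0+v2, v1+v3, v2+v0, v3+v1). */
    float4 t = v + __builtin_shufflevector(v, v, 2, 3, 0, 1);
    /* Step 2: swap adjacent lanes and add.
       Every lane now holds the full sum. */
    t = t + __builtin_shufflevector(t, t, 1, 0, 3, 2);
    return t[0];
#else
    /* Fallback with the same association order, using element access. */
    return (v[0] + v[2]) + (v[1] + v[3]);
#endif
}
```

Calling `reduce_add_f4` on a vector holding 1, 2, 3, 4 adds the pairs (1+3) and (2+4) first, then the two partial sums.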

Question about llvm vectors

2020 Aug 19

Hi, I love llvm vectors, yet I wonder why some advanced vector operations are specific to some CPU targets? Let me take an example: /// Horizontally adds the adjacent pairs of values contained in two /// 128-bit vectors of [4 x float]. /// /// \headerfile <x86intrin.h> /// /// This intrinsic corresponds to the <c> VHADDPS </c> instruction. /// /// \param __a /// A
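The horizontal add described in that doc comment can also be written without a target-specific intrinsic. The sketch below is an assumption-laden illustration, not code from the thread: `float4`, `HAVE_SHUFFLEVECTOR` and `hadd_f4` are hypothetical names, and it relies on GCC/Clang vector extensions. It gathers the even and odd lanes of both inputs and adds them, leaving instruction selection to the compiler.

```c
/* Four packed floats via the GCC/Clang vector extension (hypothetical name). */
typedef float float4 __attribute__((vector_size(16)));

/* Use __builtin_shufflevector only when the compiler reports it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

/* Add adjacent pairs: result is (a0+a1, a2+a3, b0+b1, b2+b3). */
static float4 hadd_f4(float4 a, float4 b)
{
#ifdef HAVE_SHUFFLEVECTOR
    /* Indices 0..3 select from a, indices 4..7 select from b. */
    float4 even = __builtin_shufflevector(a, b, 0, 2, 4, 6); /* a0 a2 b0 b2 */
    float4 odd  = __builtin_shufflevector(a, b, 1, 3, 5, 7); /* a1 a3 b1 b3 */
    return even + odd;
#else
    /* Fallback: build the result lane by lane. */
    float4 r = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    return r;
#endif
}
```

The lane order matches the pairwise layout that the quoted comment describes for two 128-bit vectors of [4 x float].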

RFC phantom memory intrinsic

2017 Sep 13

...te? Only one offset does not seem enough to handle generic cases. Yes, correct, this a little bit changed example is not working. #include <x86intrin.h> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); } But with the aggregate case it is a new level of complexity, should we we care about? There might be some logic that probably would be mark as dead by InstCombine and we don't want to keep it. BTW: Looks like SLP could not recognize the case either : define <4 x do...
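To make the lane selection in the quoted `vsht_d4_fold` example concrete, here is a sketch that avoids `<x86intrin.h>` and writes its result to an array. It is an illustration under stated assumptions, not the code discussed in the RFC: `double4`, `HAVE_SHUFFLEVECTOR` and `vsht_d4_fold_portable` are hypothetical names, and the code assumes GCC/Clang vector extensions. The indices 3, 3, 2, 2 pick lanes 3 and 2 of the first operand, so only `ptr[i+3]` and `ptr[i+2]` reach the result.

```c
/* Four packed doubles via the GCC/Clang vector extension (hypothetical name). */
typedef double double4 __attribute__((vector_size(32)));

/* Use __builtin_shufflevector only when the compiler reports it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

/* Load ptr[i..i+3], then store lanes (3, 3, 2, 2) into out[0..3]. */
static void vsht_d4_fold_portable(const double *ptr, unsigned long long i,
                                  double out[4])
{
    double4 foo = { ptr[i], ptr[i + 1], ptr[i + 2], ptr[i + 3] };
#ifdef HAVE_SHUFFLEVECTOR
    double4 r = __builtin_shufflevector(foo, foo, 3, 3, 2, 2);
#else
    /* Fallback: the same selection spelled out with element access. */
    double4 r = { foo[3], foo[3], foo[2], foo[2] };
#endif
    out[0] = r[0];
    out[1] = r[1];
    out[2] = r[2];
    out[3] = r[3];
}
```

Because lanes 0 and 1 of `foo` are never selected, a compiler is free to drop the loads of `ptr[i]` and `ptr[i+1]`.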

RFC phantom memory intrinsic

2017 Sep 13

...ric cases. >> Yes, correct, this a little bit changed example is not working. >> #include <x86intrin.h> >> >> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >> } >> But with the aggregate case it is a new level of complexity, should we >> we care about? There might be some logic that probably would be mark >> as dead by InstCombine and we don't want to keep it. >> BTW: Looks like SLP could not...

RFC phantom memory intrinsic

2017 Sep 26

...s, correct, this a little bit changed example is not working. >>> #include <x86intrin.h> >>> >>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>> } >>> But with the aggregate case it is a new level of complexity, should we >>> we care about? There might be some logic that probably would be mark >>> as dead by InstCombine and we don't want to keep it. >>> BTW: Looks...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 23

On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote: > > > If AVX is available I would expect the vpermilps/vpermilpd instruction > to be used for all float/double single vector shuffles, especially as it > can deal with the folded load case as well - this would

RFC phantom memory intrinsic

2017 Sep 26

...ittle bit changed example is not working. >>>> #include <x86intrin.h> >>>> >>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>>> } >>>> But with the aggregate case it is a new level of complexity, should we >>>> we care about? There might be some logic that probably would be mark >>>> as dead by InstCombine and we don't want to keep it. >>...

RFC phantom memory intrinsic

2017 Sep 26

...ple is not working. >>>>> #include <x86intrin.h> >>>>> >>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>>>> } >>>>> But with the aggregate case it is a new level of complexity, should we >>>>> we care about? There might be some logic that probably would be mark >>>>> as dead by InstCombine and we don't want to...

RFC phantom memory intrinsic

2017 Sep 12

Hi, For the PR21780 solution, I plan to add new functionality to restore memory operations that were once deleted. In this particular case, these are the load operations that were deleted by InstCombine. Please note that once a load is removed there is no way to restore it, and that prevents us from vectorizing the shuffle operation. There are probably more similar issues where this approach could
