thr3ads.net - search: "__m256d"

Displaying 8 results from an estimated 8 matches for "__m256d".

2018 Jan 10

Suggestions on code generation for SIMD

Thanks Serge! This means for every new intrinsic set, a systematic change should be made to LLVM to support the new intrinsic set, right? The change should include frontend change, IR instruction set change, as well as low level code generation changes? On Tue, Jan 9, 2018 at 12:39 AM, serge guelton via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > The vast majority of the

2017 Sep 13

RFC phantom memory intrinsic

Hi Michael, >Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases. Yes, correct, this a little bit changed example is not working. #include <x86intrin.h> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); } But with the aggregate case it is a new level of complexity, should we care about? There might be some logic that...
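To make the quoted example concrete, here is a minimal scalar sketch in C of what vsht_d4_fold computes. It assumes clang's rule for __builtin_shufflevector: output lane k takes element mask[k] of the two operands concatenated. The helper name vsht_d4_fold_ref is hypothetical and does not come from the thread. The sketch shows that only ptr[i+2] and ptr[i+3] reach the result. This appears to be why the loads of ptr[i] and ptr[i+1] become dead and can be dropped, the situation that the PR21780 message in these results describes as loads deleted by InstCombine.

```c
/* Hypothetical scalar reference for the vsht_d4_fold snippet. It uses no
 * AVX intrinsics, so it runs on any target. */
void vsht_d4_fold_ref(const double *ptr, unsigned long long i, double out[4])
{
    /* The four lanes the snippet builds with the (__m256d){...} literal. */
    const double foo[4] = { ptr[i], ptr[i + 1], ptr[i + 2], ptr[i + 3] };

    /* __builtin_shufflevector(foo, foo, 3, 3, 2, 2): output lane k takes
     * element mask[k] of the concatenation foo:foo. Every index here is
     * below 4, so all lanes come from the first operand. */
    static const int mask[4] = { 3, 3, 2, 2 };

    for (int k = 0; k < 4; ++k)
        out[k] = foo[mask[k]];

    /* foo[0] and foo[1] (ptr[i] and ptr[i+1]) never reach out. */
}
```

For example, with data = {10, 11, 12, 13, 14, 15} and i = 1, the four lanes are {11, 12, 13, 14} and the result is {14, 14, 13, 13}.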

2017 Sep 13

RFC phantom memory intrinsic

...resting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases. >> Yes, correct, this a little bit changed example is not working. >> #include <x86intrin.h> >> >> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >> } >> But with the aggregate case it is a new level of complexity, should we >>...

2017 Sep 26

RFC phantom memory intrinsic

...but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases. >>> Yes, correct, this a little bit changed example is not working. >>> #include <x86intrin.h> >>> >>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>> } >>> But with the aggregate case it is a new level of complexity,...

2017 Sep 26

RFC phantom memory intrinsic

...n the pointer is part of an aggregate? Only one offset does not seem >>>>> enough to handle generic cases. >>>> >>>> Yes, correct, this a little bit changed example is not working. >>>> #include <x86intrin.h> >>>> >>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>>> } >>>> But with the aggregate case it is a new leve...

2017 Sep 26

RFC phantom memory intrinsic

...he pointer is part of an aggregate? Only one offset does not seem >>>>>> enough to handle generic cases. >>>>> Yes, correct, this a little bit changed example is not working. >>>>> #include <x86intrin.h> >>>>> >>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) { >>>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }; >>>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 ); >>>>> } >>>>> But with the aggregate cas...

2017 Sep 12

RFC phantom memory intrinsic

Hi, For PR21780 solution, I plan to add a new functionality to restore memory operations that were once deleted, in this particular case it is the load operations that were deleted by InstCombine, please note that once the load was removed there is no way to restore it back and that prevents us from vectorizing the shuffle operation. There are probably more similar issues where this approach could

2018 Nov 30

[RFC] Re-implementing -fveclib with OpenMP

Hi all, I am submitting the following RFC [1] to re-implement -fveclib via OpenMP constructs. The RFC was discussed during a round table at the last LLVM developer meeting, and presented during the BoF [2]. The proposal is published on Phabricator, for the purpose of keeping track of the comments, and it is now ready for review from a wider audience after being polished by Hal Finkel and Hideki