search for: vsht_d4_fold

Displaying 6 results from an estimated 6 matches for "vsht_d4_fold".

2017 Sep 13
2
RFC phantom memory intrinsic
Hi Michael,
> Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
Yes, correct, this slightly modified example does not work:
#include <x86intrin.h>
__m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
  __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
  return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
}
But the aggregate case adds a new level of complexity; should we care about it? There might be some logic that probably wou...
2017 Sep 13
2
RFC phantom memory intrinsic
...approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
>> Yes, correct, this slightly modified example does not work:
>> #include <x86intrin.h>
>>
>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>   __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>   return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>> }
>> But the aggregate case adds a new level of complexity; should we
>> care abo...
2017 Sep 26
0
RFC phantom memory intrinsic
...do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
>>> Yes, correct, this slightly modified example does not work:
>>> #include <x86intrin.h>
>>>
>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>>   __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>>   return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>> }
>>> But the aggregate case adds a new level of complexity; should we...
2017 Sep 26
2
RFC phantom memory intrinsic
...inter is part of an aggregate? Only one offset does not seem
>>>>> enough to handle generic cases.
>>>>
>>>> Yes, correct, this slightly modified example does not work:
>>>> #include <x86intrin.h>
>>>>
>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>>>   __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>>>   return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>>> }
>>>> But the aggregate case adds a new level of complexi...
2017 Sep 26
0
RFC phantom memory intrinsic
...er is part of an aggregate? Only one offset does not seem
>>>>>> enough to handle generic cases.
>>>>> Yes, correct, this slightly modified example does not work:
>>>>> #include <x86intrin.h>
>>>>>
>>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>>>>   __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>>>>   return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>>>> }
>>>>> But with the aggregate case it is a new...
2017 Sep 12
3
RFC phantom memory intrinsic
Hi, As a solution for PR21780, I plan to add new functionality to restore memory operations that were once deleted. In this particular case, these are the load operations that were deleted by InstCombine. Please note that once a load has been removed there is no way to restore it, and that prevents us from vectorizing the shuffle operation. There are probably more similar issues where this approach could...
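For readers following the thread, here is a minimal scalar sketch of what the shuffle in `vsht_d4_fold` computes. It is an illustration only, not code from the RFC: the helper name `vsht_d4_fold_scalar` and the `mask` table are hypothetical, and plain `double` arrays stand in for `__m256d` so that no AVX support is needed. With the mask 3, 3, 2, 2, only `ptr[i+2]` and `ptr[i+3]` reach the result, which is presumably why the loads of the two unused lanes can be dropped and a single wide load is no longer visible to later passes.

```c
/* Hypothetical scalar model of the shuffle in the thread's example.
   out[k] = foo[mask[k]], where foo = { ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] }. */
static void vsht_d4_fold_scalar(const double *ptr, unsigned long long i,
                                double out[4]) {
    /* Same lane selection as __builtin_shufflevector(foo, foo, 3, 3, 2, 2). */
    static const int mask[4] = {3, 3, 2, 2};

    for (int k = 0; k < 4; ++k) {
        /* Lanes 0 and 1 (ptr[i] and ptr[i+1]) are never selected. */
        out[k] = ptr[i + (unsigned long long)mask[k]];
    }
}
```

Calling the helper on an array `{10, 11, 12, 13, 14}` with `i = 1` reads only elements 3 and 4 of that array, so the values at `ptr[i]` and `ptr[i+1]` never influence the output.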