Displaying 6 results from an estimated 6 matches for "vsht_d4_fold".
2017 Sep 13
2
RFC phantom memory intrinsic
Hi Michael,
>Interesting approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
Yes, correct, this slightly modified example does not work:
#include <x86intrin.h>
__m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
  __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
  return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
}
But the aggregate case is a new level of complexity; should we
care about it? There might be some logic that probably wou...
2017 Sep 13
2
RFC phantom memory intrinsic
...approach but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
>> Yes, correct, this a little bit changed example is not working.
>> #include <x86intrin.h>
>>
>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>> }
>> But with the aggregate case it is a new level of complexity, should we
>> we care abo...
2017 Sep 26
0
RFC phantom memory intrinsic
...do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
>>> Yes, correct, this a little bit changed example is not working.
>>> #include <x86intrin.h>
>>>
>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>> }
>>> But with the aggregate case it is a new level of complexity, should we
>...
2017 Sep 26
2
RFC phantom memory intrinsic
...inter is part of an aggregate? Only one offset does not seem
>>>>> enough to handle generic cases.
>>>>
>>>> Yes, correct, this a little bit changed example is not working.
>>>> #include <x86intrin.h>
>>>>
>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>>> }
>>>> But with the aggregate case it is a new level of complexi...
2017 Sep 26
0
RFC phantom memory intrinsic
...er is part of an aggregate? Only one offset does not seem
>>>>>> enough to handle generic cases.
>>>>> Yes, correct, this a little bit changed example is not working.
>>>>> #include <x86intrin.h>
>>>>>
>>>>> __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
>>>>> __m256d foo = (__m256d){ ptr[i], ptr[i+1], ptr[i+2], ptr[i+3] };
>>>>> return __builtin_shufflevector( foo, foo, 3, 3, 2, 2 );
>>>>> }
>>>>> But with the aggregate case it is a new...
2017 Sep 12
3
RFC phantom memory intrinsic
Hi,
As a solution for PR21780, I plan to add new functionality to restore
memory operations that were once deleted. In this particular case it is
the load operations that were deleted by InstCombine. Please note that
once a load is removed there is no way to restore it, and that
prevents us from vectorizing the shuffle operation. There are probably
more similar issues where this approach could...
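
A small sketch, under my reading of the vsht_d4_fold example from this thread, of why the loads can disappear: the shuffle mask 3, 3, 2, 2 reads only lanes 2 and 3 of the vector, so ptr[i] and ptr[i+1] never contribute to the result. The helper names fold_all_four, fold_two_only and same_result are hypothetical, and plain double arrays stand in for __m256d so that it builds without AVX flags. It only shows the data flow and does not model what InstCombine actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of vsht_d4_fold: load four consecutive
   elements, then apply the shuffle mask 3, 3, 2, 2. */
void fold_all_four(const double *ptr, unsigned long long i, double out[4]) {
    assert(ptr != NULL && out != NULL);
    const double foo[4] = { ptr[i], ptr[i + 1], ptr[i + 2], ptr[i + 3] };
    out[0] = foo[3];
    out[1] = foo[3];
    out[2] = foo[2];
    out[3] = foo[2];
}

/* The same result computed without ever reading ptr[i] or ptr[i+1].
   Assumption: this is roughly the data flow left once the unused
   lanes have been dropped. */
void fold_two_only(const double *ptr, unsigned long long i, double out[4]) {
    assert(ptr != NULL && out != NULL);
    out[0] = ptr[i + 3];
    out[1] = ptr[i + 3];
    out[2] = ptr[i + 2];
    out[3] = ptr[i + 2];
}

/* Returns 1 if both versions agree lane by lane, 0 otherwise. */
int same_result(const double *ptr, unsigned long long i) {
    double a[4], b[4];
    fold_all_four(ptr, i, a);
    fold_two_only(ptr, i, b);
    for (int k = 0; k < 4; ++k) {
        if (a[k] != b[k]) {
            return 0;
        }
    }
    return 1;
}
```

Both helpers produce the same four lanes, which is why the two low loads look dead to an optimizer, while a vectorizer that wants one contiguous 4-element load would still need to know that ptr[i] and ptr[i+1] were originally accessed.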