2016 Mar 11
3
masked-load endpoints optimization
....
The real question I have is whether it is legal to read the extra memory,
regardless of whether this is a masked load or something else.
Note that the x86 backend already does this, so either my proposal is ok
for x86, or we're already doing an illegal optimization:
define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) {
  %ld1 = load i32, i32* %addr1
  %addr2 = getelementptr i32, i32* %addr1, i64 3
  %ld2 = load i32, i32* %addr2
  %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0
  %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3
  ret <4 x i32> %vec2
}
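To make the question concrete, here is a small C model of what the IR above computes, next to a widened form that uses one 16-byte load. This is only an illustration of the semantics, not LLVM code, and the names `v4i32`, `load_endpoints_scalar`, `load_endpoints_widened` and `endpoints_agree` are hypothetical. Lanes 1 and 2 of the IR result are `undef`, so only lanes 0 and 3 have to match; the widened version additionally reads `addr[1]` and `addr[2]`, which are exactly the "extra memory" whose legality is in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A <4 x i32> vector modelled as a plain struct (illustration only). */
typedef struct { int32_t lane[4]; } v4i32;

/* Mirrors @load_bonus_bytes: two scalar loads, addr[0] and addr[3].
   Lanes 1 and 2 are undef in the IR; they are zeroed here for determinism. */
v4i32 load_endpoints_scalar(const int32_t *addr) {
    v4i32 v = {{0, 0, 0, 0}};
    v.lane[0] = addr[0];
    v.lane[3] = addr[3];
    return v;
}

/* Hypothetical widened form: one 16-byte load covering addr[0..3].
   It also reads addr[1] and addr[2], the "bonus bytes". */
v4i32 load_endpoints_widened(const int32_t *addr) {
    v4i32 v;
    memcpy(v.lane, addr, sizeof v.lane);
    return v;
}

/* The as-if check: only lanes 0 and 3 are defined in the original IR. */
bool endpoints_agree(const int32_t *addr) {
    v4i32 a = load_endpoints_scalar(addr);
    v4i32 b = load_endpoints_widened(addr);
    return a.lane[0] == b.lane[0] && a.lane[3] == b.lane[3];
}
```

When all 16 bytes are readable the two forms are indistinguishable in the defined lanes; the model cannot express the case the thread is asking about, where reading `addr[1]` or `addr[2]` might itself be disallowed.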
2016 Mar 15
3
the as-if rule / perf vs. security
2016 Mar 16
3
the as-if rule / perf vs. security
2016 Mar 16
3
the as-if rule / perf vs. security
2016 Mar 10
2
masked-load endpoints optimization
If we're loading the first and last elements of a vector using a masked
load [1], can we replace the masked load with a full vector load?
"The result of this operation is equivalent to a regular vector load
instruction followed by a ‘select’ between the loaded and the passthru
values, predicated on the same mask. However, using this intrinsic prevents
exceptions on memory access to...
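The quoted equivalence (a regular vector load followed by a `select` against the passthru values) can be sketched in C as below. The struct and function names are hypothetical, and the model captures only the resulting values, not the exception behaviour that distinguishes the intrinsic. `masked_load_v4` evaluates `addr[i]` only for enabled lanes (the C conditional operator evaluates just the chosen operand), while `full_load_then_select` reads all four elements before selecting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A <4 x i32> vector modelled as a plain struct (illustration only). */
typedef struct { int32_t lane[4]; } v4i32;

/* Masked-load semantics: enabled lanes come from memory, disabled lanes
   from passthru. Disabled lanes are never dereferenced. */
v4i32 masked_load_v4(const int32_t *addr, const bool mask[4], v4i32 passthru) {
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = mask[i] ? addr[i] : passthru.lane[i];
    return r;
}

/* "Regular load + select": every lane is read first (all 16 bytes),
   then the mask picks between loaded and passthru values. */
v4i32 full_load_then_select(const int32_t *addr, const bool mask[4], v4i32 passthru) {
    v4i32 loaded, r;
    for (int i = 0; i < 4; i++)
        loaded.lane[i] = addr[i];
    for (int i = 0; i < 4; i++)
        r.lane[i] = mask[i] ? loaded.lane[i] : passthru.lane[i];
    return r;
}
```

With the endpoints mask `{true, false, false, true}` both functions return the same vector whenever all four elements are readable; the proposal is about whether the second form may be used when only the first and last elements are known to be accessible.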