search for: load_bonus_bytes

Displaying 5 results from an estimated 5 matches for "load_bonus_bytes".

2016 Mar 11
3
masked-load endpoints optimization
.... The real question I have is whether it is legal to read the extra memory, regardless of whether this is a masked load or something else. Note that the x86 backend already does this, so either my proposal is ok for x86, or we're already doing an illegal optimization: define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) { %ld1 = load i32, i32* %addr1 %addr2 = getelementptr i32, i32* %addr1, i64 3 %ld2 = load i32, i32* %addr2 %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 ret <4 x i32> %vec2...
2016 Mar 15
3
the as-if rule / perf vs. security
...ether it is legal to read the extra memory, > regardless of whether this is a masked load or something else. > > Note that the x86 backend already does this, so either my proposal is ok > for x86, or we're already doing an illegal optimization: > > > define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) { > %ld1 = load i32, i32* %addr1 > %addr2 = getelementptr i32, i32* %addr1, i64 3 > %ld2 = load i32, i32* %addr2 > %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 > %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 ...
2016 Mar 16
3
the as-if rule / perf vs. security
...e extra memory, >> regardless of whether this is a masked load or something else. >> >> Note that the x86 backend already does this, so either my proposal is ok >> for x86, or we're already doing an illegal optimization: >> >> >> define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) { >> %ld1 = load i32, i32* %addr1 >> %addr2 = getelementptr i32, i32* %addr1, i64 3 >> %ld2 = load i32, i32* %addr2 >> %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 >> %vec2 = insertelement <4 x i32> %vec1...
2016 Mar 16
3
the as-if rule / perf vs. security
...regardless of whether this is a masked load or something else. >>> >>> Note that the x86 backend already does this, so either my proposal is ok >>> for x86, or we're already doing an illegal optimization: >>> >>> >>> define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) { >>> %ld1 = load i32, i32* %addr1 >>> %addr2 = getelementptr i32, i32* %addr1, i64 3 >>> %ld2 = load i32, i32* %addr2 >>> %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 >>> %vec2 = insertelement ...
2016 Mar 10
2
masked-load endpoints optimization
If we're loading the first and last elements of a vector using a masked load [1], can we replace the masked load with a full vector load? "The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to