Displaying 20 results from an estimated 10000 matches similar to: "[RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder"
2017 Nov 29
3
RFC: Adding 'no-overflow' keyword to 'sdiv'\'udiv' instructions
Introduction:
We would like to add a new keyword, 'no-overflow', to the 'sdiv'/'udiv' instructions.
This is the updated solution devised in the discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118257.html
The proposed keywords:
"nof" stands for 'no-overflow'
Syntax:
<result> = sdiv nof <ty> <op1>,
2014 Oct 24
20
[LLVMdev] Adding masked vector load and store intrinsics
Hi,
We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about availability of masked vector loads and stores. The SLP
2014 Oct 24
2
[LLVMdev] Adding masked vector load and store intrinsics
> Why can't we represent the loads as select(mask, load(addr), passthru)?
This suggests masked-off lanes are free to speculatively load from memory, whereas the proposed semantics is that:
> The addressed memory will not be touched for masked-off lanes. In
> particular, if all lanes are masked off no address will be accessed.
Ayal.
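The distinction drawn in this reply can be sketched in scalar C. This is only an illustrative model, not the intrinsic's definition: the helper names `masked_load_i32` and `select_of_load_i32` are hypothetical, and a loop over lanes stands in for a vector operation. The first helper never dereferences the address for a masked-off lane; the second loads every lane before selecting, so it requires the whole range to be readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the proposed masked-load semantics:
   a masked-off lane never touches memory and takes passthru instead. */
static void masked_load_i32(int32_t *result, const int32_t *addr,
                            const int32_t *passthru, const bool *mask,
                            size_t lanes) {
    for (size_t i = 0; i < lanes; i++)
        result[i] = mask[i] ? addr[i] : passthru[i]; /* addr[i] read only if mask[i] */
}

/* Hypothetical model of select(mask, load(addr), passthru):
   every lane is loaded first, so all of addr[0..lanes) must be readable. */
static void select_of_load_i32(int32_t *result, const int32_t *addr,
                               const int32_t *passthru, const bool *mask,
                               size_t lanes) {
    for (size_t i = 0; i < lanes; i++) {
        int32_t loaded = addr[i];                    /* unconditional read */
        result[i] = mask[i] ? loaded : passthru[i];
    }
}
```

With an all-false mask, the first helper can be called even with a null address, which is the "no address will be accessed" case quoted above; the second helper cannot.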
-----Original Message-----
From: llvmdev-bounces at
2014 Oct 27
4
[LLVMdev] Adding masked vector load and store intrinsics
we just follow a common recommendation to start with intrinsics:
http://llvm.org/docs/ExtendingLLVM.html
- Elena
From: Owen Anderson [mailto:resistor at mac.com]
Sent: Sunday, October 26, 2014 23:57
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu; dag at cray.com
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
What is the motivation for using intrinsics
2014 Oct 24
3
[LLVMdev] Adding masked vector load and store intrinsics
> For the loads, I'm much less sure. Why can't we represent the loads as select(mask, load(addr), passthru)? It is true, that the load might get separated from the select so that isel might not see it (because isel is basic-block local), but we can add some code in CodeGenPrep to fix that for targets on which it is useful to do so (which is a more-general solution than the intrinsic
2014 Oct 28
2
[LLVMdev] Adding masked vector load and store intrinsics
Many overloaded intrinsics may be replaced with instructions - fabs or fma or sqrt.
Chandler will probably explain the criteria. What is the difference between fma and fadd? Or fptrunc and fabs?
A new instruction like
%a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1>%mask
is possible, but may not be very useful for most targets.
So we start with intrinsics.
-
2020 May 19
2
LV: predication
Invitation accepted, I am happy to help out with reviews, like I did with the previous VP patches.
And of course agreed that things should be well defined, and that we shouldn't paint ourselves into a corner, but I don't think that this is the case. And it's not that I am in a rush, but I don't think this change needs to be predicated on a big change landing first like the LV
2014 Oct 24
6
[LLVMdev] Adding masked vector load and store intrinsics
> On Oct 24, 2014, at 10:57 AM, Adam Nemet <anemet at apple.com> wrote:
>
> On Oct 24, 2014, at 4:24 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com <mailto:elena.demikhovsky at intel.com>> wrote:
>
>> Hi,
>>
>> We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop
2016 Mar 10
2
masked-load endpoints optimization
If we're loading the first and last elements of a vector using a masked
load [1], can we replace the masked load with a full vector load?
"The result of this operation is equivalent to a regular vector load
instruction followed by a ‘select’ between the loaded and the passthru
values, predicated on the same mask. However, using this intrinsic prevents
exceptions on memory access to
2018 Sep 25
2
Unsafe floating point operation (FDiv & FRem) in LoopVectorizer
Hi,
Consider the following test case:
int foo(float *A, float *B, float *C, int len, int VSMALL) {
  for (int i = 0; i < len; i++)
    if (C[i] > VSMALL)
      A[i] = B[i] / C[i];
}
In this test the div operation is conditional but llvm is generating unconditional div for this case:
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [
2016 Mar 11
3
masked-load endpoints optimization
Thanks, Ashutosh.
Yes, either TTI or TLI could be used to limit the transform if we do it in
CGP rather than the DAG.
The real question I have is whether it is legal to read the extra memory,
regardless of whether this is a masked load or something else.
Note that the x86 backend already does this, so either my proposal is ok
for x86, or we're already doing an illegal optimization:
define
2020 May 19
3
LV: predication
Hi Simon,
Thanks for reposting the example, and looking at it more carefully, I think it is very similar to my first proposal. This was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early, we want to emit this in the vectoriser, puts a restriction on (future) optimisations that transform vector loops to honour/update/support this intrinsic
2020 May 21
2
LV: predication
> The compare of interest is clear, I think. It compares a Vector Induction Variable with a broadcasted loop invariant value, aka the BTC. Obtaining the latter operand is the goal, clearly, but to do so, the former operand needs to be recognized as a VIV.
Yep, exactly that.
> What if this compare is not generated by LV’s fold-tail-by-masking transformation?
Not sure I completely follow
2016 May 31
3
Signed Division and InstCombine
I was looking through the InstCombine pass, and I was wondering why signed
division is not considered a valid operation to combine in the
canEvaluateTruncated function. This means, given the following code:
%conv = sext i16 %0 to i32
%conv1 = sext i16 %1 to i32
%div = sdiv i32 %conv, %conv1
%conv2 = trunc i32 %div to i16
* Assume %0 and %1 are registers created from simple 16-bit loads.
We
2014 Oct 26
2
[LLVMdev] Masked vector intrinsics and name mangling
Hi,
The proposed masked vector intrinsics are overloaded - one intrinsic ID for multiple types.
After name mangling it will look like:
%res = call <16 x i32> @llvm.masked.load.v16i32.p0i32.v16i32.i32.v16i1(i32* %addr, <16 x i32>%passthru, i32 4, <16 x i1> %mask)
6 types x 3 vector sizes = 18 names for one operation
I propose to remove name mangling from these intrinsics:
%res
2014 Oct 26
2
[LLVMdev] Masked vector intrinsics and name mangling
Hal, thank you for your opinion.
I was just confused when I saw such a long name: "llvm.masked.load.v16i32.p0i32.v16i32.i32.v16i1".
If we stay with a short name, we take a step towards the instruction form.
- Elena
-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Sunday, October 26, 2014 17:06
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re:
2014 Oct 26
2
[LLVMdev] Masked vector intrinsics and name mangling
> On Oct 26, 2014, at 8:22 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ----- Original Message -----
>> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: llvmdev at cs.uiuc.edu
>> Sent: Sunday, October 26, 2014 10:17:49 AM
>> Subject: RE: [LLVMdev] Masked vector
2016 Mar 15
3
the as-if rule / perf vs. security
[cc'ing cfe-dev because this may require some interpretation of language
law]
My understanding is that the compiler has the freedom to access extra data
in C/C++ (not sure about other languages); AFAIK, the LLVM LangRef is
silent about this. In C/C++, this is based on the "as-if rule":
http://en.cppreference.com/w/cpp/language/as_if
So the question is: where should the optimizer
2016 May 31
1
Signed Division and InstCombine
On 31 May 2016 at 16:02, Dilan Manatunga <manatunga at gmail.com> wrote:
> Just to verify, a 16-bit division of INT16_MIN by -1 results in INT16_MIN
> again?
No, "sdiv i16 -32768, -1" is undefined behaviour. The version with an
"sext" and "trunc" avoids the undefined behaviour and does return
-32768.
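The sext/sdiv/trunc sequence described here can be mirrored in C as a rough sketch. The helper name `sdiv16_via_i32` is hypothetical, and the truncation goes through an unsigned type plus `memcpy` so that the C code does not rely on implementation-defined narrowing; the caller is still assumed to rule out a zero divisor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring: sext i16 -> i32, sdiv i32, trunc i32 -> i16.
   Assumes b != 0; division by zero is still undefined. */
static int16_t sdiv16_via_i32(int16_t a, int16_t b) {
    int32_t wide = (int32_t)a / (int32_t)b; /* -32768 / -1 = 32768 fits in 32 bits */
    uint16_t low = (uint16_t)wide;          /* keep the low 16 bits (modular conversion) */
    int16_t out;
    memcpy(&out, &low, sizeof out);         /* reinterpret those bits as signed */
    return out;
}
```

Dividing in 32 bits means the quotient 32768 is representable, and truncating it back to 16 bits yields -32768, matching the behaviour described in the reply.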
> If the issue only occurs in this case, why
2014 Apr 25
4
[LLVMdev] Proposal: add intrinsics for safe division
On April 25, 2014 at 9:52:35 AM, Eric Christopher (echristo at gmail.com) wrote:
Hi Michael,
> I’d like to propose to extend LLVM IR intrinsics set, adding new ones for
> safe-division. There are intrinsics for detecting overflow errors, like
> sadd.with.overflow, and the intrinsics I’m proposing will augment this set.
>
> The new intrinsics will return a structure with two