Displaying 20 results from an estimated 400 matches similar to: "InstCombine wrongful (?) optimization on BinOp with SameOperands"
2008 May 02
1
Speedups with Ra and jit
The topic of Ra and jit has come up on this list recently
(see http://www.milbo.users.sonic.net/ra/index.html)
so I thought people might be interested in this little demo. For it I
used my machine, a 3-year old laptop with 2Gb memory running Windows
XP, and the good old convolution example, the same one as used on the
web page, (though the code on the web page has a slight glitch in it).
This
2018 Aug 06
2
Lowering ISD::TRUNCATE
I'm working on defining the instructions and implementing the lowering
code for a Z80 backend. For now, the backend supports only the native
CPU-supported datatypes, which are 8 and 16 bits wide (i.e. no 32 bit
long, float, ... yet).
So far, a lot of the simple stuff like immediate loads and return values
is very straightforward, but now I got stuck with ISD::TRUNCATE, as in:
2011 Sep 08
1
[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
On Thu, Sep 08, 2011 at 11:15:06AM -0500, Villmow, Micah wrote:
> Peter,
> Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag.
There are two main reasons why I think we shouldn't go for global
flags:
1) It becomes difficult if not impossible to correctly link
2012 Dec 10
3
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Hello all,
I wanted to get some feedback on this patch for ScalarEvolution.
It addresses a performance problem I am seeing for simple benchmark.
Starting with this C code:
01: signed char foo(void)
02: {
03: const int count = 8000;
04: signed char result = 0;
05: int j;
06:
07: for (j = 0; j < count; ++j) {
08: result += (result_t)(3);
09: }
10:
11: return result;
12: }
I
2011 Sep 08
0
[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
Peter,
Is there a way to make this flag globally available? Metadata can be fairly expensive to handle at each node when in many cases it is a global flag and not a per operation flag.
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Robert Quill
> Sent: Thursday, September 08, 2011 3:24 AM
> To: Peter
2016 Jul 27
2
Remove zext-unfolding from InstCombine
Hi Sanjay,
thank you a lot for your answer. I understand that in your examples it is desirable that `foo` and `goo` are canonicalized to the same IR, i.e., something like `@goo`. However, I still have a few open questions, but please correct me in case I'm thinking in the wrong direction.
> Am 21.07.2016 um 18:51 schrieb Sanjay Patel <spatel at rotateright.com>:
>
> I've
2012 Feb 17
0
[LLVMdev] Folding an insertelt chain
On Feb 17, 2012, at 12:50 AM, Ivan Llopard wrote:
> Hello,
>
> I've added a little combining operation in DAGCombiner to fold a chain of insertelt nodes if that chain is proved to fully overwrite the very first source vector. In which case, I supposed a build_vector is better. It seems to be safe but I don't know if it is correctly implemented or if it is already done somewhere
2012 Oct 08
3
[LLVMdev] Multiply i8 operands promotes to i32
Hi,
I am trying to complete the hardware multiplier option for MSP430 backend.
As the hardware multiplier in most of the MSP430 devices is for i8 and
i16 operands, with i16 and i32 result, I am lowering MUL_i8 and MUL_I16.
However, the front-end promotes the i8 argument to i32, executes 32-bit
multiplier and truncates to 16-bit, so I never lower MUL_I8 nor MUL_I16
but MUL_I32, wchich is lowered
2012 Dec 17
0
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote:
> Hello all,
>
> I wanted to get some feedback on this patch for ScalarEvolution.
>
> It addresses a performance problem I am seeing for simple benchmark.
>
> Starting with this C code:
>
> 01: signed char foo(void)
> 02: {
> 03: const int count = 8000;
> 04: signed char
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Thanks so much for your feedback Simon.
I am not sure that what I am proposing here is at odds with what you're
referring to (here and in the PR you linked). The key difference AFAICT is
that the pattern I am referring to is probably more aptly described as
"reducing scalarization" than as "vectorization". The reason I say that is
that the inputs are vectors and the output
2012 Feb 17
3
[LLVMdev] Folding an insertelt chain
Hello,
I've added a little combining operation in DAGCombiner to fold a chain
of insertelt nodes if that chain is proved to fully overwrite the very
first source vector. In which case, I supposed a build_vector is better.
It seems to be safe but I don't know if it is correctly implemented or
if it is already done somewhere else. Please find attached the patch.
Regards,
Ivan
2011 Sep 08
1
[LLVMdev] [cfe-dev] Proposal: floating point accuracy metadata (OpenCL related)
Hi Peter,
This sounds like I really good idea. One thing that did occur to me
though from an OpenCL point of view is that ULP accuracy requirements
can differ for embedded and full profile so that may need to be handled
somehow.
Thanks,
Rob
On Wed, 2011-09-07 at 21:55 +0100, Peter Collingbourne wrote:
> Hi,
>
> This is my proposal to add floating point accuracy support to LLVM.
>
2010 Sep 14
1
[LLVMdev] global type legalization?
Returning to an old discussion here....
On Aug 18, 2010, at 10:42 AM, Chris Lattner wrote:
> On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote:
>>> I tend to think that it isn't worth the compile time to try to microoptimize out every compare, but I could be convinced otherwise if there are important use cases we're failing to handle. I also do think that whole-function
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Absolutely. We do it for scalars, so it would likely be a matter of just
extending it.
But that is one example. The issue of extracting elements, performing an
operation on each element individually and then rebuilding the vector is
likely more prevalent than that. At least I think that is the case, but
I'll do some analysis to see if it is so or not.
On Sat, Jan 11, 2020 at 6:15 PM Craig
2016 Jul 21
2
Remove zext-unfolding from InstCombine
Hi all,
I have a question regarding a transformation that is carried out in InstCombine, which has been introduced by r48715. It unfolds expressions of the form `zext(or(icmp, (icmp)))` to `or(zext(icmp), zext(icmp)))` to expose pairs of `zext(icmp)`. In a subsequent iteration these `zext(icmp)` pairs could then (possibly) be optimized by another optimization (which has already been there before
2020 Jan 10
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
I have added a few PPC-specific DAG combines in the past that follow this
pattern on specific operations. Now that it appears that this would be
useful to do on yet another operation, I'm wondering what people think
about doing this in the target-independent DAG Combiner for any
legal/custom operation on the target.
TL; DR;
The generic pattern would look like this:
(build_vector (op
2012 Dec 18
2
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Dan,
Thanks for the response ...
On 12/17/2012 1:53 PM, Dan Gohman wrote:
> On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote:
>> Hello all,
>>
>> I wanted to get some feedback on this patch for ScalarEvolution.
>>
>> It addresses a performance problem I am seeing for simple benchmark.
>>
>> Starting with this C
2016 May 31
3
Signed Division and InstCombine
I was looking through the InstCombine pass, and I was wondering why signed
division is not considered a valid operation to combine in the
canEvaluateTruncated function. This means, given the following code:
%conv = sext i16 %0 to i32
%conv1 = sext i16 %1 to i32
%div = sdiv i32 %conv, %conv1
%conv2 = trunc i32 %div to i16
* Assume %0 and %1 are registers created from simple 16-bit loads.
We
2017 Apr 02
6
[Bug 1142] New: invalid binop operation 6nft
https://bugzilla.netfilter.org/show_bug.cgi?id=1142
Bug ID: 1142
Summary: invalid binop operation 6nft
Product: nftables
Version: unspecified
Hardware: x86_64
OS: other
Status: NEW
Severity: major
Priority: P5
Component: nft
Assignee: pablo at netfilter.org
Reporter:
2013 Jan 25
0
[LLVMdev] TargetLowering vs. TargetTransform
Hi Renato,
I think that we need to improve ::isTruncateFree, ::isZextFree, etc to include all of the free conversions. Vector and Scalar.
Non-free conversions are marked with setOperationAction so the generic parts of TTI should be able to give a reasonable cost estimation.
The cost tables should contain cases that are not handled by TTI. So, if we have a clever DAGCombine optimization (that