similar to: RFC: speedups with instruction side-data (ADCE, perhaps others?)

Displaying 20 results from an estimated 2000 matches similar to: "RFC: speedups with instruction side-data (ADCE, perhaps others?)"

2015 Sep 14
2
RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 14, 2015, at 2:28 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > What is the contract between passes for the marker bit? I.e. do passes need to "clean up" the marker bit to a predetermined value after they run? (or they need to assume that it might be dirty? (like I think you do below)) Good questions, and to add to this, what happens if we
2015 Sep 14
2
RFC: speedups with instruction side-data (ADCE, perhaps others?)
I would assume that it’s just considered to be garbage. I feel like any sort of per-pass side data like this should come with absolute minimal contracts, to avoid introducing any more inter-pass complexity. —escha > On Sep 14, 2015, at 2:28 PM, Sean Silva <chisophugis at gmail.com> wrote: > > What is the contract between passes for the marker bit? I.e. do passes need to
2015 Sep 16
3
RFC: speedups with instruction side-data (ADCE, perhaps others?)
You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out of tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseinstruction), which in our case turns out to be
2015 Sep 14
3
RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 14, 2015, at 2:49 PM, Matt Arsenault via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On 09/14/2015 02:47 PM, escha via llvm-dev wrote: >> I would assume that it’s just considered to be garbage. I feel like any sort of per-pass side data like this should come with absolute minimal contracts, to avoid introducing any more inter-pass complexity. > I would think
2015 Sep 15
3
RFC: speedups with instruction side-data (ADCE, perhaps others?)
Here the main problem seem to be that we don’t have a fast way to check for “is this in a certain set”. In my opinion here the problem is that we don’t have a good (read fast) way for checking that a certain condition applies to an llvm Value. I believe this bit is trying to address that. Maybe there is room for improvement in LLVM set-like data structures? Maybe there is a way to use a
2015 Sep 15
7
RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 14, 2015, at 5:02 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote: > >> >> On Sep 14, 2015, at 2:58 PM, Pete Cooper via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> >>> On Sep 14, 2015, at 2:49 PM, Matt Arsenault via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>> >>> On 09/14/2015 02:47
2015 Sep 14
3
RFC: speedups with instruction side-data (ADCE, perhaps others?)
I did something similar for dominators, for GVN, etc. All see significant speedups. However, the answer i got back when i mentioned this was "things like ptrset and densemap should only have a small performance difference from side data when used and sized right", and i've found this to mostly be true after looking harder. In the case you are looking at, i see: -
2015 Sep 13
3
RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
LLVM has two similar bits of infrastructure: a simplifyInstructionsInBlock function and a SimplifyInstructions pass, both intended to be lightweight “fix up this code without doing serious optimizations” functions, as far as I can tell. I don’t think either is used in a performance-sensitive place in-tree; the former is mostly called in minor places when doing CFG twiddling, and the latter seems
2015 Sep 13
2
RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
> > Instead of adding the operands to a list, erase the instruction and add them to the worklist wouldn’t be probably faster something like: > > if (Instruction *Used = dyn_cast<Instruction>(*OI)) > if (Used->hasOneUse()) > WorkList.insert(Used); > > If it has one use is going to be the instruction we are going to remove anyway, right? I don’t think this
2015 Aug 25
3
[RFC] design doc for straight-line scalar optimizations
Hi Escha, We certainly would love to generalize them as long as the performance doesn't suffer in general. If you have specific use cases that are regressed due to these optimizations, I am more than happy to take a look. On Mon, Aug 24, 2015 at 6:43 PM, escha <escha at apple.com> wrote: > > On Aug 24, 2015, at 11:10 AM, Jingyue Wu via llvm-dev < > llvm-dev at
2015 Jul 10
3
[LLVMdev] Why change "sub x, 5" to "add x, -5" ?
2015-07-08 17:58 GMT+02:00 escha <escha at apple.com>: > [...] > > If you want to “revert" this sort of thing, you can do it at Select() time > or PreprocessISelDAG(), which is what I did on an out-of-tree backend to > turn add X, -C into sub X, C on selection time. This still lets all the > intermediate optimizations take advantage of the canonicalization. > >
2015 Aug 09
3
[RFC] BasicAA considers address spaces?
Personally I feel the most intuitive approach would be to have an equivalent of isTriviallyDisjoint for IR; we already have a model for how it would work, and it could be a TTI call. I’ve kind of wanted this for a while because there’s a lot of address-space-esque aliasing relationships that can’t be easily modeled on the IR level. For example (in our model), we have some constraints like this:
2016 May 05
6
Code which should exit 1 is exiting 0
I have IR at https://ghostbin.com/paste/daxv5 <https://ghostbin.com/paste/daxv5> which is meant to exit 1, but it is always exiting 0. I'm using it as a template for checking if two functions @test1 and @test2 are equivalent by checking against the exhaustive possible i16 values. For this particular example it should be enough to know that for certain i16, @test1 and @test2 are *not*
2016 Apr 12
2
Implementing a proposed InstCombine optimization
Good point. The same argument seems to apply to copy() too so I suppose it depends how strict we want to be about it. From: fglaser at apple.com [mailto:fglaser at apple.com] On Behalf Of escha at apple.com Sent: 11 April 2016 20:55 To: Daniel Sanders Cc: Alex Rosenberg; llvm-dev at lists.llvm.org; Carlos Liam Subject: Re: [llvm-dev] Implementing a proposed InstCombine optimization On Apr 11,
2016 Aug 23
2
How to describe the RegisterInfo?
Hi Escha, Great to have your comment! Do you have any specific reason for not doing like this? I am not sure whether I understand your point correctly. For "just model one thread", do you mean "only considering ONE of the 8/16 working lanes that running in lock-step way"?? For my case, may be something like I only need to define r0~r127 as register for i32 register (each r#
2015 Jul 08
5
[LLVMdev] Why change "sub x, 5" to "add x, -5" ?
Dear all, I have been working on a new LLVM backend. Some instructions, like sub, can take an positive constante, encoded into 5 bits thus lower than 32, and a register, as operands. Unfortunately, DAGCombiner.cpp changes patterns like 'sub x, 5' into 'add x,-5'. Similarly, I found changes in some IR to IR passes, with no clear gain (at least not clear to me), and even penalty
2015 Aug 12
2
[RFC] BasicAA considers address spaces?
I was lost from the thread at some point. Making the interface more general sounds good to me. This helps to solve Escha's concern that targets can know more about aliasing than just comparing address spaces. If there are no objections, I'll 1) add a new interface to TTI such as isTriviallyDisjoint. It returns false by default. 2) create a new AA that checks this interface, and add it to
2016 May 15
2
RFC: callee saved register verifier
I’ve seen this done in the past without compiler support (for the case of handwritten or JIT’d, etc functions that need checking). It worked something like this: 1. Pass calls to the risky functions through a macro/template or such. 2. In release mode, this turns into a no-op. In debug mode, this sends the calls through a handwritten asm wrapper that sets all the callee-save registers to random
2015 Aug 24
4
[RFC] design doc for straight-line scalar optimizations
Hi, As you may have noticed, since last year, we (Google's CUDA compiler team) have contributed quite a lot to the effort of optimizing LLVM for CUDA programs. I think it's worthwhile to write some docs to wrap them up for two reasons. 1) Whoever wants to understand or work on these optimizations has some detailed docs instead of just source code to refer to. 2) RFC on how to improve
2016 Feb 12
2
Experimental 6502 backend; memory operand folding problem
I never thought I’d see the day when someone proposed treating the 6502 like a GPU… —escha > On Feb 12, 2016, at 6:30 AM, Bruce Hoult via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I think it would be sufficient to pick 8 or 16 pairs of zero page locations as the "registers". Who needs 128 registers, unless you're also doing inter-procedural register