Displaying 20 results from an estimated 283 matches for "peephole".
2010 Oct 07
2
[LLVMdev] [Q] x86 peephole deficiency
Hi all,
I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
and now I am running into a deficiency of the x86
peephole optimizer (or jump-threader?). Here is what I get:
andl $3, %edi
je .LBB0_4
# BB#2: # %nz
# in Loop: Header=BB0_1 Depth=1
cmpl $2, %edi
je .LBB0_6
# BB#3:...
2010 Oct 13
2
[LLVMdev] [Q] x86 peephole deficiency
Am 07.10.2010 um 19:50 schrieb Chris Lattner:
>
> On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote:
>
>> Hi all,
>>
>> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
>> and now I am running into a deficiency of the x86
>> peephole optimizer (or jump-threader?). Here is what I get:
>>
>>
>> andl $3, %edi
>> je .LBB0_4
>> # BB#2: # %nz
>> # in Loop: Header=BB0_1 Depth=1
>>...
2019 Nov 22
2
[ARM] Peephole optimization ( instructions tst + add )
...e this optimization should be done in AArch64LoadStoreOptimizer, is it right?
From: Eli Friedman [mailto:efriedma at quicinc.com]
Sent: Thursday, November 21, 2019 11:55 PM
To: Kosov Pavel <kosov.pavel at huawei.com>; LLVM Dev <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
That transform is legal; it's a missed optimization.
-Eli
From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>> On Behalf Of Kosov Pavel via llvm-dev
Sent: Thursday, November 21, 2019 2:00 AM
To: llv...
2010 Oct 07
0
[LLVMdev] [Q] x86 peephole deficiency
On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote:
> Hi all,
>
> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
> and now I am running into a deficiency of the x86
> peephole optimizer (or jump-threader?). Here is what I get:
>
>
> andl $3, %edi
> je .LBB0_4
> # BB#2: # %nz
> # in Loop: Header=BB0_1 Depth=1
> cmpl $2, %edi
>...
2015 Feb 11
2
[LLVMdev] deleting or replacing a MachineInst
I'm writing a peephole pass and I'm done with the X86_64 instruction level
detail work. But I'm having difficulty with the basic block surgery
of replacing the old MachineInst.
The peephole pass gets called per MachineFunction and then iterates over
each MachineBasicBlock and in turn over each MachineInst. When...
2010 Oct 13
0
[LLVMdev] [Q] x86 peephole deficiency
...> The above problem is an inter-block one. Also MCSE seems
> to perform value numbering on virtual/physical registers, which
> does not map very well to status register bits that are implicitly
> defined.
> Any chance to recast this issue as a target-independent
> (but cmp-specific) peephole problem, that just looks into
> predecessor blocks and applies (target-hook-like) subsumption
> checks for 'cmp' instructions?
I think that extending MachineCSE to do a simple dominator tree walk with llvm::ScopedHashTable would make sense.
Status register bits should be handled jus...
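The reply above suggests a dominator-tree walk with a scoped hash table. A minimal sketch of that idea follows, assuming a toy model where each block is a list of hashable expression keys; it is not MachineCSE, and the names `cse_over_dom_tree`, `dom_children`, and `blocks` are hypothetical. `collections.ChainMap` stands in for `llvm::ScopedHashTable`, and the sketch ignores register clobbers and flag kills, which a real pass would have to track.

```python
from collections import ChainMap


def cse_over_dom_tree(dom_children, blocks, root):
    """Return {block: [indices of instructions already available from a dominator]}.

    dom_children: block -> list of blocks it immediately dominates (hypothetical input).
    blocks: block -> list of hashable expression keys, e.g. ("cmp", "edi", 2).
    """
    redundant = {}

    def visit(block, available):
        # Open a new scope; its entries are dropped when this subtree is done.
        scope = available.new_child()
        hits = []
        for idx, expr in enumerate(blocks[block]):
            if expr in scope:
                hits.append(idx)      # same expression seen in this or a dominating block
            else:
                scope[expr] = block   # recorded in the innermost scope only
        redundant[block] = hits
        for child in dom_children.get(block, []):
            visit(child, scope)

    visit(root, ChainMap())
    return redundant
```

Because each child gets its own scope, an expression first seen in one sibling block is never treated as available in another sibling; only expressions from dominating blocks are reused.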
2004 Feb 20
1
[LLVMdev] Changes in MachineInstruction/Peephole Optimizer?
Hi all,
The register allocator that I implemented is failing in the LLVM cvs
version, but not in LLVM 1.1. The generated code fails a check in the
x86 peephole optimizer:
llc: PeepholeOptimizer.cpp:128: bool <unnamed>::PH::PeepholeOptimize(llvm::MachineBasicBlock&, llvm::ilist_iterator<llvm::MachineInstr>&): Assertion `MI->getNumOperands() == 2 && "These should all have 2 operands!"' failed.
I've tra...
2019 Nov 21
2
[ARM] Peephole optimization ( instructions tst + add )
Hello!
I noticed that in some cases clang generates a sequence of AND+TST instructions.
For example:
AND x3, x2, x1
TST x2, x1
I think these instructions should be merged into one:
ANDS x3, x2, x1
( because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm> -
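The AND+TST merge described above can be sketched as a toy peephole over instruction tuples. This is an illustration only, not the AArch64 backend: the tuple layout `(opcode, dest, src1, src2)` and the function name `merge_and_tst` are assumptions, and only directly adjacent pairs are handled.

```python
def merge_and_tst(block):
    """Fold `AND d, a, b` immediately followed by `TST a, b` into `ANDS d, a, b`.

    Each instruction is a tuple (opcode, dest, src1, src2); TST has dest None.
    """
    out = []
    i = 0
    while i < len(block):
        cur = block[i]
        nxt = block[i + 1] if i + 1 < len(block) else None
        if (
            nxt is not None
            and cur[0] == "AND"
            and nxt[0] == "TST"
            # AND is commutative, so operand order does not matter.
            and {cur[2], cur[3]} == {nxt[2], nxt[3]}
        ):
            out.append(("ANDS", cur[1], cur[2], cur[3]))
            i += 2  # consume both the AND and the TST
        else:
            out.append(cur)
            i += 1
    return out
```

A real implementation would also need to confirm that no other instruction depends on the flags being set at the TST's original position; adjacency makes that trivially true in this sketch.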
2019 Aug 23
2
Using [GlobalISel] to provide peephole optimizations
...'s being considered/is appealing
to people? And/or is the restriction of not allowing Instructions on the LHS
quite an intentional design decision?
Because it seems that this would provide some value even for those not using
GlobalISel as their primary selector, just as a way of quickly describing
peephole optimizations and leveraging the very nifty little VM there to
implement them. In theory, a lot of pattern fragments could even be added
automatically, by comparing pattern fragments and the machine opcodes they
represent - giving a free automatically generated "foldImmediate", among
othe...
2015 Mar 25
0
[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations
Multiply operations can have a post-factor on them, which other ops
don't support. Only perform the peephole optimizations when there is no
post-factor involved.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions...
2017 May 05
2
Machine instruction verifier pass
...on for
instruction using physical registers
by looking at the LiveIn info of successor basic blocks.
2/ Which means we need Live Variables analysis to happen prior to executing
MachineCSE.
3/ Live variable analysis associates Kill/def information with
MachineOperands.
4/ In one of the regression Peephole optimizer (which does not use
liveness information)
performs certain transformations over MachineBasicBlock which potentially
dirties the
liveness information computed early.
5/ Now, when Machine Instruction verifier kicks in after Peephole optimizer
it reports use after kill violation
over a par...
2016 Aug 29
2
Publication
Hi,
Can you add the following two publications from our group to the LLVM
publications page.
- *Alive-FP: Automated Verification of Floating Point Based Peephole
Optimizations in LLVM* [pdf]
<http://www.cs.rutgers.edu/~santosh.nagarakatte/papers/alive-fp-sas16.pdf>
David Menendez, Santosh Nagarakatte, and Aarti Gupta
To Appear in the Proceedings of the 23rd Static Analysis Symposium (SAS
2016 <http://staticanalysis.org/sas2016/>)...
2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
...match the pattern and so *LowerToBT* is not
called.
*Question*: This is during *pseudo instruction expansion*. How could
*LowerToBT*'s caller have enough context to match the immediate IR version?
In fact, lli isn't calling *LowerToBT* so it isn't matching. But isn't this
really a *peephole optimization* issue?
LLVM has a generic peephole optimizer, *CodeGen/PeepholeOptimizer.cpp*,
which has
exactly one subclass in *NVPTXTargetMachine.cpp*.
But isn't it better to deal with X86 *LowerToBT* in a
*PeepholeOptimizer* subclass
where you have a small window of instructions rather than...
2018 May 18
1
Constants propagation into printf format string
...char **arr, int len) {
>> for (int i = 0; i < len; ++i) {
>> printf("Prog: %s", arr[i]);
>> }
>> }
>>
>> This transformation would take constant strings/integers/chars and put them
>> to the format string.
>>
> This is not a peephole optimization, therefore it doesn't belong to instcombine.
> If you want to try to implement something like this, I'd recommend
> taking a look at ConstantFolding or SCCP (and try to understand why
> the value doesn't get propagated).
What? Transforming `printf("FOO: %s"...
2019 May 09
4
Making llvm-xyz -help useful
...NEON assembly
=apple - Emit Apple-style NEON assembly
-amdgpu-dump-hsa-metadata - Dump AMDGPU HSA Metadata
-amdgpu-enable-merge-m0 - Merge and hoist M0 initializations
-amdgpu-sdwa-peephole - Enable SDWA peepholer
[...]
Surely, the style of NEON code to emit from AArch64 backend is not the information I was looking for...
I've implemented a straight-forward patch for llvm-cat here https://reviews.llvm.org/D61740, and the result becomes:
OVERV...
2011 May 24
1
[LLVMdev] LLVM evaluation
...standard embedded C types (complex, fractional) support
3. Target specific built-in types support (like 48-bit integrals)
4. Types similar to GCC vectorization types
5. Overlapped register classes support (same instructions, but working
inside different execution units on separate registers set)
6. Peephole optimization that works on non-sequential instructions (in
contrast to GCC’s peephole)
7. Parallelism (explicit instruction bundling) for VLIW architectures
8. Delay slots support
9. Software pipelining
10. Quality of alias analysis
11. Registers renaming optimization
12. Speculative scheduling
It...
2018 Jul 25
2
Question about target instruction optimization
...mm16 = 0, and
another physical(!) register is known to be 0 (from a previous immediate
load, directly or indirectly) - assuming that L = 0 (H might be
something else) - the following code:
LD DE,0x0000
should become:
LD D,L
LD E,L
I would expect that this needs to be done in a peephole optimizer pass,
as during the lowering process, the physical registers are not yet assigned.
Now my question:
1. Is that correct (peephole instead of lowering)? Should the lowering
always emit the generic, not always optimal "LD DE,<imm16>". Or should
the lowering process always...
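The rewrite asked about above (replacing `LD DE,0x0000` with two register copies when another register is known to hold 0) can be sketched as a post-allocation peephole. This is a toy model under stated assumptions: instructions are `(opcode, dest, src)` tuples, `rewrite_zero_loads` and `PAIRS` are hypothetical names, and any non-LD instruction conservatively discards all knowledge about register contents.

```python
PAIRS = {"BC": ("B", "C"), "DE": ("D", "E"), "HL": ("H", "L")}


def rewrite_zero_loads(block):
    """Replace `LD pair,0` with two 8-bit copies from a register known to be 0."""
    zero = set()  # 8-bit registers currently known to hold 0
    out = []
    for op, dst, src in block:
        if op != "LD":
            zero.clear()  # unknown effect: forget everything (conservative)
            out.append((op, dst, src))
            continue
        halves = PAIRS.get(dst)
        candidates = zero - set(halves or ())
        if halves and src == 0 and candidates:
            z = sorted(candidates)[0]  # any known-zero register outside the pair
            out.extend(("LD", h, z) for h in halves)
            zero.update(halves)
            continue
        out.append((op, dst, src))
        targets = halves or (dst,)
        if src == 0 or src in zero:
            zero.update(targets)
        else:
            zero.difference_update(targets)
    return out
```

Tracking known values per physical register only makes sense once registers are assigned, which matches the poster's expectation that this belongs in a peephole pass rather than in lowering.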
2015 Jul 17
3
[LLVMdev] 2-address and 3-address instructions
...applicable. How does one go about generating
the most compact version?
1. At instruction selection, is there a predicate that can test whether one of
the input sources is dead, thus allowing the selection of the 2-address version?
2. Or do I generate 2-address and have to have a custom pass that peepholes to
see if a mov reg-to-reg precedes or follows a 2-address instruction and turn
it into a 3-address version?
3. Or do I generate 3-address and have a custom pass that checks if a source
and destination register in a 3-address is the same and turn it into a 2-address?
Anybody done this already?...
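Option 3 in the question above (generate 3-address forms, then shrink when a source and the destination coincide) can be sketched as follows. This is a toy model, not an LLVM pass: the tuple layout, the `"2"` opcode suffix, the `COMMUTATIVE` set, and the name `shrink_to_two_address` are all hypothetical.

```python
COMMUTATIVE = {"add", "and", "or", "xor", "mul"}


def shrink_to_two_address(inst):
    """Turn (op, dest, src1, src2) into (op + "2", dest, src) when dest is an input.

    Returns the instruction unchanged when no 2-address form applies.
    """
    op, dest, a, b = inst
    if dest == a:
        return (op + "2", dest, b)
    # The second source can only take the tied slot if the operands may be swapped.
    if dest == b and op in COMMUTATIVE:
        return (op + "2", dest, a)
    return inst
```

The commutativity check matters: `sub r1, r2, r1` computes `r2 - r1`, so it cannot be rewritten as a 2-address `r1 -= r2`.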
2016 Mar 10
2
[CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions
...never be erased because this invalidates iterator.
I've been fixing a bug in AArch64InstrInfo::optimizeCompareInstr: instructions are converted into S form but it's not checked that they produce the same flags as CMP. The bug exists upstream as well.
Together with the fix I want to add some peephole rules for the combinations CMP+BRC and CMP+SEL. In the context of optimizeCmpInstr I have all the information about CmpInstr. I simply walk down and check, for every instruction that uses AArch64::NZCV, whether it can be replaced with a simpler version. Finally, I delete CmpInstr. This approach contradicts...
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
....g. 3.3% in SQLite on
CortexA8 and works fine.
I have some questions though.
1) "neverHasSideEffects" in tablegen means that CPSR is not implicitly
defined, doesn't it?
2) What else can be done using that super "S" power?
3) The current optimization implementation works similarly to a peephole (the pitiful
peephole cmp optimization was disabled),
right before ifcvt. Should I raise it up somewhere? What do you think is the
right place for such a thing?
4) Consider the following C code:
int a, b, c;
...
a = b * c;
if (a > 0) { ... }
One gets the corresponding ARM assembler
mul r(a), r(b), r...