search for: peephole

Displaying 20 results from an estimated 283 matches for "peephole".

2010 Oct 07
2
[LLVMdev] [Q] x86 peephole deficiency
Hi all, I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) and now I am running into a deficiency of the x86 peephole optimizer (or jump-threader?). Here is what I get: andl $3, %edi je .LBB0_4 # BB#2: # %nz # in Loop: Header=BB0_1 Depth=1 cmpl $2, %edi je .LBB0_6 # BB#3:...
2010 Oct 13
2
[LLVMdev] [Q] x86 peephole deficiency
On 07.10.2010 at 19:50, Chris Lattner wrote: > > On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote: > >> Hi all, >> >> I am slowly working on a SwitchInst optimizer (http://llvm.org/ >> PR8125) >> and now I am running into a deficiency of the x86 >> peephole optimizer (or jump-threader?). Here is what I get: >> >> >> andl $3, %edi >> je .LBB0_4 >> # BB#2: # %nz >> # in Loop: Header=BB0_1 >> Depth=1 >>...
2019 Nov 22
2
[ARM] Peephole optimization ( instructions tst + add )
...e this optimization should be done in AArch64LoadStoreOptimizer, is it right? From: Eli Friedman [mailto:efriedma at quicinc.com] Sent: Thursday, November 21, 2019 11:55 PM To: Kosov Pavel <kosov.pavel at huawei.com>; LLVM Dev <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] [ARM] Peephole optimization ( instructions tst + add ) That transform is legal; it's a missed optimization. -Eli From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces at lists.llvm.org>> On Behalf Of Kosov Pavel via llvm-dev Sent: Thursday, November 21, 2019 2:00 AM To: llv...
2010 Oct 07
0
[LLVMdev] [Q] x86 peephole deficiency
On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote: > Hi all, > > I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) > and now I am running into a deficiency of the x86 > peephole optimizer (or jump-threader?). Here is what I get: > > > andl $3, %edi > je .LBB0_4 > # BB#2: # %nz > # in Loop: Header=BB0_1 > Depth=1 > cmpl $2, %edi >...
2015 Feb 11
2
[LLVMdev] deleting or replacing a MachineInst
I'm writing a peephole pass and I'm done with the X86_64 instruction level detail work. But I'm having difficulty with the basic block surgery of replacing the old MachineInst. The peephole pass gets called per MachineFunction and then iterates over each MachineBasicBlock and in turn over each MachineInst. When...
2010 Oct 13
0
[LLVMdev] [Q] x86 peephole deficiency
...> The above problem is an inter-block one. Also MCSE seems > to perform value numbering on virtual/physical registers, which > does not map very well to status register bits that are implicitly > defined. > Any chance to recast this issue as a target-independent > (but cmp-specific) peephole problem, that just looks into > predecessor blocks and applies (target-hook-like) subsumption > checks for 'cmp' instructions? I think that extending MachineCSE to do a simple dominator tree walk with llvm::ScopedHashTable would make sense. Status register bits should be handled jus...
2004 Feb 20
1
[LLVMdev] Changes in MachineInstruction/Peephole Optimizer?
Hi all, The register allocator that I implemented is failing in the LLVM CVS version, but not in LLVM 1.1. The generated code fails a check in the x86 peephole optimizer: llc: PeepholeOptimizer.cpp:128: bool <unnamed>::PH::PeepholeOptimize(llvm::MachineBasicBlock&, llvm::ilist_iterator<llvm::MachineInstr>&): Assertion `MI->getNumOperands() == 2 && "These should all have 2 operands!"' failed. I've tra...
2019 Nov 21
2
[ARM] Peephole optimization ( instructions tst + add )
Hello! I noticed that in some cases clang generates a sequence of AND+TST instructions. For example: AND x3, x2, x1 TST x2, x1 I think these instructions should be merged into one: ANDS x3, x2, x1 (because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm> -
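The merge described in this post can be illustrated with a small sketch. It is not LLVM code: it works on plain assembly text rather than MachineInstr, and the function name merge_and_tst and the regular expressions are hypothetical, introduced only for illustration. It assumes the TST immediately follows the AND and uses the same two source registers, and it conservatively skips the case where the AND destination is also one of its sources.

```python
import re

# Hypothetical textual forms "OP dst, src1, src2" and "TST src1, src2".
_THREE_OP = re.compile(r"^\s*(\w+)\s+(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*$")
_TST = re.compile(r"^\s*TST\s+(\w+)\s*,\s*(\w+)\s*$", re.IGNORECASE)


def merge_and_tst(lines):
    """Fold 'AND d, a, b' directly followed by 'TST a, b' into 'ANDS d, a, b'."""
    out = []
    i = 0
    while i < len(lines):
        cur = _THREE_OP.match(lines[i])
        nxt = _TST.match(lines[i + 1]) if i + 1 < len(lines) else None
        if cur and nxt and cur.group(1).upper() == "AND":
            dst, a, b = (g.lower() for g in cur.group(2, 3, 4))
            tst_ops = sorted(g.lower() for g in nxt.group(1, 2))
            # AND is commutative, so compare the operand pairs order-insensitively.
            same_operands = sorted([a, b]) == tst_ops
            # Conservative choice for this sketch: skip when the AND overwrites
            # one of the registers that the TST reads.
            if same_operands and dst not in (a, b):
                out.append(f"ANDS {cur.group(2)}, {cur.group(3)}, {cur.group(4)}")
                i += 2  # consume both the AND and the TST
                continue
        out.append(lines[i])
        i += 1
    return out
```

Calling merge_and_tst(["AND x3, x2, x1", "TST x2, x1"]) returns a single ANDS line; any other instruction sequence is passed through unchanged. A real pass would also have to check that nothing between the two instructions reads or writes the flags, which the adjacency requirement here sidesteps.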
2019 Aug 23
2
Using [GlobalISel] to provide peephole optimizations
...'s being considered/is appealing to people? And/or is the restriction of not allowing Instructions on the LHS quite an intentional design decision? Because it seems that this would provide some value even for those not using GlobalISel as their primary selector, just as a way of quickly describing peephole optimizations and leveraging the very nifty little VM there to implement them. In theory, a lot of pattern fragments could even be added automatically, by comparing pattern fragments and the machine opcodes they represent - giving a free automatically generated "foldImmediate", among othe...
2015 Mar 25
0
[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations
Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions...
2017 May 05
2
Machine instruction verifier pass
...on for instruction using physical registers by looking at the LiveIn info of successor basic blocks. 2/ Which means we need Live Variables analysis to happen prior to executing MachineCSE. 3/ Live variable analysis associates Kill/def information with MachineOperands. 4/ In one of the regression Peephole optimizer (which does not use liveness information) performs certain transformations over MachineBasicBlock which potentially dirties the liveness information computed early. 5/ Now, when Machine Instruction verifier kicks in after Peephole optimizer it reports use after kill violation over a par...
2016 Aug 29
2
Publication
Hi, Can you add the following two publications from our group to the LLVM publications page. - *Alive-FP: Automated Verification of Floating Point Based Peephole Optimizations in LLVM [pdf] <http://www.cs.rutgers.edu/~santosh.nagarakatte/papers/alive-fp-sas16.pdf> *David Menendez, Santosh Nagarakatte, and Aarti Gupta *To Appear in the Proceedings of the 23rd Static Analysis Symposium (SAS 2016 <http://staticanalysis.org/sas2016/>)...
2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
...match the pattern and so *LowerToBT* is not called. *Question*: This is during *pseudo instruction expansion*. How could *LowerToBT'*s caller have enough context to match the immediate IR version? In fact, lli isn't calling *LowerToBT* so it isn't matching. But isn't this really a *peephole optimization* issue? LLVM has a generic peephole optimizer, *CodeGen/PeepholeOptimizer.cpp *which has exactly one subclass in *NVPTXTargetMachine.cpp.* But isn't it better to deal with X86 *LowerToBT* in a *PeepholeOptimizer* subclass where you have a small window of instructions rather than...
2018 May 18
1
Constants propagation into printf format string
...char **arr, int len) { >> for (int i = 0; i < len; ++i) { >> printf("Prog: %s", arr[i]); >> } >> } >> >> This transformation would take constant strings/integers/chars and put them >> to the format string. >> > This is not a peephole optimization, therefore it doesn't belong to instcombine. > If you want to try to implement something like this, I'd recommend > taking a look at ConstantFolding or SCCP (and try to understand why > the value doesn't get propagated). What? Transforming `printf("FOO: %s"...
2019 May 09
4
Making llvm-xyz -help useful
...NEON assembly =apple - Emit Apple-style NEON assembly -amdgpu-dump-hsa-metadata - Dump AMDGPU HSA Metadata -amdgpu-enable-merge-m0 - Merge and hoist M0 initializations -amdgpu-sdwa-peephole - Enable SDWA peepholer [...] Surely, the style of NEON code to emit from AArch64 backend is not the information I was looking for... I've implemented a straight-forward patch for llvm-cat here https://reviews.llvm.org/D61740, and the result becomes: OVERV...
2011 May 24
1
[LLVMdev] LLVM evaluation
...standard embedded C types (complex, fractional) support 3. Target specific built-in types support (like 48-bit integrals) 4. Types similar to GCC vectorization types 5. Overlapped register classes support (same instructions, but working inside different execution units on separate registers set) 6. Peephole optimization that works on non-sequential instructions (in contrast to GCC’s peephole) 7. Parallelism (explicit instruction bundling) for VLIW architectures 8. Delay slots support 9. Software pipelining 10. Quality of alias analysis 11. Registers renaming optimization 12. Speculative scheduling It...
2018 Jul 25
2
Question about target instruction optimization
...mm16 = 0, and another physical(!) register is known to be 0 (from a previous immediate load, directly or indirectly) - assuming that L = 0 (H might be something else) - the following code: LD DE,0x0000 should become: LD D,L LD E,L I would expect that this needs to be done in a peephole optimizer pass, as during the lowering process, the physical registers are not yet assigned. Now my question: 1. Is that correct (peephole instead of lowering)? Should the lowering always emit the generic, not always optimal "LD DE,<imm16>". Or should the lowering process always...
2015 Jul 17
3
[LLVMdev] 2-address and 3-address instructions
...applicable. How does one go about generating the most compact version? 1. At instruction selection, is there a predicate that can test whether one of the input sources is dead, thus allowing the selection of the 2-address version? 2. Or do I generate 2-address and have to have a custom pass that peepholes to see if a mov reg-to-reg precedes or follows a 2-address instruction and turn it into a 3-address version? 3. Or do I generate 3-address and have a custom pass that checks if a source and destination register in a 3-address is the same and turn it into a 2-address? Anybody done this already?...
2016 Mar 10
2
[CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions
...never be erased because this invalidates iterator. I've been fixing a bug in AArch64InstrInfo::optimizeCompareInstr: instructions are converted into S form but it's not checked that they produce the same flags as CMP. The bug exists upstream as well. Together with the fix I want to add some peephole rules for combinations CMP+BRC and CMP+SEL. In the context of optimizeCmpInstr I have all information about CmpInstr. I simply go down and check all instructions which use AArch64::NZCV whether they can be substituted with the simpler version. After all I delete CmpInstr. This approach contradicts...
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
....g. 3.3% in SQLite on CortexA8 and works fine. I have some questions though. 1)"neverHasSideEffects" in tablegen means that CPSR is not implicitly defined, doesn't it? 2)What else can be done using that super "S" power? 3)Current optimization implementation works similar to peephole (peephole pitiful cmp optimization was disabled), right before ifcvt. Should I raise it up somewhere? What do you think is the right place for such thing? 4)Consider the following C code: int a, b, c; ... a = b * c; if (a > 0) { ... } One gets the corresponding ARM assembler mul r(a), r(b), r...