similar to: [RFC] Optimizing Comparisons Chains


2015 Feb 24
2
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
On Mon, Feb 23, 2015 at 2:17 PM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- > > From: "Francois Pichet" <pichet2000 at gmail.com> > > To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > > Sent: Sunday, February 22, 2015 5:34:11 PM > > Subject: [LLVMdev] Question about shouldMergeGEPs in
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means that vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads 64-bit constant values into zmm22. These are the indexes themselves (8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 multiplies the index values by 4000, because the stride for array b is 4000. zmm14=
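The index-times-stride step described in this snippet can be sketched in scalar C++. This is only an illustration: the function name `mul_lanes_u32` and the choice of the indexes 8..15 as input are assumptions, not taken from the thread. It models vpmuludq as multiplying the low 32 bits of each 64-bit lane, unsigned, into a full 64-bit product.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint64_t, 8>;  // eight 64-bit lanes, like one zmm register

// Scalar model of vpmuludq: per lane, multiply the low 32 bits of each
// operand as unsigned values and keep the full 64-bit product.
Lanes mul_lanes_u32(const Lanes& a, const Lanes& b) {
    Lanes out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = (a[i] & 0xFFFFFFFFull) * (b[i] & 0xFFFFFFFFull);
    }
    return out;
}

// Usage: scale the (assumed) indexes 8..15 by the stride 4000 from the snippet.
void check_stride_scaling() {
    const Lanes indexes{8, 9, 10, 11, 12, 13, 14, 15};
    Lanes stride{};
    stride.fill(4000);
    const Lanes offsets = mul_lanes_u32(indexes, stride);
    assert(offsets[0] == 8u * 4000u);
    assert(offsets[7] == 15u * 4000u);
}
```

The products are element offsets into a row-major array with 4000 elements per row; a later step would still have to scale them by the element size to get byte addresses.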
2017 Mar 09
4
[RFC] bitfield access shrinking
On Thu, Mar 9, 2017 at 10:54 AM, Hal Finkel <hfinkel at anl.gov> wrote: > On 03/09/2017 12:14 PM, Wei Mi via llvm-dev wrote: >> >> In >> http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html, >> consecutive bitfields are wrapped as a group and represented as a >> large integer and emits loads stores and bit operations appropriate
2014 May 21
5
[LLVMdev] [CodeGenPrepare] Sinking incoming values of a PHI Node
Hi, I want to improve the way CGP sinks the incoming values of a PHI node towards memory accesses. Improving it means a lot to some of our key benchmarks, and I believe can benefit many general cases as well. CGP's OptimizeMemoryInst function handles PHI nodes by running AddressingModeMatcher on all incoming values to see whether they reach consensus on the addressing mode. It does a
2018 Apr 03
4
SCEV and LoopStrengthReduction Formulae
I am attempting to implement a minor loop strength reduction optimization for targets that support compare and jump fusion, specifically TTI::canMacroFuseCmp(). My approach might be wrong; however, I am soliciting the idea for feedback, so that I can implement this correctly. My plan is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the following case, but perhaps
2015 Mar 16
2
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
----- Original Message ----- > From: "Jingyue Wu" <jingyue at google.com> > To: "Daniel Berlin" <dberlin at dberlin.org>, "Mark Heffernan" <meheff at google.com>, "Hal Finkel" <hfinkel at anl.gov> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, March 13, 2015 1:31:59 PM >
2017 Mar 07
2
multiprecision add/sub
> On Feb 21, 2017, at 9:54 PM, Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I believe that providing additional intrinsics that would directly produce the ISD::ADDC/ISD::SUBC nodes would provide the additional advantage of being able to directly produce these nodes for code that doesn't have anything to do with multiprecision addition/subtraction. I am
2018 Apr 04
0
SCEV and LoopStrengthReduction Formulae
> cmpq %rbx, %r14
> jne .LBB0_1
>
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above. We can remove the cmp above if we were to perform
> the following transformation:
Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)? Is there any further limitation that explains why these two are not fused?
> -----Original
2011 Feb 18
2
[LLVMdev] EFLAGS and MVT::Glue
The log message for revision 122213 says: > Change the X86 backend to stop using the evil ADDC/ADDE/SUBC/SUBE nodes (which > their carry dependencies with MVT::Flag operands) and use clean and beautiful > EFLAGS dependences instead. (MVT::Flag has since been renamed to MVT::Glue.) That revision made bug 8404 go away. Am I right in thinking that one of the problems with MVT::Glue is
2017 Mar 09
4
[RFC] bitfield access shrinking
In http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120827/063200.html, consecutive bitfields are wrapped as a group and represented as a large integer and emits loads stores and bit operations appropriate for extracting bits from within it. It fixes the problem of violating C++11 memory model that original widen load/store of bitfield was facing. It also brings more coalescing
2013 Jun 24
3
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
> > Just for the record, I have no real expectation that this is a good idea yet... But it's hard to collect numbers without a flag of some kind, and it's also really annoying to craft this flag given the current pass manager, so I figured I would get a skeleton in place that folks could experiment with, and we could keep or delete based on this discussion and any numbers. I agree.
2014 Mar 25
3
[LLVMdev] Getting the Debugging JIT-ed Code with GDB example to work
I'm trying to run the example described at: http://llvm.org/docs/DebuggingJITedCode.html I followed the sample command line session (below, with versions numbers for everything), but gdb doesn't stop at the breakpoints as described. Any idea what is wrong? Thanks, Zach zdevito at derp:~/terra/tests$ > ~/clang+llvm-3.4-x86_64-unknown-ubuntu12.04/bin/clang -cc1 -O0 -g >
2010 Jun 16
4
Migrating from CommunigatePro to Dovecot - anyone done this?
Apologies if this is in the archive - did look but couldn't find it. Does anyone have any experience of migrating from CommunigatePro to Dovecot? We currently run CGP 5.3.4, supporting a small system (20 or so users, one domain). We've been using it for years, and have a mixed bag of MailDir and mbox folders accessed via IMAP clients. Some users have large mail accounts (15GB total).
2016 Oct 18
2
RFC: Killing undef and spreading poison
>> A use of freeze is to enable speculative execution. For example, loop
>> switching does the following transformation:
>> while (C) {
>>   if (C2) {
>>     A
>>   } else {
>>     B
>>   }
>> }
>> =>
>> if (C2) {
>>   while (C)
>>     A
>> } else {
>>   while (C)
>>     B
>> }
>>
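The transformation quoted above can be written out at source level as a minimal C++ sketch. The function names and the concrete bodies standing in for A and B (add or subtract each element) are hypothetical, chosen only to make the two shapes comparable.

```cpp
#include <cassert>
#include <vector>

// Original shape: the loop-invariant condition c2 is tested on every iteration.
int sum_original(const std::vector<int>& xs, bool c2) {
    int acc = 0;
    for (int x : xs) {
        if (c2) {
            acc += x;  // stands in for "A"
        } else {
            acc -= x;  // stands in for "B"
        }
    }
    return acc;
}

// Transformed shape: c2 is tested once, outside the loop, and each
// branch gets its own copy of the loop.
int sum_unswitched(const std::vector<int>& xs, bool c2) {
    int acc = 0;
    if (c2) {
        for (int x : xs) acc += x;
    } else {
        for (int x : xs) acc -= x;
    }
    return acc;
}

// Both shapes must agree for either value of c2, including on an empty input.
void check_unswitching() {
    const std::vector<int> xs{1, 2, 3};
    assert(sum_original(xs, true) == sum_unswitched(xs, true));
    assert(sum_original(xs, false) == sum_unswitched(xs, false));
    assert(sum_original({}, true) == sum_unswitched({}, true));
}
```

Note that the transformed version evaluates c2 even when the loop body would never have run. At the IR level that is the speculation the quoted message refers to: the hoisted branch now executes on a path where the original program never branched on C2.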
2013 Jun 24
0
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
On Mon, Jun 24, 2013 at 2:59 PM, Nadav Rotem <nrotem at apple.com> wrote: > I agree. The vectorizer is a *lowering* pass, and much like LSR and it > loses information. A few months ago some of us talked about this and came > up with a general draft for the ideal pass ordering. > Where? On the mailing list? > If I remember correctly the plan was that the second half of the
2015 Mar 23
3
[LLVMdev] Changing The '.' Used to Prefix Labels in Assembly Output
I'm working on an LLVM back end with output to assembly file (.s). I'm using the ARM assembly printer. The generated labels (e.g. for a while statement) start with '.' like .LBB0_1 I would like to change the '.' to something else (specifically $$ if it matters). I see a lot of customizability in targetinfo.cpp but not that particular item. Where should I be looking?
2015 Mar 13
3
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
On Fri, Mar 13, 2015 at 10:16 AM Mark Heffernan <meheff at google.com> wrote: > On Thu, Mar 12, 2015 at 2:34 PM, Hal Finkel <hfinkel at anl.gov> wrote: > >> It is not clear to me at all that preventing the merging is the right >> solution. There are a large number of analysis, including alias analysis, >> and optimizations that use GetUnderlyingObject, and
2018 Jun 29
2
Cleaning up ‘br i1 false’ cases in CodeGenPrepare
> we lower llvm.objectsize later than we should Is there a well-accepted best (or even just better) place to lower objectsize? I ask because I sorta fear that these kinds of problems will become more pronounced as llvm.is.constant, which is also lowered in CGP, gains popularity. (To be clear, I think it totally makes sense to lower is.constant and objectsize in the same place. I'm just
2017 Nov 28
3
storing MBB MCSymbol in custom section
Dear llvm-dev-list, I have created my own custom section, appended to the binary at compile time, which contains the addresses of all basic blocks. As the final address of a basic block is not known until link time, I collect the MCSymbol* symbol values per BB in a temporary array and emit them (emitSymbolValue) into my custom section within EmitEndOfAsmFile(). I have
2018 Sep 20
3
Comparing Clang and GCC: only clang stores updated value in each iteration.
Hi, I have a benchmark (mcf) that is currently slower when compiled with clang compared to gcc 8 (~10%). It seems that a hot loop has a few differences, where one interesting one is that while clang stores an incremented value in each iteration, gcc waits and just stores the final value just once after the loop. The value is a global variable. I wonder if this is something clang does not do
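The two code shapes contrasted in this snippet can be sketched in C++ as follows. This is an illustration only: the global `g_counter` and both function names are hypothetical and are not taken from mcf, and an optimizing compiler may well turn either function into the other form.

```cpp
#include <cassert>

long g_counter = 0;  // hypothetical global standing in for the benchmark's variable

// Shape 1: the global is updated in memory on every iteration
// (load, increment, store), as the snippet reports for clang.
void bump_each_iteration(int n) {
    for (int i = 0; i < n; ++i) {
        g_counter += 1;
    }
}

// Shape 2: the value lives in a local during the loop and is stored
// back once afterwards, as the snippet reports for gcc.
void bump_once(int n) {
    long local = g_counter;
    for (int i = 0; i < n; ++i) {
        local += 1;
    }
    g_counter = local;  // single store, executed even when n <= 0
}

// Both shapes leave the same final value in the global.
void check_store_shapes() {
    g_counter = 0;
    bump_each_iteration(5);
    assert(g_counter == 5);
    bump_once(3);
    assert(g_counter == 8);
}
```

One difference is visible even in this sketch: the second shape writes to the global when the loop runs zero times, whereas the first does not. A compiler has to account for a store like that before introducing it on a path that originally had none.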