thr3ads.net - similar to: "[LLVMdev] About the partial update clearence / dependency breaking mechanism"

Displaying 20 results from an estimated 100 matches similar to: "[LLVMdev] About the partial update clearence / dependency breaking mechanism"

[LLVMdev] About the partial update clearence / dependency breaking mechanism

2013 Mar 25

[LLVMdev] About the partial update clearence / dependency breaking mechanism

On Mar 25, 2013, at 5:02 AM, Silviu Baranga <silbar01 at arm.com> wrote: > Hello, > > I am currently looking into the advantages of using the > partial update clearance / dependency breaking mechanism > for some ARM cores. > > It seems that the ARM specific code for this will always > return a clearance of 0 for VLD1LNd32 because of the following > code in

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

2016 Apr 18

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

On 18 April 2016 at 16:33, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > Doing a grep "eabi" * -R | grep darwin in llvm I found the test divmod-eabi.ll > which uses the triple armv7-apple-darwin-eabi. What format does that have? Certainly not ELF. :) But I didn't mean "has eabi on triple", but "is in none-eabi mode", which may have to check a

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

2016 Apr 18

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

On 18 April 2016 at 16:18, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > This doesn't look like something ACLE specific (I can't find it in the ACLE doc). Sorry, I didn't mean it was ACLE, only that you guys were fiddling with macros. :) > This seems to be a generic macro. I think it would make sense to define it > if we know we're emitting ELF. Since the

[LLVMdev] Help with definition of subregisters; spill, rematerialization and implicit uses

2014 Aug 15

[LLVMdev] Help with definition of subregisters; spill, rematerialization and implicit uses

Hi, I have a problem regarding sub-register definitions and LiveIntervals on our target. When a subregister is defined, other parts of the register are always left untouched - they are neither read or def:ed. It however seems that Codegen treats subregister definitions as somehow clobbering the whole register. The SSA-code looks like this after isel: (Reg0 and Reg1 are 16bit registers. Reg2,

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

2015 Apr 29

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

Hi, I'm trying expand the Float2Int pass in order to make it able to optimize expressions like f * x > y, where x and y are integers (we'll assume unsigned for simplicity) and f is a floating point constant. The optimization would convert the expression to something like: (a * x)/b > y where a and b are integers guessed by the compiler (currently using continued

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

2015 Apr 29

[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)

> On Apr 29, 2015, at 2:33 PM, Matt Arsenault <arsenm2 at gmail.com> wrote: > >> On Apr 29, 2015, at 10:06 AM, Silviu Baranga <Silviu.Baranga at arm.com <mailto:Silviu.Baranga at arm.com>> wrote: >> >> Note that dividing by an integer constant should be a cheap operation >> compared to FP multiplication and comparison as this would get lowered to a

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

2016 Apr 16

[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi

On 16 April 2016 at 01:44, Zhao, Weiming via cfe-dev <cfe-dev at lists.llvm.org> wrote: > I'm building libunwind for ARM baremetal using clang. > I notice that __ELF__ is used in libunwind and the macro is only defined for > Linux target on ARM. > Should we also predefine that for arm-none-eabi target? Do you mean in Clang's ARMTargetInfo::getTargetDefines() ? I think

[LLVMdev] Help with definition of subregisters; spill, rematerialization and implicit uses

2014 Aug 19

[LLVMdev] Help with definition of subregisters; spill, rematerialization and implicit uses

Hi Quentin, On 08/15/14 19:01, Quentin Colombet wrote: [...] >> The question is: How should true subregister definitions be >> expressed so that they do not interfere with each other? See the >> detailed problem description below. > > We do have a limitation in our current liveness tracking for > sub-register. Therefore, I am not sure that is possible. > >

[LLVMdev] Help with definition of subregisters; spill, rematerialization and implicit uses

2014 Aug 22

[LLVMdev] Help with definition of subregisters; spill, rematerialization and implicit uses

Hi Quentin, On 08/19/14 18:58, Quentin Colombet wrote: [...] > It seems that you will have to debug further the *** Bad machine code: Instruction loads from dead spill slot *** before we can be of any help. Yes, I've done some more digging. Sorry for the long mail... I get: Inline spilling aN40_0_7:%vreg1954 [5000r,5056r:0)[5056r,5348r:1) 0 at 5000r 1 at 5056r At this point I have

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

2015 Apr 29

[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

Hi, This is somewhat similar to the previous thread regarding missed vectorization opportunities (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084765.html), but maybe different enough to require a new thread. I'm seeing some missed vectorization opportunities in the loop vectorizer because SCEV is not able to fold sext/zext expressions into recurrence expressions (AddRecExpr). This

[LLVMdev] MachineOperand: Subreg defines and the Undef flag

2012 Jul 05

[LLVMdev] MachineOperand: Subreg defines and the Undef flag

Hi, This question relates to the undef flag in the context of sub-register def operands. 1) Firstly, the documentation (comments in the source code) says that in a sub-register def operand, the "IsUndef" flag refers to the part of the register that is not written. 2) Further, the documentation about readsReg() states that a sub-register def implicitly reads the other parts of the

Extending Register Rematerialization

2016 Nov 27

Extending Register Rematerialization

Hello LLVM Developers, We are working on extending currently available register rematerialization to include cases where sequence of multiple instructions is required to rematerialize a value. We had a discussion on this in community mailing list and link is here: http://lists.llvm.org/pipermail/llvm-dev/2016-September/subject.html#104777 >From the above discussion and studying the code we

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

>> Darwin uses NEON for floating point, but does *not* (and should not). >> globally enable fast math flags. Use of NEON for FP needs to remain >> achievable without globally setting the fast math flags. Fast math may >> imply reasonably imply NEON, but the opposite direction is not accurate. | Good point. Fast math is probably a too tough requirement. I need to | look

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

[RFC] llvm-mca: a static performance analysis tool

+Matthias > On Mar 2, 2018, at 6:42 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote: > >> Known limitations on X86 processors >> ----------------------------------- >> >> 1) Partial register updates versus full register updates. >> <snip> > > MachineOperand handles this. You just need to create the machine instrs. > >

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

> |I just looked again at the +neonfp flag. Compiling with and without > |+neonfp flag seems to only affect scalar types in the attached test > |case. If e.g. the LLVM vectorizer introduces vector instructions on > |LLVM-IR level floating point vectors still yield NEON assembly even if > |compiled with "-mattr=+neon,-neonfp". Is this expected? > > I'm virtually

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 30

[Proposal][RFC] Strided Memory Access Vectorization

As a strong advocate of logical vector representation, I'm counting on community liking Michael's RFC and that'll proceed sooner than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 30

[Proposal][RFC] Strided Memory Access Vectorization

One common concern raised for cases where Loop Vectorizer generate bigger types than target supported: Based on VF currently we check the cost and generate the expected set of instruction[s] for bigger type. It has two challenges for bigger types cost is not always correct and code generation may not generate efficient instruction[s]. Probably can depend on the support provided by below RFC by

[RFC] llvm-mca: a static performance analysis tool

2018 Mar 02

[RFC] llvm-mca: a static performance analysis tool

Hi Andrew, Thanks for the feedback! On Fri, Mar 2, 2018 at 1:16 AM, Andrew Trick <atrick at apple.com> wrote: > > On Mar 1, 2018, at 9:22 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: > > Hi all, > > At Sony we developed an LLVM based performance analysis tool named > llvm-mca. We > currently use it internally to statically measure the

[LLVMdev] MachineOperand: Subreg defines and the Undef flag

2012 Jul 05

[LLVMdev] MachineOperand: Subreg defines and the Undef flag

On Jul 4, 2012, at 10:45 PM, Pranav Bhandarkar <pranavb at codeaurora.org> wrote: > Hi, > > This question relates to the undef flag in the context of sub-register def > operands. > > 1) Firstly, the documentation (comments in the source code) says that in a > sub-register def operand, the "IsUndef" flag refers to the part of the > register that is not

[Proposal][RFC] Strided Memory Access Vectorization

2016 Jun 18

[Proposal][RFC] Strided Memory Access Vectorization

>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can >do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of vectorizer: If vectorizer is the best place to perform the optimization, it should do so. This includes the cases like

similar to: [LLVMdev] About the partial update clearence / dependency breaking mechanism