Displaying 13 results from an estimated 13 matches for "baranga".

2016 Apr 18
2
[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
On 18 April 2016 at 16:33, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > Doing a grep "eabi" * -R | grep darwin in llvm I found the test divmod-eabi.ll > which uses the triple armv7-apple-darwin-eabi. What format does that have? Certainly not ELF. :) But I didn't mean "has eabi on triple", but "...
2016 Apr 18
2
[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
On 18 April 2016 at 16:18, Silviu Baranga <Silviu.Baranga at arm.com> wrote: > This doesn't look like something ACLE specific (I can't find it in the ACLE doc). Sorry, I didn't mean it was ACLE, only that you guys were fiddling with macros. :) > This seems to be a generic macro. I think it would make sense to def...
2015 Apr 29
2
[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)
> On Apr 29, 2015, at 2:33 PM, Matt Arsenault <arsenm2 at gmail.com> wrote: > >> On Apr 29, 2015, at 10:06 AM, Silviu Baranga <Silviu.Baranga at arm.com> wrote: >> >> Note that dividing by an integer constant should be a cheap operation >> compared to FP multiplication and comparison as this would get lowered to a >> multiply+subtract+shift sequence (...
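The remark above about lowering division by an integer constant can be illustrated with a small sketch. This is not LLVM's code; the helper names `magic_for_divisor` and `divide_by_constant` are hypothetical, and Python's arbitrary-precision integers hide the fix-up steps (such as the subtract mentioned in the snippet) that a fixed-width implementation typically needs when the multiplier does not fit in the register width.

```python
def magic_for_divisor(d, bits=32):
    """Return (m, shift) such that (x * m) >> shift == x // d
    for every unsigned x with 0 <= x < 2**bits.

    Hypothetical helper for illustration only.
    """
    if d <= 0:
        raise ValueError("divisor must be positive")
    log2_ceil = (d - 1).bit_length()      # ceil(log2(d)); 0 when d == 1
    shift = bits + log2_ceil
    m = ((1 << shift) + d - 1) // d       # ceil(2**shift / d)
    return m, shift


def divide_by_constant(x, m, shift):
    """Unsigned division replaced by one multiply and one shift."""
    return (x * m) >> shift


if __name__ == "__main__":
    m, shift = magic_for_divisor(10)
    print(divide_by_constant(12345, m, shift))
```

The multiplier is computed once per constant divisor, so the per-element cost is a multiply and a shift rather than a hardware divide.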
2015 Apr 29
2
[LLVMdev] [RFC][Float2Int] Converting (fcmp Pred, x * F, y) to (ICmp ...)
Hi, I'm trying to expand the Float2Int pass in order to make it able to optimize expressions like f * x > y, where x and y are integers (we'll assume unsigned for simplicity) and f is a floating point constant. The optimization would convert the expression to something like: (a * x)/b > y where a and b are integers guessed by the compiler (currently using continued
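A minimal sketch of the idea in this post, assuming the constant f is approximated by a rational a/b with a bounded denominator. It uses Python's `Fraction.limit_denominator`, which picks the closest fraction under the bound, rather than whatever the proposed pass actually does. The function names and the denominator bound are assumptions made for illustration.

```python
from fractions import Fraction


def approximate_constant(f, max_denominator=1 << 16):
    """Return integers (a, b) with a / b close to f and b <= max_denominator.

    Hypothetical helper; the bound is an arbitrary choice for this sketch.
    """
    approx = Fraction(f).limit_denominator(max_denominator)
    return approx.numerator, approx.denominator


def compare_with_division(a, b, x, y):
    """The form quoted in the post: (a * x) / b > y with integer division."""
    return (a * x) // b > y


def compare_cross_multiplied(a, b, x, y):
    """Assumed variant: compare a * x with b * y, avoiding truncation (b > 0)."""
    return a * x > b * y


if __name__ == "__main__":
    a, b = approximate_constant(0.75)
    print(a, b)
    print(compare_with_division(a, b, 4, 2))
    print(compare_cross_multiplied(a, b, 2, 1))
```

Note that truncating division can change the result: with f = 0.75, x = 2, y = 1, the floating-point test 1.5 > 1 holds, while (3 * 2) // 4 > 1 does not. The cross-multiplied form keeps the comparison exact for the rational a/b, at the cost of a second multiplication and a wider intermediate value.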
2013 Mar 25
2
[LLVMdev] About the partial update clearence / dependency breaking mechanism
Hello, I am currently looking into the advantages of using the partial update clearance / dependency breaking mechanism for some ARM cores. It seems that the ARM specific code for this will always return a clearance of 0 for VLD1LNd32 because of the following code in getPartialRegUpdateClearance: > if (UseOp != -1 && MI->getOperand(UseOp).readsReg()) > return 0; so
2013 Mar 25
0
[LLVMdev] About the partial update clearence / dependency breaking mechanism
On Mar 25, 2013, at 5:02 AM, Silviu Baranga <silbar01 at arm.com> wrote: > Hello, > > I am currently looking into the advantages of using the > partial update clearance / dependency breaking mechanism > for some ARM cores. > > It seems that the ARM specific code for this will always > return a clearance of 0...
2016 Apr 16
2
[cfe-dev] [libunwind] __ELF__ macro for arm-none-eabi
On 16 April 2016 at 01:44, Zhao, Weiming via cfe-dev <cfe-dev at lists.llvm.org> wrote: > I'm building libunwind for ARM baremetal using clang. > I notice that __ELF__ is used in libunwind and the macro is only defined for > Linux target on ARM. > Should we also predefine that for arm-none-eabi target? Do you mean in Clang's ARMTargetInfo::getTargetDefines() ? I think
2013 Jun 07
0
[LLVMdev] NEON vector instructions and the fast math IR flags
...some transformation, -neonfp disables forcing that transformation. -neonfp doesn't imply any transformations itself. -----Original Message----- From: David Tweed Sent: 07 June 2013 09:01 To: 'Tobias Grosser'; Renato Golin Cc: LLVMdev at cs.uiuc.edu; Tobias Grosser; James Molloy; Silviu Baranga Subject: RE: [LLVMdev] NEON vector instructions and the fast math IR flags >> Darwin uses NEON for floating point, but does *not* (and should not) >> globally enable fast math flags. Use of NEON for FP needs to remain >> achievable without globally setting the fast math flags....
2015 Apr 29
2
[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations
Hi, This is somewhat similar to the previous thread regarding missed vectorization opportunities (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084765.html), but maybe different enough to require a new thread. I'm seeing some missed vectorization opportunities in the loop vectorizer because SCEV is not able to fold sext/zext expressions into recurrence expressions (AddRecExpr). This
2013 Jun 07
2
[LLVMdev] NEON vector instructions and the fast math IR flags
>> Darwin uses NEON for floating point, but does *not* (and should not) >> globally enable fast math flags. Use of NEON for FP needs to remain >> achievable without globally setting the fast math flags. Fast math may >> reasonably imply NEON, but the opposite direction is not accurate. | Good point. Fast math is probably too tough a requirement. I need to | look
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
...ss of Michael's work is behind yours. Thanks, Hideki -----Original Message----- From: Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com] Sent: Wednesday, June 29, 2016 9:50 PM To: Saito, Hideki <hideki.saito at intel.com>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; silviu.baranga at gmail.com; Zaks, Ayal <ayal.zaks at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org>; asbirlea at google.com; renato.golin at linaro.org; mssimpso at codeaurora.org; kv.bhat at samsung.com; Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>; sanjoy at playingwithpointers...
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern raised for cases where the Loop Vectorizer generates bigger types than the target supports: based on VF, we currently check the cost and generate the expected set of instruction[s] for the bigger type. This has two challenges: for bigger types the cost is not always correct, and code generation may not generate efficient instruction[s]. Probably can depend on the support provided by below RFC by
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can >do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of vectorizer: If vectorizer is the best place to perform the optimization, it should do so. This includes the cases like