thr3ads.net - similar to: "[LLVMdev] MI scheduler produce badly code with inline function"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] MI scheduler produce badly code with inline function"

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 15

[LLVMdev] MI scheduler produce badly code with inline function

On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I meet this problem when compiling the TREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched > > The small function will be scheduled as good code, but if opt inline this function, the inline part will be scheduled as bad code. A bug for this is welcome. Pretty soon, I’ll

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 21

[LLVMdev] MI scheduler produce badly code with inline function

Hi Andy, I'm working on defining new machine model for my target, But I don't understand how to define the in-order machine (reservation tables) in new model. For example, if target has IF ID EX WB stages should I do: let BufferSize=0 in { def IF: ProcResource<1>; def ID: ProcResource<1>; def EX: ProcResource<1>; def WB: ProcResource<1>; } def :

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

[LLVMdev] MI scheduler produce badly code with inline function

Hi Andy, thanks for your help!! The scheduled code by method A is same as B when using the new machine model. it's make sense, but there is the another problem, the scheduled code is badly. load/store instruction always reuse the same register Source: #define N 2000000 static double b[N], c[N]; void Scale () { double scalar = 3.0; for (int j=0;j<N;j++) b[j] =

[PATCH 0/5] ARM NEON optimization for samplerate converter

2011 Sep 01

[PATCH 0/5] ARM NEON optimization for samplerate converter

From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the the merge. The patches have been rebased on top of master branch in

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 20

[LLVMdev] question about alignment of structures on the stack (arm 32)

Dear community, I faced with code which was generated by llvm, assembly instructions of that code is relying on 8-bytes alignment for structures on the stack. The part of Objective C code is following: -(void)getCharacters:(unichar *)unicode { NSRange range; range.location = 0; range.length = [self length]; printf("%p, %p\n", &range.location, &range.length); And

[LLVMdev] Simple NEON optimization

2010 Nov 12

[LLVMdev] Simple NEON optimization

Hi folks, me again, So, I want to implement a simple optimization in a NEON case I've seen these days, most as a matter of exercise, but it also simplifies (just a bit) the code generated. The case is simple: uint32x2_t x, res; res = vceq_u32(x, vcreate_u32(0)); This will generate the following code: ; zero d16 vmov.i32 d16, #0x0 ; load a

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, It does mention the accesses need to be "element" aligned, where I took

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: extend: @ @extend @ BB#0: vldr d16, [r0] vmovl.s16 q8, d16 vstmia r1, {d16, d17} vldr d16, [r0, #8] add r0, r1, #16 vmovl.s16 q8, d16 vstmia

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

[LLVMdev] MI scheduler produce badly code with inline function

On Oct 15, 2013, at 9:28 PM, Zakk <zakk0610 at gmail.com> wrote: > Hi Andy, thanks for your help!! > The scheduled code by method A is same as B when using the new machine model. > it's make sense, but there is the another problem, the scheduled code is badly. > > load/store instruction always reuse the same register I filed PR17593 with this information. However, I

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote: > Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16,

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 this was intentional in this change or not. If so, my question is what is the preferable way to

[LLVMdev] NEON intrinsics preventing redundant load optimization?

2014 Dec 07

[LLVMdev] NEON intrinsics preventing redundant load optimization?

Hi all, I’m not sure if this is the right list, so apologies if not. Doing some profiling I noticed some of my hand-tuned matrix multiply code with NEON intrinsics was much slower through a C++ template wrapper vs calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads

[LLVMdev] LLVM 3.0 release notes ARM Target

2011 Nov 16

[LLVMdev] LLVM 3.0 release notes ARM Target

what do you mean by "more optimal instructions" ? -omer On Wed, Nov 16, 2011 at 1:28 AM, Joe Abbey <jabbey at arxan.com> wrote: > I've done a first pass over the past 6 months of changes and some notable > things stood out: > > * The ARM backend has reworked Set Jump Long Jump EH Lowering. > * The ARM backend includes improved support for Cortex-M > *

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

2011 May 26

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

Hi all, LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com Thanks, Evan Job description The Apple compiler team is seeking an engineer who is strongly

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

[LLVMdev] RE : Vector argument passing abi for ARM ?

Hi Duncan, I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by ABI, I was not sure if I need to submit it as a BUG. I wanted to be sure that it is an actual BUG before submitting it and got the not-a-bug answer. Here is a small example to reproduce the problem I'm experiencing: ; ModuleID = 'bugparam.ll' target datalayout =

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > Hi Pete, > > We ran into the same issue with generating vector loads/stores for vectors > with less than word alignment. It seems we took a similar approach to > solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. > > As you and Jim mentioned, it looks like the

[LLVMdev] LLVM 3.0 release notes ARM Target

2011 Nov 16

[LLVMdev] LLVM 3.0 release notes ARM Target

I've done a first pass over the past 6 months of changes and some notable things stood out: * The ARM backend has reworked Set Jump Long Jump EH Lowering. * The ARM backend includes improved support for Cortex-M * The ARM backend adds parsing and encoding ARM/Thumb/Thumb2 assembly There are also many many code generation improvements which select more optimal instructions. Those seemed

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hi Pete, We ran into the same issue with generating vector loads/stores for vectors with less than word alignment. It seems we took a similar approach to solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. As you and Jim mentioned, it looks like the vld1/vst1 instructions should support element aligned access for any armv7 implementation (I'm looking at Table A3-1

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

[LLVMdev] RE : Vector argument passing abi for ARM ?

Hi Sebastien, > I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by ABI, I was not sure if I need to submit it as a BUG. yes it is a bug. > I wanted to be sure that it is an actual BUG before submitting it and got the not-a-bug answer. I didn't read Nadav's reply as saying there was no bug, in fact he explicitly said in his email

[LLVMdev] Vector argument passing abi for ARM ?

2012 Jul 05

[LLVMdev] Vector argument passing abi for ARM ?

Hi Sebastien, > Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ? the code generators are supposed to produce working code no matter what the parameter type is. The fact that the ARM ABI doesn't specify how <2 x i8> is passed just means that the code generators can pass it using whatever technique it feels like (since it

similar to: [LLVMdev] MI scheduler produce badly code with inline function