search for: d16

Displaying 20 results from an estimated 128 matches for "d16".

2013 Oct 21
1
[LLVMdev] MI scheduler produce badly code with inline function
Hi Andy, I'm working on defining a new machine model for my target, but I don't understand how to define an in-order machine (reservation tables) in the new model. For example, if the target has IF, ID, EX, and WB stages, should I do: let BufferSize=0 in { def IF: ProcResource<1>; def ID: ProcResource<1>; def EX: ProcResource<1>; def WB: ProcResource<1>; } def :
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
..., it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16, [r0] > vmovl.s16 q8, d16 > vstmia r1, {d16, d17} > vldr d16, [r0, #8] > add r0, r1, #16 > vmovl.s16 q8, d16 > vstmia r0, {d16, d17} > bx lr > > Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for t...
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: extend: @ @extend @ BB#0: vldr d16, [r0] vmovl.s16 q8, d16 vstmia r1, {d16, d17} vldr d16, [r0, #8] add r0, r1, #16 vmovl.s16 q8, d16 vstmia r0, {d16, d17} bx lr Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 0406...
2011 Nov 16
0
[LLVMdev] LLVM 3.0 release notes ARM Target
what do you mean by "more optimal instructions" ? -omer On Wed, Nov 16, 2011 at 1:28 AM, Joe Abbey <jabbey at arxan.com> wrote: > I've done a first pass over the past 6 months of changes and some notable > things stood out: > > * The ARM backend has reworked Set Jump Long Jump EH Lowering. > * The ARM backend includes improved support for Cortex-M > *
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, it does mention the accesses need to be "element" aligned, where I took
2014 Dec 07
3
[LLVMdev] NEON intrinsics preventing redundant load optimization?
...t i = 0; i < 4; ++i) result.data[i] = a.data[i] * b.data[i]; return result; } void TestVec4Multiply(vec4& a, vec4& b, vec4& result) { result = a * b; } With -O3 the loop gets vectorized and the code generated looks optimal: __Z16TestVec4MultiplyR4vec4S0_S0_: @ BB#0: vld1.32 {d16, d17}, [r1] vld1.32 {d18, d19}, [r0] vmul.f32 q8, q9, q8 vst1.32 {d16, d17}, [r2] bx lr However if I replace the operator* with a NEON intrinsic implementation (I know the vectorizer figured out optimal code in this case anyway, but that wasn't true for my real situation) then the temporar...
2011 Nov 16
4
[LLVMdev] LLVM 3.0 release notes ARM Target
I've done a first pass over the past 6 months of changes and some notable things stood out: * The ARM backend has reworked Set Jump Long Jump EH Lowering. * The ARM backend includes improved support for Cortex-M * The ARM backend adds parsing and encoding ARM/Thumb/Thumb2 assembly There are also many many code generation improvements which select more optimal instructions. Those seemed
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
...@ @bar @ BB#0: @ %L.entry push {r11, lr} add r0, r1, #2 vldr s0, [r1] vldr s2, [r0] # <= here load is misaligned vmovl.u8 q8, d0 vmovl.u8 q9, d1 vmovl.u16 q8, d16 vmovl.u16 q9, d18 vmov r0, r1, d16 vmov r2, r3, d18 bl zzz(PLT) pop {r11, pc} with LLVM trunk, assembly looks like: bar: @ @bar @ BB#0: @ %L.entry push {r11, lr} add r0, r1, #2 vld1.32 {...
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
...en % 4 == 0 */ +static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, unsigned int len) +{ + int32_t ret; + uint32_t remainder = len % 16; + len = len - remainder; + + asm volatile (" cmp %[len], #0\n" + " bne 1f\n" + " vld1.16 {d16}, [%[b]]!\n" + " vld1.16 {d20}, [%[a]]!\n" + " subs %[remainder], %[remainder], #4\n" + " vmull.s16 q0, d16, d20\n" + " beq 5f\n" + " b 4f\n" + "1:" + " vld1.16 {d16, d17, d18, d19}, [%[b]]!\n"...
2012 Aug 02
1
[LLVMdev] Question about arm thumb2 code generation
Thanks Andrew for the answer. I would like to generate code for Cortex-A9 that doesn't use NEON for fp computation but vfpv3-d16. I've tried some combinations of -mattr=+neon,-neonfp,+vfp3,+d16 but couldn't get the ".fpu vfpv3-d16" directive generated in the assembly file. Do you know how to make it happen? Best Regards Seb From: Andrew Trick [mailto:atrick at apple.com] Sent: Saturday, July 28, 2012 2:46 AM To:...
2018 Apr 24
0
TukeyHSD and glht differ for models with a covariate
...#create the model mod1 <- lm(VitC~HeadWt+Cult+Date, data=cabbages) # Using TukeyHSD TukeyHSD(aov(mod1), which='Date') # Tukey multiple comparisons of means # 95% family-wise confidence level # #Fit: aov(formula = mod1) # #$Date # diff lwr upr p adj #d20-d16 -0.9216847 -5.5216345 3.678265 0.8797985 #d21-d16 3.4237706 -1.1761792 8.023720 0.1814431 #d21-d20 4.3454553 -0.2544945 8.945405 0.0678038 # Tukey contrasts in glht should generate the same difference in means, but it does not summary(glht(mod1, linfct=mcp(Date='Tukey'))) # # Simul...
2012 Jul 05
0
[LLVMdev] RE : Vector argument passing abi for ARM ?
...> @ BB#0: @ %L.entry > push {r11, lr} > add r0, r1, #2 > vldr s0, [r1] > vldr s2, [r0] # <= here load is misaligned > vmovl.u8 q8, d0 > vmovl.u8 q9, d1 > vmovl.u16 q8, d16 > vmovl.u16 q9, d18 > vmov r0, r1, d16 > vmov r2, r3, d18 > bl zzz(PLT) > pop {r11, pc} > > with LLVM trunk, assembly looks like: > > bar: @ @bar > @ BB#0: @ %L.entry ...
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...l, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16, [r0] > vmovl.s16 q8, d16 > vstmia r1, {d16, d17} > vldr d16, [r0, #8] > add r0, r1, #16 > vmovl.s16 q8, d16 > vstmia r0, {d16, d17} > bx lr > > Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for th...
2013 Oct 15
0
[LLVMdev] MI scheduler produce badly code with inline function
On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I hit this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched > > The small function will be scheduled as good code, but if opt inlines this function, the inlined part will be scheduled as bad code. A bug for this is welcome. Pretty soon, I’ll
2013 Oct 14
2
[LLVMdev] MI scheduler produce badly code with inline function
Hi all, I hit this problem when compiling the STREAM benchmark ( http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched The small function will be scheduled as good code, but if opt inlines this function, the inlined part will be scheduled as bad code. So I wrote a simple test case, available at the attached link (foo.c), and compiled it with two different methods: *method A:* *$clang -O3 foo.c -static -S
2010 Jul 05
2
nested for loops
...roblems. Thanks for your time and consideration. for(d1 in 0:n){ for(d2 in 0:n){ for(d3 in 0:n){ for(d4 in 0:n){ for(d5 in 0:n){ for(d6 in 0:n){ for(d7 in 0:n){ for(d8 in 0:n){ for(d9 in 0:n){ for(d10 in 0:n){ for(d11 in 0:n){ for(d12 in 0:n){ for(d13 in 0:n){ for(d14 in 0:n){ for(d15 in 0:n){ for(d16 in 0:n){ for(d17 in 0:n){ for(d18 in 0:n){ for(d19 in 0:n){ for(d20 in 0:n){ list=c(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20) }}}}}}}}}}}}}}}}}}}} [[alternative HTML version deleted]]
2006 Feb 09
6
gcc4 compiler warnings
Hi all! The following files emit warnings when compiled with gcc 4.0: al175.c bcmxcp_ser.c belkinunv.c cyberpower.c everups.c powercom.c solis.c All warnings seem to be of this variety: everups.c:38: warning: pointer targets in passing argument 2 of 'ser_get_char' differ in signedness I suggest that those who work on those drivers fix the warnings and verify that it works
2013 Jun 19
3
[LLVMdev] Vector type LOAD/STORE with post-increment.
...or an out of tree backend. I see that ARM NEON supports such load/store, so I am using ARM NEON as an example of what to do. The problem is I can't get any C or C++ code example to actually generate vector load/store with post increment. I am talking about something like this: vldr d16, [sp, #8] Does anybody know any C/C++ code example that will generate such code (especially a loop)? Is this supported by the auto-vectorizer? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/46...
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...'s entirely possible that it's LLVM that's confused > about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16, [r0] > vmovl.s16 q8, d16 > vstmia r1, {d16, d17} > vldr d16, [r0, #8] > add r0, r1, #16 > vmovl.s16 q8, d16 > vstmia r0, {d16, d17} > bx lr > > Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for th...
2010 Nov 12
2
[LLVMdev] Simple NEON optimization
...nt a simple optimization in a NEON case I've seen these days, mostly as a matter of exercise, but it also simplifies (just a bit) the code generated. The case is simple: uint32x2_t x, res; res = vceq_u32(x, vcreate_u32(0)); This will generate the following code: ; zero d16 vmov.i32 d16, #0x0 ; load a into d17 movw r0, :lower16:a movt r0, :upper16:a vld1.32 {d17}, [r0] ; compare two registers vceq.i32 d17, d17, d16 But, because the vector is zero, and there is a NEON instruction to compare ag...