Displaying 20 results from an estimated 128 matches for "d16".
2013 Oct 21
1
[LLVMdev] MI scheduler produce badly code with inline function
Hi Andy, I'm working on defining a new machine model for my target,
but I don't understand how to define an in-order machine (reservation
tables) in the new model.
For example, if the target has IF, ID, EX, and WB stages,
should I do:
let BufferSize=0 in {
def IF: ProcResource<1>; def ID: ProcResource<1>;
def EX: ProcResource<1>; def WB: ProcResource<1>;
}
def :
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
..., it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for t...
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia r0, {d16, d17}
bx lr
Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 0406...
2011 Nov 16
0
[LLVMdev] LLVM 3.0 release notes ARM Target
What do you mean by "more optimal instructions"?
-omer
On Wed, Nov 16, 2011 at 1:28 AM, Joe Abbey <jabbey at arxan.com> wrote:
> I've done a first pass over the past 6 months of changes and some notable
> things stood out:
>
> * The ARM backend has reworked Set Jump Long Jump EH Lowering.
> * The ARM backend includes improved support for Cortex-M
> *
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim,
Thank you for the response. I may be confused about the alignment rules
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate vld1.16 operates on 16-bit aligned data, unless I am
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, it does mention the accesses need to be "element"
aligned, where I took
2014 Dec 07
3
[LLVMdev] NEON intrinsics preventing redundant load optimization?
...t i = 0; i < 4; ++i)
result.data[i] = a.data[i] * b.data[i];
return result;
}
void TestVec4Multiply(vec4& a, vec4& b, vec4& result)
{
result = a * b;
}
With -O3 the loop gets vectorized and the code generated looks optimal:
__Z16TestVec4MultiplyR4vec4S0_S0_:
@ BB#0:
vld1.32 {d16, d17}, [r1]
vld1.32 {d18, d19}, [r0]
vmul.f32 q8, q9, q8
vst1.32 {d16, d17}, [r2]
bx lr
However if I replace the operator* with a NEON intrinsic implementation (I know the vectorizer figured out optimal code in this case anyway, but that wasn't true for my real situation) then the temporar...
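For reference, the scalar loop quoted in this excerpt can be sketched in plain C as follows. This is a hedged, portable rendition only: the `vec4` layout (four floats in a `data` array) is inferred from the excerpt, `vec4_mul` is a hypothetical name standing in for the C++ `operator*`, and the sketch does not reproduce the NEON-intrinsic variant the post goes on to discuss.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: four packed floats, as implied by a.data[i] in the excerpt. */
typedef struct {
    float data[4];
} vec4;

/* Hypothetical C stand-in for the C++ operator*: element-wise product. */
static vec4 vec4_mul(const vec4 *a, const vec4 *b)
{
    vec4 result;
    assert(a != NULL && b != NULL);
    for (int i = 0; i < 4; ++i)
        result.data[i] = a->data[i] * b->data[i];
    return result;
}

/* Mirrors TestVec4Multiply from the excerpt, with pointers instead of references. */
static void test_vec4_multiply(const vec4 *a, const vec4 *b, vec4 *result)
{
    *result = vec4_mul(a, b);
}
```

A loop of this shape, with a fixed trip count of four over contiguous floats, is the kind of code the post reports being vectorized at -O3.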
2011 Nov 16
4
[LLVMdev] LLVM 3.0 release notes ARM Target
I've done a first pass over the past 6 months of changes and some notable things stood out:
* The ARM backend has reworked Set Jump Long Jump EH Lowering.
* The ARM backend includes improved support for Cortex-M
* The ARM backend adds parsing and encoding of ARM/Thumb/Thumb2 assembly
There are also many many code generation improvements which select more optimal instructions.
Those seemed
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
...@ @bar
@ BB#0: @ %L.entry
push {r11, lr}
add r0, r1, #2
vldr s0, [r1]
vldr s2, [r0] # <= here load is misaligned
vmovl.u8 q8, d0
vmovl.u8 q9, d1
vmovl.u16 q8, d16
vmovl.u16 q9, d18
vmov r0, r1, d16
vmov r2, r3, d18
bl zzz(PLT)
pop {r11, pc}
with LLVM trunk, assembly looks like:
bar: @ @bar
@ BB#0: @ %L.entry
push {r11, lr}
add r0, r1, #2
vld1.32 {...
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
...en % 4 == 0 */
+static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, unsigned int len)
+{
+ int32_t ret;
+ uint32_t remainder = len % 16;
+ len = len - remainder;
+
+ asm volatile (" cmp %[len], #0\n"
+ " bne 1f\n"
+ " vld1.16 {d16}, [%[b]]!\n"
+ " vld1.16 {d20}, [%[a]]!\n"
+ " subs %[remainder], %[remainder], #4\n"
+ " vmull.s16 q0, d16, d20\n"
+ " beq 5f\n"
+ " b 4f\n"
+ "1:"
+ " vld1.16 {d16, d17, d18, d19}, [%[b]]!\n"...
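As a point of comparison for the NEON assembly in this patch excerpt, a plain C inner product over 16-bit samples might look like the sketch below. It is a reference under stated assumptions, not the patch's code: `inner_product_scalar` is a hypothetical name, and any rounding, shifting, or saturation the real function applies is not visible in the excerpt and is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum of a[i] * b[i] over len samples.
   Products are accumulated in 64 bits; the caller is assumed to pass
   inputs whose total fits in int32_t before the final narrowing. */
static int32_t inner_product_scalar(const int16_t *a, const int16_t *b,
                                    unsigned int len)
{
    int64_t acc = 0;
    assert(a != NULL && b != NULL);
    for (unsigned int i = 0; i < len; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return (int32_t)acc;
}
```

A scalar version like this is mainly useful as an oracle when checking a hand-written SIMD routine on the same inputs.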
2012 Aug 02
1
[LLVMdev] Question about arm thumb2 code generation
Thanks Andrew for the answer.
I would like to generate code for Cortex-A9 that doesn't use NEON for FP computation but uses VFPv3-D16 instead. I've tried some combinations of -mattr=+neon,-neonfp,+vfp3,+d16 but couldn't get the ".fpu vfpv3-d16" directive generated in the assembly file. Do you know how to make it happen?
Best Regards
Seb
From: Andrew Trick [mailto:atrick at apple.com]
Sent: Saturday, July 28, 2012 2:46 AM
To:...
2018 Apr 24
0
TukeyHSD and glht differ for models with a covariate
...#create the model
mod1 <- lm(VitC~HeadWt+Cult+Date, data=cabbages)
# Using TukeyHSD
TukeyHSD(aov(mod1), which='Date')
#  Tukey multiple comparisons of means
#    95% family-wise confidence level
#
#Fit: aov(formula = mod1)
#
#$Date
#              diff        lwr      upr     p adj
#d20-d16 -0.9216847 -5.5216345 3.678265 0.8797985
#d21-d16  3.4237706 -1.1761792 8.023720 0.1814431
#d21-d20  4.3454553 -0.2544945 8.945405 0.0678038
# Tukey contrasts in glht should generate the same difference in means,
# but it does not
summary(glht(mod1, linfct=mcp(Date='Tukey')))
#
#     Simul...
2012 Jul 05
0
[LLVMdev] RE : Vector argument passing abi for ARM ?
...> @ BB#0: @ %L.entry
> push {r11, lr}
> add r0, r1, #2
> vldr s0, [r1]
> vldr s2, [r0] # <= here load is misaligned
> vmovl.u8 q8, d0
> vmovl.u8 q9, d1
> vmovl.u16 q8, d16
> vmovl.u16 q9, d18
> vmov r0, r1, d16
> vmov r2, r3, d18
> bl zzz(PLT)
> pop {r11, pc}
>
> with LLVM trunk, assembly looks like:
>
> bar: @ @bar
> @ BB#0: @ %L.entry
> ...
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...l, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for th...
2013 Oct 15
0
[LLVMdev] MI scheduler produce badly code with inline function
On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote:
> Hi all,
> I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched.
>
> The small function is scheduled well, but if opt inlines this function, the inlined part is scheduled badly.
A bug for this is welcome. Pretty soon, I’ll
2013 Oct 14
2
[LLVMdev] MI scheduler produce badly code with inline function
Hi all,
I ran into this problem when compiling the STREAM benchmark (
http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched.
The small function is scheduled well, but if opt inlines this
function, the inlined part is scheduled badly.
So I wrote a simple test case (foo.c, at the attached link) and compiled it with two
different methods:
*method A:*
*$clang -O3 foo.c -static -S
2010 Jul 05
2
nested for loops
...roblems.
Thanks for your time and consideration.
for(d1 in 0:n){
for(d2 in 0:n){
for(d3 in 0:n){
for(d4 in 0:n){
for(d5 in 0:n){
for(d6 in 0:n){
for(d7 in 0:n){
for(d8 in 0:n){
for(d9 in 0:n){
for(d10 in 0:n){
for(d11 in 0:n){
for(d12 in 0:n){
for(d13 in 0:n){
for(d14 in 0:n){
for(d15 in 0:n){
for(d16 in 0:n){
for(d17 in 0:n){
for(d18 in 0:n){
for(d19 in 0:n){
for(d20 in 0:n){
list=c(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20)
}}}}}}}}}}}}}}}}}}}}
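The twenty nested loops above visit every tuple (d1, ..., d20) with each entry in 0..n. A language-neutral way to express the same traversal without nesting is a mixed-radix counter; the C sketch below is an illustration under assumptions (the names `next_tuple` and `count_tuples` are hypothetical, and the truncated post does not say what should be done with each tuple). Note that the number of tuples is (n+1)^k, which for k = 20 is already enormous for small n.

```c
#include <assert.h>
#include <stddef.h>

/* Advance digits[0..k-1], each in 0..n, to the next tuple in the order
   nested loops would visit them (last index varies fastest).
   Returns 1 if a new tuple was produced, 0 once all tuples are exhausted. */
static int next_tuple(int *digits, size_t k, int n)
{
    assert(digits != NULL && n >= 0);
    for (size_t i = k; i-- > 0; ) {
        if (digits[i] < n) {
            digits[i]++;
            return 1;
        }
        digits[i] = 0; /* carry into the next slower-varying position */
    }
    return 0;
}

/* Count tuples by walking them all; only sensible for small k and n. */
static unsigned long count_tuples(size_t k, int n)
{
    int digits[32] = {0};
    unsigned long count = 1; /* the all-zero starting tuple */
    assert(k <= 32);
    while (next_tuple(digits, k, n))
        count++;
    return count;
}
```

The body that would sit inside the innermost loop goes inside the `while` loop instead, reading the current tuple from `digits`.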
2006 Feb 09
6
gcc4 compiler warnings
Hi all!
The following files emit warnings when compiled with gcc 4.0:
al175.c
bcmxcp_ser.c
belkinunv.c
cyberpower.c
everups.c
powercom.c
solis.c
All warnings seem to be of this variety:
everups.c:38: warning: pointer targets in passing argument 2 of 'ser_get_char' differ in signedness
I suggest that those who work on those drivers fix the warnings
and verify that the drivers still work
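A minimal C sketch of this class of warning follows. The real `ser_get_char` signature is not shown in the post, so `get_char_stub` below is a hypothetical stand-in that takes an `unsigned char *`; the point is only that GCC's `-Wpointer-sign` fires when a `char *` is passed where an `unsigned char *` is expected (or vice versa), and that matching the pointee type at the call site is one way to silence it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a serial read helper; not the real
   ser_get_char, whose signature is not given in the post. */
static int get_char_stub(int fd, unsigned char *ch)
{
    (void)fd;
    assert(ch != NULL);
    *ch = 0xA5;   /* pretend one byte arrived */
    return 1;     /* number of bytes read */
}

/* Declaring the buffer with the pointee type the callee expects avoids
   the warning; passing the address of a plain `char` here instead would
   produce "pointer targets ... differ in signedness". */
static int read_status_byte(void)
{
    unsigned char ch = 0;
    if (get_char_stub(0, &ch) != 1)
        return -1;
    return ch;
}
```

Changing the variable's type is usually preferable to casting at the call site, since a cast hides any later mismatch as well.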
2013 Jun 19
3
[LLVMdev] Vector type LOAD/STORE with post-increment.
...or an
out of tree backend.
I see that ARM NEON supports such loads/stores, so I am using ARM NEON as
an example of what to do.
The problem is I can't get any C or C++ code example to actually generate
vector load/store with post increment.
I am talking about something like this:
vldr d16, [sp, #8]
Does anybody know any C/C++ code example that will generate such code
(especially loop)? Is this supported by the auto-vectorizer?
Thanks.
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...'s entirely possible that it's LLVM that's confused
> about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher
alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d
register, not a vld1 instruction. Similarly for th...
2010 Nov 12
2
[LLVMdev] Simple NEON optimization
...nt a simple optimization in a NEON case I've seen
these days, mostly as an exercise, but it also simplifies (just
a bit) the code generated.
The case is simple:
uint32x2_t x, res;
res = vceq_u32(x, vcreate_u32(0));
This will generate the following code:
; zero d16
vmov.i32 d16, #0x0
; load a into d17
movw r0, :lower16:a
movt r0, :upper16:a
vld1.32 {d17}, [r0]
; compare two registers
vceq.i32 d17, d17, d16
But, because the vector is zero, and there is a NEON instruction to
compare ag...