Displaying 20 results from an estimated 44 matches for "in0".
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...corr_QC[ order ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[
0 ] ), 2 * QS - QC );
}
in which corr_QC[0, 1, ..., order] is the only output.
Suppose order = 10, and each stage of the inner loop is denoted by s0, s1,
..., s9. And suppose we simultaneously process 8 inputs in SIMD, from in0 to
in7. Let PROC(inx(sy)) denote processing input[x] at stage y.
If there is no dependency between inx(sy) and in(x+1)(sy), then we can do
this
FOR in=0 TO N WITH in+=8
FOR y=0 TO order-1 WITH y++
PROC(in0(sy) in1(sy) in2(sy) in3(sy) in4(sy) in5(sy) in6(sy) in7(sy))
END FOR
END FOR
Defini...
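The batched loop nest above can be sketched in plain C as follows. This is only an illustration of the loop order, not the code from the patch: `toy_stage`, `run_scalar`, `run_batched` and `LANES` are hypothetical names, and the stage function is deliberately chosen so that inputs do not depend on each other, which is exactly the condition the post states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* inputs handled per outer iteration: in0..in7 */

/* Hypothetical stage: depends only on its own input, so lanes never interact.
 * Unsigned arithmetic is used so wrap-around is well defined. */
static uint32_t toy_stage(uint32_t x, int y) {
    return x * 3u + (uint32_t)y;
}

/* Reference order: each input runs through all stages on its own. */
void run_scalar(uint32_t *data, size_t n, int order) {
    for (size_t i = 0; i < n; i++)
        for (int y = 0; y < order; y++)
            data[i] = toy_stage(data[i], y);
}

/* Batched order: FOR in=0 TO N WITH in+=8, then FOR y=0 TO order-1. */
void run_batched(uint32_t *data, size_t n, int order) {
    assert(n % LANES == 0); /* sketch only: no tail handling */
    for (size_t base = 0; base < n; base += LANES)
        for (int y = 0; y < order; y++)
            /* In real SIMD code this inner loop would be one vector operation. */
            for (int lane = 0; lane < LANES; lane++)
                data[base + lane] = toy_stage(data[base + lane], y);
}
```

Because the hypothetical stage has no cross-input dependency, both orders give identical results; with a real dependency between inx(sy) and in(x+1)(sy) that equivalence would not hold.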
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...N samples at a time, then indeed the approach you
> are describing is the only solution. What I was proposing though is to
> instead chop the "order" in chunks of N. Using your notation, you would
> be doing:
>
> PROC( in0(s0))
> PROC( in0(s1) in1(s0))
> PROC( in0(s2) in1(s1) in2(s0))
> PROC( in0(s3) in1(s2) in2(s1) in3(s0))
> PROC( in0(s4) in1(s3) in2(s2) in3(s1) in4...
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > are describing is the only solution. What I was proposing though is
> to
> > instead chop the "order" in chunks of N. Using your notation, you
> would
> > be doing:
> >
> > PROC( in0(s0))
> > PROC( in0(s1) in1(s0))
> > PROC( in0(s2) in1(s1) in2(s0))
> > PROC( in0(s3) in1(s2) in2(s1) in3(s0))
> > PROC(...
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ing
chunks of the inputs N samples at a time, then indeed the approach you
are describing is the only solution. What I was proposing though is to
instead chop the "order" in chunks of N. Using your notation, you would
be doing:
PROC( in0(s0))
PROC( in0(s1) in1(s0))
PROC( in0(s2) in1(s1) in2(s0))
PROC( in0(s3) in1(s2) in2(s1) in3(s0))
PROC( in0(s4) in1(s3) in2(s2) in3(s1) in4(s0))
PROC(...
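The staggered schedule quoted above can be enumerated with a short C sketch. It only generates the ordering (which input is at which stage in each step), under the assumption that step t pairs input x with stage t - x; it is not the NEON kernel itself, and `wavefront_step` and `print_schedule` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Fills in_idx[]/stage_idx[] with the (input, stage) pairs active at step t,
 * i.e. every pair with input + stage == t. Returns the number of pairs.
 * The caller's arrays must hold at least min(n_inputs, order) entries. */
int wavefront_step(int t, int n_inputs, int order, int *in_idx, int *stage_idx) {
    assert(t >= 0 && n_inputs > 0 && order > 0);
    int count = 0;
    for (int x = 0; x < n_inputs; x++) {
        int y = t - x; /* stage reached by input x at step t */
        if (y >= 0 && y < order) {
            in_idx[count] = x;
            stage_idx[count] = y;
            count++;
        }
    }
    return count;
}

/* Prints one PROC(...) line per step, in the notation used in the thread. */
void print_schedule(int n_inputs, int order) {
    int in_idx[64], stage_idx[64];
    assert(n_inputs <= 64);
    for (int t = 0; t < n_inputs + order - 1; t++) {
        int count = wavefront_step(t, n_inputs, order, in_idx, stage_idx);
        printf("PROC(");
        for (int k = 0; k < count; k++)
            printf(" in%d(s%d)", in_idx[k], stage_idx[k]);
        printf(")\n");
    }
}
```

Calling `print_schedule(8, 10)` walks through all 17 steps for 8 inputs and order 10; each printed line lists the work that a single vector operation would cover in this scheme.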
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> are describing is the only solution. What I was proposing though
>>> is to
>>> > instead chop the "order" in chunks of N. Using your notation, you
>>> would
>>> > be doing:
>>> >
>>> > PROC(
>>> in0(s0))
>>> > PROC( in0(s1)
>>> in1(s0))
>>> > PROC( in0(s2) in1(s1)
>>> in2(s0))
>>> > PROC( in0(s3) in1(s2) in2(s1)
>...
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ime, then indeed the approach you
> are describing is the only solution. What I was proposing though is to
> instead chop the "order" in chunks of N. Using your notation, you would
> be doing:
>
> PROC( in0(s0))
> PROC( in0(s1) in1(s0))
> PROC( in0(s2) in1(s1) in2(s0))
> PROC( in0(s3) in1(s2) in2(s1) in3(s0))
> PROC( in0(s4) in1(s3) in2...
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...oach
>> you
>> > are describing is the only solution. What I was proposing though is
>> to
>> > instead chop the "order" in chunks of N. Using your notation, you
>> would
>> > be doing:
>> >
>> > PROC(
>> in0(s0))
>> > PROC( in0(s1)
>> in1(s0))
>> > PROC( in0(s2) in1(s1)
>> in2(s0))
>> > PROC( in0(s3) in1(s2) in2(s1)
>> in3(s0))
>...
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...proposing though is to
> > > instead chop the "order" in chunks of N. Using your
> > notation, you would
> > > be doing:
> > >
> > > PROC(
> > in0(s0))
> > > PROC(
> > in0(s1) in1(s0))
> > > PROC( in0(s2)
> > in1(s1) in2(s0))
> > > PROC( in0(s3) in1(s2)
> >...
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...though is to
> > instead chop the "order" in chunks of N. Using your
> notation, you would
> > be doing:
> >
> > PROC(
> in0(s0))
> > PROC(
> in0(s1) in1(s0))
> > PROC( in0(s2)
> in1(s1) in2(s0))
> > PROC( in0(s...
2016 Sep 12
2
builtins name mangling in SPIR 2.0
Hi all,
According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled.
However, when I compile OpenCL code with Clang 3.9 with the
"spir64-unknown-unknown" target, Clang generates IR without mangling the
builtins, e.g. for:
__kernel void input_zip_int(__global int *in0) {
*in0 = get_global_id(0);
}
clang generates:
define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0)
local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4
!kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 {
entry:
%call = tail call s...
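For illustration, here is a minimal C sketch of what a mangled builtin name would look like, assuming the SPIR 2.0 mangling follows the Itanium C++ ABI scheme for free functions (`_Z`, then the name length, the name, and the parameter type codes). The helper `mangle_builtin` is hypothetical and is not part of Clang or any OpenCL API; it handles only simple builtin parameter codes such as `j` for `unsigned int`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "_Z<len><name><param codes>" into out.
 * param_codes uses Itanium builtin type codes, e.g. "j" = unsigned int.
 * Returns 0 on success, -1 if the buffer is too small. */
int mangle_builtin(const char *name, const char *param_codes,
                   char *out, size_t out_size) {
    assert(name && param_codes && out);
    int n = snprintf(out, out_size, "_Z%zu%s%s",
                     strlen(name), name, param_codes);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Under that assumption, `mangle_builtin("get_global_id", "j", buf, sizeof buf)` produces `_Z13get_global_idj`, which is the kind of name the poster expected to see in the IR instead of the unmangled one.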
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...h is to
> > > instead chop the "order" in chunks of N. Using your
> > notation, you would
> > > be doing:
> > >
> > > PROC(
> > in0(s0))
> > > PROC(
> > in0(s1) in1(s0))
> > > PROC( in0(s2)
> > in1(s1) in2(s0))
> > > PROC( in0(s3)...
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi,
Attached is a patch with arm neon optimizations for
silk_warped_autocorrelation_FIX(). Please review.
Thanks,
Felicia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name:
2016 Sep 12
2
builtins name mangling in SPIR 2.0
...1], the names of OpenCL builtins are mangled.
>
>
>
> However, when I compile OpenCL code with Clang 3.9 with the
> "spir64-unknown-unknown" target, Clang generates IR without mangling the
> builtins, e.g. for:
>
>
>
> __kernel void input_zip_int(__global int *in0) {
>
> *in0 = get_global_id(0);
>
> }
>
>
>
> clang generates:
>
>
>
> define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0)
> local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4
> !kernel_arg_type !5 !kernel_arg_...
2016 Sep 16
2
builtins name mangling in SPIR 2.0
...IR 2.0
Hi all,
According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled.
However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for:
__kernel void input_zip_int(__global int *in0) {
*in0 = get_global_id(0);
}
clang generates:
define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 {
entry:
%call = tail call s...
2016 Sep 18
2
builtins name mangling in SPIR 2.0
...IR 2.0
Hi all,
According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled.
However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for:
__kernel void input_zip_int(__global int *in0) {
*in0 = get_global_id(0);
}
clang generates:
define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 {
entry:
%call = tail call s...
2010 Sep 20
1
ERROR: Object not found
...}
else {
switch <- 0
}
dP1 <- a+b*P1-switch*P1
dP2 <- a-b*P1+switch*P2
list(c(dP1,dP2,dIN))
})
}
# Parameters
a <- 0.1
b <- 0.2
c <- 0.5
parms <- c(a=a,b=b,c=c)
# Initial conditions
P10 <- 100.0
P20 <- 0.0
IN0 <- 0.0
xstart <- c(P1=P10,P2=P20,IN=IN0)
# Time points
times <- seq(0,10,by=1)
out <- as.data.frame(rk4(xstart,times,ode,parms))
2010 Sep 20
1
Ask for help with Error: Object not found
...}
else {
switch <- 0
}
dP1 <- a+b*P1-switch*P1
dP2 <- a-b*P1+switch*P2
list(c(dP1,dP2,dIN))
})
}
# Parameters
a <- 0.1
b <- 0.2
c <- 0.5
parms <- c(a=a,b=b,c=c)
# Initial conditions
P10 <- 100.0
P20 <- 0.0
IN0 <- 0.0
xstart <- c(P1=P10,P2=P20,IN=IN0)
# Time points
times <- seq(0,10,by=1)
out <- as.data.frame(rk4(xstart,times,ode,parms))
2008 Feb 26
8
[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code
Hi. I rewrote the patch according to the comments. I adopted generating
in-place code because it looks like the quickest way.
The point Eddie wanted to discuss is how to generate code and its ABI,
i.e. in-place generating vs. direct jump vs. indirect function call.
Indirect function call doesn't make sense because ivt.S is compiled
multiple times. And it is up to pv instances to choose in-place
2010 Oct 13
1
[LLVMdev] EXC_BAD_ACCESS: invalid MemoryBuffer from ContentCache::getBuffer
I'm using the latest llvm/clang 2.8 releases and am getting
EXC_BAD_ACCESS crashes in ContentCache::getBuffer. This happens when
I'm printing out errors from a compilation run and iterating over
TextDiagnosticBuffer returned errors.
When checking the errors, I construct a FullSourceLoc and do:
int LineNum = SourceLoc.getInstantiationLineNumber();
int ColNum =