search for: in0

Displaying 20 results from an estimated 44 matches for "in0".

2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...corr_QC[ order ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC ); } in which corr_QC[0, 1, ..., order] is the only output. Suppose order = 10, and each stage of the inner loop is denoted by s0, s1, ..., s9. And suppose we simultaneously process 8 inputs in SIMD, from in0 to in7. Let PROC(inx(sy)) denote processing input[x] at stage y. If there is no dependency between inx(sy) and in(x+1)(sy), then we can do this: FOR in=0 TO N WITH in+=8 FOR y=0 TO order-1 WITH y++ PROC(in0(sy) in1(sy) in2(sy) in3(sy) in4(sy) in5(sy) in6(sy) in7(sy)) END FOR END FOR Defini...
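The loop nest in the snippet above can be sketched as a schedule generator. This is only an illustration of the ordering, valid under the snippet's own "no dependency" premise; the function name `blocked_schedule` and the `lanes` parameter are hypothetical and do not come from the patch.

```python
def blocked_schedule(n_inputs, order, lanes=8):
    """List the PROC steps of the blocked scheme.

    Each step is a list of (input_index, stage) pairs handled together,
    i.e. one SIMD operation over up to `lanes` consecutive inputs that
    all sit at the same stage. Assumes inputs at the same stage are
    independent of each other, as the snippet supposes.
    """
    steps = []
    for base in range(0, n_inputs, lanes):       # FOR in=0 TO N WITH in+=8
        for stage in range(order):               # FOR y=0 TO order-1
            block = range(base, min(base + lanes, n_inputs))
            steps.append([(x, stage) for x in block])
    return steps


if __name__ == "__main__":
    for step in blocked_schedule(n_inputs=8, order=3):
        print("PROC(" + " ".join(f"in{x}(s{y})" for x, y in step) + ")")
```

Every step touches a single stage, so each vector lane holds a different input sample.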
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...N samples at a time, then indeed the approach you > are describing is the only solution. What I was proposing though is to > instead chop the "order" in chunks of N. Using your notation, you would > be doing: > > PROC( in0(s0)) > PROC( in0(s1) in1(s0)) > PROC( in0(s2) in1(s1) in2(s0)) > PROC( in0(s3) in1(s2) in2(s1) in3(s0)) > PROC( in0(s4) in1(s3) in2(s2) in3(s1) in4...
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > are describing is the only solution. What I was proposing though is > to > > instead chop the "order" in chunks of N. Using your notation, you > would > > be doing: > > > > PROC( in0(s0)) > > PROC( in0(s1) in1(s0)) > > PROC( in0(s2) in1(s1) in2(s0)) > > PROC( in0(s3) in1(s2) in2(s1) in3(s0)) > > PROC(...
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ing chunks of the inputs N samples at a time, then indeed the approach you are describing is the only solution. What I was proposing though is to instead chop the "order" in chunks of N. Using your notation, you would be doing: PROC( in0(s0)) PROC( in0(s1) in1(s0)) PROC( in0(s2) in1(s1) in2(s0)) PROC( in0(s3) in1(s2) in2(s1) in3(s0)) PROC( in0(s4) in1(s3) in2(s2) in3(s1) in4(s0)) PROC(...
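The staggered pattern quoted above can be sketched as follows, assuming it continues the way its first five rows suggest: at step t, input x is at stage t - x. The function name `skewed_schedule` is hypothetical, and how the real code maps these steps onto NEON registers is not shown in the snippet.

```python
def skewed_schedule(n_inputs, order):
    """List the PROC steps of the staggered (wavefront) scheme.

    At step t, input x is processed at stage t - x, so input x only
    reaches stage y after input x-1 has already passed through it.
    Each step is a list of (input_index, stage) pairs.
    """
    steps = []
    for t in range(n_inputs + order - 1):
        step = [(x, t - x) for x in range(n_inputs) if 0 <= t - x < order]
        steps.append(step)
    return steps


if __name__ == "__main__":
    for step in skewed_schedule(n_inputs=5, order=10)[:5]:
        print("PROC(" + " ".join(f"in{x}(s{y})" for x, y in step) + ")")
```

Here the lanes of one step hold different stages of neighbouring inputs, which is why the width of a step is bounded by the order as well as by the number of inputs.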
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> are describing is the only solution. What I was proposing though >>> is to >>> > instead chop the "order" in chunks of N. Using your notation, you >>> would >>> > be doing: >>> > >>> > PROC( >>> in0(s0)) >>> > PROC( in0(s1) >>> in1(s0)) >>> > PROC( in0(s2) in1(s1) >>> in2(s0)) >>> > PROC( in0(s3) in1(s2) in2(s1) ...
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ime, then indeed the approach you > are describing is the only solution. What I was proposing though is to > instead chop the "order" in chunks of N. Using your notation, you would > be doing: > > PROC( in0(s0)) > PROC( in0(s1) in1(s0)) > PROC( in0(s2) in1(s1) in2(s0)) > PROC( in0(s3) in1(s2) in2(s1) in3(s0)) > PROC( in0(s4) in1(s3) in2...
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...oach >> you >> > are describing is the only solution. What I was proposing though is >> to >> > instead chop the "order" in chunks of N. Using your notation, you >> would >> > be doing: >> > >> > PROC( >> in0(s0)) >> > PROC( in0(s1) >> in1(s0)) >> > PROC( in0(s2) in1(s1) >> in2(s0)) >> > PROC( in0(s3) in1(s2) in2(s1) >> in3(s0)) >...
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...proposing though is to > > > instead chop the "order" in chunks of N. Using your > > notation, you would > > > be doing: > > > > > > PROC( > > in0(s0)) > > > PROC( > > in0(s1) in1(s0)) > > > PROC( in0(s2) > > in1(s1) in2(s0)) > > > PROC( in0(s3) in1(s2) > >...
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...though is to > > instead chop the "order" in chunks of N. Using your > notation, you would > > be doing: > > > > PROC( > in0(s0)) > > PROC( > in0(s1) in1(s0)) > > PROC( in0(s2) > in1(s1) in2(s0)) > > PROC( in0(s...
2016 Sep 12
2
builtins name mangling in SPIR 2.0
Hi all, According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 for the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for: __kernel void input_zip_int(__global int *in0) { *in0 = get_global_id(0); } Clang generates: define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 { entry: %call = tail call s...
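The snippet is cut off before the call, so the name Clang actually emitted is not shown here. As a rough illustration of what "mangled" means, the sketch below applies the basic Itanium C++ ABI rule (`_Z`, then the name's length, the name, then one code per parameter type), on which SPIR mangling is based. The helper `mangle_simple` and its type table are hypothetical; they cover only a few scalar types and ignore pointers, address spaces, vectors, and substitutions.

```python
# Itanium C++ ABI codes for a few builtin scalar types (subset only).
ITANIUM_TYPE_CODES = {
    "void": "v",
    "char": "c",
    "unsigned char": "h",
    "short": "s",
    "unsigned short": "t",
    "int": "i",
    "unsigned int": "j",
    "long": "l",
    "unsigned long": "m",
    "float": "f",
    "double": "d",
}


def mangle_simple(name, param_types):
    """Mangle a free function taking only builtin scalar parameters.

    A function with no parameters is encoded as taking `void`.
    """
    codes = "".join(ITANIUM_TYPE_CODES[t] for t in param_types) or "v"
    return f"_Z{len(name)}{name}{codes}"


if __name__ == "__main__":
    # get_global_id takes a single unsigned int (uint dimindx) in OpenCL C.
    print(mangle_simple("get_global_id", ["unsigned int"]))
```

Under these assumptions a mangled call to `get_global_id` would be expected to appear as `@_Z13get_global_idj` rather than `@get_global_id`.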
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...h is to > > > instead chop the "order" in chunks of N. Using your > > notation, you would > > > be doing: > > > > > > PROC( > > in0(s0)) > > > PROC( > > in0(s1) in1(s0)) > > > PROC( in0(s2) > > in1(s1) in2(s0)) > > > PROC( in0(s3)...
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi, Attached is a patch with ARM NEON optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia
2016 Sep 12
2
builtins name mangling in SPIR 2.0
...1], the names of OpenCL builtins are mangled. > > > > However, when I compile OpenCL code with Clang 3.9 with the > "spir64-unknown-unknown" target, Clang generates IR without mangling the > builtins, e.g. for: > > > > __kernel void input_zip_int(__global int *in0) { > > *in0 = get_global_id(0); > > } > > > > clang generates: > > > > define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) > local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 > !kernel_arg_type !5 !kernel_arg_...
2016 Sep 16
2
builtins name mangling in SPIR 2.0
...IR 2.0 Hi all, According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 for the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for: __kernel void input_zip_int(__global int *in0) { *in0 = get_global_id(0); } Clang generates: define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 { entry: %call = tail call s...
2016 Sep 18
2
builtins name mangling in SPIR 2.0
...IR 2.0 Hi all, According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 for the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for: __kernel void input_zip_int(__global int *in0) { *in0 = get_global_id(0); } Clang generates: define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 { entry: %call = tail call s...
2010 Sep 20
1
ERROR: Object not found
...} else { switch <- 0 } dP1 <- a+b*P1-switch*P1 dP2 <- a-b*P1+switch*P2 list(c(dP1,dP2,dIN)) }) } # Parameters a <- 0.1 b <- 0.2 c <- 0.5 parms <- c(a=a,b=b,c=c) # Initial conditions P10 <- 100.0 P20 <- 0.0 IN0 <- 0.0 xstart <- c(P1=P10,P2=P20,IN=IN0) # Time points times <- seq(0,10,by=1) out <- as.data.frame(rk4(xstart,times,ode,parms)) [[alternative HTML version deleted]]
2010 Sep 20
1
Ask for help with Error: Object not found
...} else { switch <- 0 } dP1 <- a+b*P1-switch*P1 dP2 <- a-b*P1+switch*P2 list(c(dP1,dP2,dIN)) }) } # Parameters a <- 0.1 b <- 0.2 c <- 0.5 parms <- c(a=a,b=b,c=c) # Initial conditions P10 <- 100.0 P20 <- 0.0 IN0 <- 0.0 xstart <- c(P1=P10,P2=P20,IN=IN0) # Time points times <- seq(0,10,by=1) out <- as.data.frame(rk4(xstart,times,ode,parms)) [[alternative HTML version deleted]]
2008 Feb 26
8
[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code
Hi. I rewrote the patch according to the comments. I adopted generating in-place code because it looks like the quickest way. The point Eddie wanted to discuss is how to generate code and its ABI, i.e. in-place generation vs. direct jump vs. indirect function call. Indirect function call doesn't make sense because ivt.S is compiled multiple times. And it is up to pv instances to choose in-place
2008 Feb 26
8
[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code
Hi. I rewrote the patch according to the comments. I adopted generating in-place code because it looks like the quickest way. The point Eddie wanted to discuss is how to generate code and its ABI, i.e. in-place generation vs. direct jump vs. indirect function call. Indirect function call doesn't make sense because ivt.S is compiled multiple times. And it is up to pv instances to choose in-place
2010 Oct 13
1
[LLVMdev] EXC_BAD_ACCESS: invalid MemoryBuffer from ContentCache::getBuffer
I'm using the latest llvm/clang 2.8 releases and am getting EXC_BAD_ACCESS crashes in ContentCache::getBuffer. This happens when I'm printing out errors from a compilation run and iterating over TextDiagnosticBuffer returned errors. When checking the errors, I construct a FullSourceLoc and do: int LineNum = SourceLoc.getInstantiationLineNumber(); int ColNum =