thr3ads.net - search: "in0"

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...corr_QC[ order ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC ); } in which corr_QC[0, 1, ..., order] is the only output. Suppose order = 10, and each stage of the inner loop is denoted by s0, s1, ..., s9. And suppose we simultaneously process 8 inputs in SIMD, from in0 to in7. Let PROC(inx(sy)) denote processing input[x] at stage y. If there is no dependency between inx(sy) and in(x+1)(sy), then we can do this:
FOR in=0 TO N WITH in+=8
  FOR y=0 TO order-1 WITH y++
    PROC(in0(sy) in1(sy) in2(sy) in3(sy) in4(sy) in5(sy) in6(sy) in7(sy))
  END FOR
END FOR
Defini...
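The batched schedule described in this snippet can be sketched as follows. This is a minimal illustration in Python, not the patch's NEON code: the function name `batched_schedule` and the `lanes` parameter are hypothetical, and the sketch only enumerates which (input, stage) pairs would share one SIMD step under the snippet's stated premise that inx(sy) and in(x+1)(sy) are independent.

```python
def batched_schedule(num_inputs, order, lanes=8):
    """Return a list of SIMD steps; each step is a list of (input, stage) pairs.

    Assumption (from the snippet's "if there is no dependency" premise):
    all inputs of a batch can run the same stage in a single step.
    """
    steps = []
    for base in range(0, num_inputs, lanes):
        # Inputs in{base} .. in{base+lanes-1} form one batch.
        batch = range(base, min(base + lanes, num_inputs))
        for stage in range(order):
            steps.append([(x, stage) for x in batch])
    return steps


if __name__ == "__main__":
    # Print the first two steps in the snippet's PROC(...) notation.
    for step in batched_schedule(num_inputs=8, order=10)[:2]:
        print("PROC(" + " ".join(f"in{x}(s{y})" for x, y in step) + ")")
```

With 8 inputs and order = 10 this yields 10 steps, each covering all 8 inputs at one stage.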

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...N samples at a time, then indeed the approach you
> are describing is the only solution. What I was proposing though is to
> instead chop the "order" in chunks of N. Using your notation, you would
> be doing:
>
> PROC( in0(s0))
> PROC( in0(s1) in1(s0))
> PROC( in0(s2) in1(s1) in2(s0))
> PROC( in0(s3) in1(s2) in2(s1) in3(s0))
> PROC( in0(s4) in1(s3) in2(s2) in3(s1) in4...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

3

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...> > are describing is the only solution. What I was proposing though is to
> > instead chop the "order" in chunks of N. Using your notation, you would
> > be doing:
> >
> > PROC( in0(s0))
> > PROC( in0(s1) in1(s0))
> > PROC( in0(s2) in1(s1) in2(s0))
> > PROC( in0(s3) in1(s2) in2(s1) in3(s0))
> > PROC(...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

0

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...ing chunks of the inputs N samples at a time, then indeed the approach you are describing is the only solution. What I was proposing though is to instead chop the "order" in chunks of N. Using your notation, you would be doing:
PROC( in0(s0))
PROC( in0(s1) in1(s0))
PROC( in0(s2) in1(s1) in2(s0))
PROC( in0(s3) in1(s2) in2(s1) in3(s0))
PROC( in0(s4) in1(s3) in2(s2) in3(s1) in4(s0))
PROC(...
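The diagonal pattern listed in this snippet can be reproduced with a short sketch. This is an illustration in Python rather than the C/NEON code under discussion; the function name `wavefront_schedule` is hypothetical, and the sketch only enumerates the (input, stage) pairs that share a step, using the rule that input x runs stage y at step x + y. It does not model the chunking of "order" into groups of N, which the truncated snippet does not spell out.

```python
def wavefront_schedule(num_inputs, order):
    """Return a list of steps; each step is a list of (input, stage) pairs.

    At step t, input x is processed at stage t - x, so each input starts
    one step after the previous one (a diagonal, or wavefront, ordering).
    """
    steps = []
    for t in range(num_inputs + order - 1):
        step = [(x, t - x) for x in range(num_inputs) if 0 <= t - x < order]
        steps.append(step)
    return steps


if __name__ == "__main__":
    # Print the first five steps in the snippet's PROC(...) notation.
    for step in wavefront_schedule(num_inputs=8, order=10)[:5]:
        print("PROC( " + " ".join(f"in{x}(s{y})" for x, y in step) + ")")
```

Every (input, stage) pair appears exactly once, and within a step no two pairs belong to the same input or the same stage.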

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...> are describing is the only solution. What I was proposing though is to
>>> > instead chop the "order" in chunks of N. Using your notation, you would
>>> > be doing:
>>> >
>>> > PROC( in0(s0))
>>> > PROC( in0(s1) in1(s0))
>>> > PROC( in0(s2) in1(s1) in2(s0))
>>> > PROC( in0(s3) in1(s2) in2(s1) ...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

0

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...ime, then indeed the approach you
> are describing is the only solution. What I was proposing though is to
> instead chop the "order" in chunks of N. Using your notation, you would
> be doing:
>
> PROC( in0(s0))
> PROC( in0(s1) in1(s0))
> PROC( in0(s2) in1(s1) in2(s0))
> PROC( in0(s3) in1(s2) in2(s1) in3(s0))
> PROC( in0(s4) in1(s3) in2...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 03

0

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...oach you
>> > are describing is the only solution. What I was proposing though is to
>> > instead chop the "order" in chunks of N. Using your notation, you would
>> > be doing:
>> >
>> > PROC( in0(s0))
>> > PROC( in0(s1) in1(s0))
>> > PROC( in0(s2) in1(s1) in2(s0))
>> > PROC( in0(s3) in1(s2) in2(s1) in3(s0))
>...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

4

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...proposing though is to
> > > instead chop the "order" in chunks of N. Using your notation, you would
> > > be doing:
> > >
> > > PROC( in0(s0))
> > > PROC( in0(s1) in1(s0))
> > > PROC( in0(s2) in1(s1) in2(s0))
> > > PROC( in0(s3) in1(s2) ...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

0

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...though is to
> > instead chop the "order" in chunks of N. Using your notation, you would
> > be doing:
> >
> > PROC( in0(s0))
> > PROC( in0(s1) in1(s0))
> > PROC( in0(s2) in1(s1) in2(s0))
> > PROC( in0(s...

builtins name mangling in SPIR 2.0

2016 Sep 12

2

builtins name mangling in SPIR 2.0

Hi all, According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for:
__kernel void input_zip_int(__global int *in0) {
  *in0 = get_global_id(0);
}
clang generates:
define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 {
entry:
  %call = tail call s...
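To make "mangled" concrete, here is a minimal sketch of what a mangled builtin name could look like. It assumes the mangling follows the Itanium C++ ABI scheme for plain functions with builtin parameter types; the snippet itself does not show the expected name, so treat the result as illustrative. The helper `mangle_simple` and the table `ITANIUM_BUILTIN_CODES` are hypothetical names, and the sketch covers only a small subset of types (no pointers, vectors, or address spaces).

```python
# Itanium C++ ABI one-letter codes for a few builtin types (subset only).
ITANIUM_BUILTIN_CODES = {
    "void": "v",
    "char": "c",
    "unsigned char": "h",
    "short": "s",
    "unsigned short": "t",
    "int": "i",
    "unsigned int": "j",
    "long": "l",
    "unsigned long": "m",
    "float": "f",
    "double": "d",
}


def mangle_simple(name, param_types):
    """Mangle a non-member function taking only builtin-typed parameters.

    Layout: "_Z" + <length of name> + <name> + <one code per parameter>.
    A function with no parameters is encoded with a single "v".
    """
    codes = "".join(ITANIUM_BUILTIN_CODES[t] for t in param_types) or "v"
    return f"_Z{len(name)}{name}{codes}"


if __name__ == "__main__":
    # get_global_id takes an unsigned int dimension index in OpenCL C.
    print(mangle_simple("get_global_id", ["unsigned int"]))
```

Under these assumptions, a call to `get_global_id(0)` would reference a name of the form `_Z13get_global_idj` rather than the bare `get_global_id`.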

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

0

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

...h is to
> > > instead chop the "order" in chunks of N. Using your notation, you would
> > > be doing:
> > >
> > > PROC( in0(s0))
> > > PROC( in0(s1) in1(s0))
> > > PROC( in0(s2) in1(s1) in2(s0))
> > > PROC( in0(s3)...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Jan 31

6

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi, Attached is a patch with ARM NEON optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia

builtins name mangling in SPIR 2.0

2016 Sep 12

2

builtins name mangling in SPIR 2.0

...1], the names of OpenCL builtins are mangled.
> >
> > However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for:
> >
> > __kernel void input_zip_int(__global int *in0) {
> > *in0 = get_global_id(0);
> > }
> >
> > clang generates:
> >
> > define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_...

builtins name mangling in SPIR 2.0

2016 Sep 16

2

builtins name mangling in SPIR 2.0

...IR 2.0 Hi all, According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for:
__kernel void input_zip_int(__global int *in0) {
  *in0 = get_global_id(0);
}
clang generates:
define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 {
entry:
  %call = tail call s...

builtins name mangling in SPIR 2.0

2016 Sep 18

2

builtins name mangling in SPIR 2.0

...IR 2.0 Hi all, According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for:
__kernel void input_zip_int(__global int *in0) {
  *in0 = get_global_id(0);
}
clang generates:
define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 {
entry:
  %call = tail call s...

ERROR: Object not found

2010 Sep 20

1

ERROR: Object not found

...} else { switch <- 0 }
dP1 <- a+b*P1-switch*P1
dP2 <- a-b*P1+switch*P2
list(c(dP1,dP2,dIN))
})
}
# Parameters
a <- 0.1
b <- 0.2
c <- 0.5
parms <- c(a=a,b=b,c=c)
# Initial conditions
P10 <- 100.0
P20 <- 0.0
IN0 <- 0.0
xstart <- c(P1=P10,P2=P20,IN=IN0)
# Time points
times <- seq(0,10,by=1)
out <- as.data.frame(rk4(xstart,times,ode,parms))

Ask for help with Error: Object not found

2010 Sep 20

1

Ask for help with Error: Object not found

...} else { switch <- 0 }
dP1 <- a+b*P1-switch*P1
dP2 <- a-b*P1+switch*P2
list(c(dP1,dP2,dIN))
})
}
# Parameters
a <- 0.1
b <- 0.2
c <- 0.5
parms <- c(a=a,b=b,c=c)
# Initial conditions
P10 <- 100.0
P20 <- 0.0
IN0 <- 0.0
xstart <- c(P1=P10,P2=P20,IN=IN0)
# Time points
times <- seq(0,10,by=1)
out <- as.data.frame(rk4(xstart,times,ode,parms))

[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code

2008 Feb 26

8

[PATCH 0/8] RFC: ia64/xen TAKE 2: paravirtualization of hand written assembly code

Hi. I rewrote the patch according to the comments. I adopted generating in-place code because it looks like the quickest way. The point Eddie wanted to discuss is how to generate code and its ABI, i.e. in-place generation vs. direct jump vs. indirect function call. An indirect function call doesn't make sense because ivt.S is compiled multiple times. And it is up to pv instances to choose in-place

[LLVMdev] EXC_BAD_ACCESS: invalid MemoryBuffer from ContentCache::getBuffer

2010 Oct 13

1

[LLVMdev] EXC_BAD_ACCESS: invalid MemoryBuffer from ContentCache::getBuffer

I'm using the latest llvm/clang 2.8 releases and am getting EXC_BAD_ACCESS crashes in ContentCache::getBuffer. This happens when I'm printing out errors from a compilation run and iterating over TextDiagnosticBuffer returned errors. When checking the errors, I construct a FullSourceLoc and do: int LineNum = SourceLoc.getInstantiationLineNumber(); int ColNum =
