thr3ads.net - similar to: "builtins name mangling in SPIR 2.0"

Displaying 20 results from an estimated 800 matches similar to: "builtins name mangling in SPIR 2.0"

2016 Sep 12

builtins name mangling in SPIR 2.0

Thanks a lot. On Mon, Sep 12, 2016 at 1:42 PM, Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote: > If you use the default header file under clang/lib/Headers/opencl-c.h, > get_global_id will be mangled. > > > > If you want to declare get_global_id in your own header, add > __attribute__((overloadable)), then it will be mangled. > > > > Sam > > > >

builtins name mangling in SPIR 2.0

2016 Sep 16

builtins name mangling in SPIR 2.0

+ Alexey Anastasia According to SPIR spec v1.2 s2.10.3 2.10.3 The printf function The printf function is supported, and is mangled according to its prototype as follows: int printf(constant char * restrict fmt, ... ) Note that the ellipsis formal argument (...) is mangled to argument type specifier z It seems printf should be mangled. Alexey/Anastasia, What do you think? Thanks. Sam From:

builtins name mangling in SPIR 2.0

2016 Sep 18

builtins name mangling in SPIR 2.0

I don't see any problem mangling it to be honest even though there seems to be only one prototype anyways. We could add restrict in as well. Cheers, Anastasia ________________________________ From: Hongbin Zheng <etherzhhb at gmail.com> Sent: 17 September 2016 05:32:54 To: Liu, Yaxun (Sam) Cc: cfe-dev at lists.llvm.org; llvm-dev; Bader, Alexey (alexey.bader at intel.com); Anastasia

Some llvm questions (for tgsi backend)

2016 Jan 11

Some llvm questions (for tgsi backend)

Hi, After a few distractions I'm back to work on the llvm tgsi backend. I've added clang integration and I can now compile a simple opencl program to something which sort of looks like tgsi. You can find my latest work on this here: http://cgit.freedesktop.org/~jwrdegoede/llvm http://cgit.freedesktop.org/~jwrdegoede/clang (the latter may still need to sync) I've a little test

Some llvm questions (for tgsi backend)

2016 Jan 12

Some llvm questions (for tgsi backend)

Hi Tom, Thanks for taking the time to answer this. On 11-01-16 18:10, Tom Stellard wrote: > On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote: >> Hi, >> >> After a few distractions I'm back to work on the llvm tgsi backend. I've >> added clang integration and I can now compile a simple opencl program >> to something which sort of looks like

BitcodeReader non explicit error

2016 May 24

BitcodeReader non explicit error

Hi, I'm working on OpenCL and I'm using clang as compiler (based on clang 3.7.0). I have a issue, I'm generating a bitcode file (that I can print before before the generation). But when I'm trying to read it again with clang, I have this issue: "error: Invalid record" How can I managed to know where it comes from? Thank you, Romaric Here is what is print before the

Some llvm questions (for tgsi backend)

2016 Jan 11

Some llvm questions (for tgsi backend)

On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote: > Hi, > > After a few distractions I'm back to work on the llvm tgsi backend. I've > added clang integration and I can now compile a simple opencl program > to something which sort of looks like tgsi. > > You can find my latest work on this here: > http://cgit.freedesktop.org/~jwrdegoede/llvm >

Some llvm questions (for tgsi backend)

2016 Jan 11

Some llvm questions (for tgsi backend)

On Mon, Jan 11, 2016 at 6:07 AM, Hans de Goede <hdegoede at redhat.com> wrote: > Hi, > > After a few distractions I'm back to work on the llvm tgsi backend. I've > added clang integration and I can now compile a simple opencl program > to something which sort of looks like tgsi. > > You can find my latest work on this here: >

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

I attached a new patch with small cleanup (disassembly is identical as the last patch). We have done the same internal testing as usual. Also, attached 2 failed temporary versions which try to reduce code size (just for code review reference purpose). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Thank Jean-Marc! The speedup percentages are all relative to the entire encoder. Comparing to master, this optimization patch speeds up fixed-point SILK encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,

ERROR: Object not found

2010 Sep 20

ERROR: Object not found

Dear All, I am trying to use ode solver "rk4" to solve an ODE system, however, it keeps saying: Error in eval(expr, envir, enclos) : object "dIN" not found. The sample codes are enclosed as follows, please help me. Thank you very much! rm(list=ls()) library(odesolve) # The ODE system ode <- function(t,x,p){ with(as.list(c(x,p)),{

Ask for help with Error: Object not found

2010 Sep 20

Ask for help with Error: Object not found

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, On 06/02/17 02:51 PM, Linfeng Zhang wrote: > However, the critical thing is that all the states in each stage when > processing input[i] are reused by the next input[i+1]. That is > input[i+1] must wait input[i] for 1 stage, and input[i+2] must wait > input[i+1] for 1 stage, etc. That is indeed the tricky part... and the one I think you could do slightly differently. If

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, On 06/02/17 07:18 PM, Linfeng Zhang wrote: > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > configured to 12, 14, 16, 20 and 24 according to complexity parameter. > > It's hard to get a universal function to handle all these orders > efficiently. Any suggestions? I can think of two ways of handling larger orders. The obvious one is

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 03

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Attached is the silk_warped_autocorrelation_FIX_neon() which implements your idea. Speed improvement vs the previous optimization: Complexity 0-4: Doesn't call this function. Complexity 5: 2.1% (order = 16) Complexity 6: 1.0% (order = 20) Complexity 8: 0.1% (order = 24) Complexity 10: 0.1% (order = 24) Code size of silk_warped_autocorrelation_FIX_neon() changes from 2,644

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, Thanks for the updated patch. I'll have a look and get back to you. When you report speedup percentages, is that relative to the entire encoder or relative to just that function in C? Also, what's the speedup compared to master? Cheers, Jean-Marc On 05/04/17 12:14 PM, Linfeng Zhang wrote: > I attached a new patch with small cleanup (disassembly is identical as > the

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, I had a closer look at your patch and the code looks good -- and slightly simpler than I had anticipated, so that's good. I did some profiling on a Cortex A57 and I've been seeing slightly less improvement than you're reporting, more like 3.5% at complexity 8. It appears that the warped autocorrelation function itself is only faster by a factor of about 1.35. That's a

similar to: builtins name mangling in SPIR 2.0