thr3ads.net - similar to: "FLAC on GPGPU"

Displaying 20 results from an estimated 1000 matches similar to: "FLAC on GPGPU"

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

libFLAC has three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also falls back to the unaccelerated C function. I'm not very familiar with asm, so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
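For readers unfamiliar with the routine being discussed, here is a minimal plain-C sketch of an autocorrelation over a fixed number of lags. It is not libFLAC's code: the function name, parameter names, and types are hypothetical, and it assumes the number of lags computed is lpc_order + 1, which would be consistent with the "lpc_order less than N" requirement in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only; names and types are assumptions, not libFLAC's API.
 * Computes autoc[l] = sum over i of data[i] * data[i + l] for l = 0 .. lag-1. */
static void compute_autocorrelation(const float *data, size_t data_len,
                                    size_t lag, double *autoc)
{
    assert(lag > 0 && lag <= data_len);
    for (size_t l = 0; l < lag; l++) {
        double sum = 0.0;
        /* Stop when the shifted index would run past the end of the block. */
        for (size_t i = 0; i + l < data_len; i++)
            sum += (double)data[i] * (double)data[i + l];
        autoc[l] = sum;
    }
}
```

Under that assumption, an lpc_order of 12 needs 13 lags, one more than a lag_12 routine covers, which is the gap a lag_16 variant would close.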

patch

2004 Sep 10

So here is a quick patch solving the problem; now it should be PIC. -- Miroslav Lichvar lichvarm@phoenix.inf.upol.cz -------------- next part -------------- --- lpc_asm.nasm.orig Wed Jul 18 02:23:40 2001 +++ lpc_asm.nasm Sat Nov 17 21:09:46 2001 @@ -59,10 +59,10 @@ ; ALIGN 16 cident FLAC__lpc_compute_autocorrelation_asm_ia32 - ;[esp + 24] == autoc[] - ;[esp + 20] == lag - ;[esp + 16] ==

[PATCH] fix compile errors with asm disabled

2004 Oct 01

The #endifs are mismatched, and my builds were failing because lpc_restore_signal* weren't getting declared. I've also commented the endifs to make them easier to match. Also, is there any reason #ifdefs for FLAC__HAS_NASM and FLAC__CPU_IA32 are separate and nested the way they are and not combined like this?: #if defined(FLAC__CPU_IA32) && defined(FLAC__HAS_NASM) I'm not

[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication

2016 May 02

Hi Tobias, according to [1], we can expect 90% of the turbo boost peak of the processor with a C version of Matrix-Matrix Multiplication that is similar to the one presented in [1]. In the case of the Intel Core i7-3820 (Sandy Bridge), the theoretical maximal performance of the machine is 28.8 GFLOPS and hence the expected number is 25.92 GFLOPS. However, in the case of, for example, n = m = 1056 and k = 1024
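The arithmetic behind those figures can be sketched in C as follows. This is only an illustration: the function names are hypothetical, the timing value would come from the reader's own measurement, and the 2·n·m·k operation count is the conventional one for a matrix product (one multiply and one add per inner-loop step), which the post does not state explicitly.

```c
#include <assert.h>

/* Conventional operation count for a matrix-matrix product with
 * dimensions n, m and k: one multiply plus one add per inner step. */
static double gemm_flops(double n, double m, double k)
{
    return 2.0 * n * m * k;
}

/* Achieved GFLOPS for a run that took `seconds` (a hypothetical measurement). */
static double achieved_gflops(double n, double m, double k, double seconds)
{
    assert(seconds > 0.0);
    return gemm_flops(n, m, k) / seconds / 1e9;
}

/* Target rate as a fraction of the theoretical peak
 * (28.8 GFLOPS and 0.9 are the values quoted in the post). */
static double target_gflops(double peak_gflops, double fraction)
{
    return peak_gflops * fraction;
}
```

With the sizes from the post, gemm_flops(1056, 1056, 1024) is about 2.28 billion operations, so a run would have to finish in roughly 0.088 seconds to reach the 25.92 GFLOPS target.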

[Bug 40891] New: OpenCL: Implementing an LLVM backend for GPGPU

2011 Sep 14

https://bugs.freedesktop.org/show_bug.cgi?id=40891 Summary: OpenCL: Implementing an LLVM backend for GPGPU Product: xorg Version: unspecified Platform: Other OS/Version: All Status: NEW Severity: normal Priority: medium Component: Driver/nouveau AssignedTo: nouveau at lists.freedesktop.org

[LLVMdev] JOB AD: PathScale's compiler frontend/GPGPU team

2013 May 12

== JOB POSTING == PathScale's compiler team is looking for individuals interested in GPGPU, C++, Visual Studio compatibility and compiler frontend (clang) work. Most of the work will be on our clang fork, but anyone interested to work on other parts is always welcome. (IDE, optimized math libs, debugger, compiler backend.. etc) Location: Remote (anyone who doesn't want to relocate to

[Bug 40890] New: OpenCL: Implement a GPGPU runtime following the CAL specification

2011 Sep 14

https://bugs.freedesktop.org/show_bug.cgi?id=40890 Summary: OpenCL: Implement a GPGPU runtime following the CAL specification Product: xorg Version: unspecified Platform: Other OS/Version: All Status: NEW Severity: normal Priority: medium Component: Driver/nouveau AssignedTo:

BerkeleyTIP Feb 7 Sat Global Meeting - Ekiga3, Asterisk, KDE, GPGPU, Debian Edu, GStreamer

2009 Feb 04

** Great talks this meeting: (live & on video) ** Ekiga3, Asterisk, GPGPU, GStreamer, Debian Edu, HowTo Present KDE at meetings http://sites.google.com/site/berkeleytip/ Join from anywhere via VOIP conference, with the friendly, educational, productive, BerkeleyTIP people. :) Join the #berkeleytip freenode.net IRC channel for help getting your VOIP working.

[LLVMdev] Fwd: GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 04

oops, forgot to cc the dev-list hi tobi, > > > Yes. And instead of saving the two modules in separate files, we can store > the kernel module as a 'string' in the host module and add the necessary > library calls to load it at run time. This will give a smooth user > experience and requires almost no additional infrastructure. We may lose some co-optimization

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 03

Hi Justin, the non-translatable IR with GPU code replaced by appropriate CUDA Driver > API calls. One of the CUDA driver APIs (cuLaunch) needs a PTX asm string as its input. So if I want to provide a one-touch solution and not introduce any changes to tools outside Polly, I must prepare the PTX string before I can generate the correct non-translatable IR part. Following your suggestion, it may

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 03

On Mon, Apr 2, 2012 at 7:16 AM, Yabin Hu <yabin.hwu at gmail.com> wrote: > Hi all, > > I am a PhD student from Huazhong University of Sci&Tech, China. The > following is my GSoC 2012 proposal. > Comments are welcome! > > *Title: Automatic GPGPU Code Generation for LLVM* > > *Abstract* > Very often, manually developing a GPGPU application is a

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 03

Hi Yabin, Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you can also improve llc/lli or create new tools to support code generation for heterogeneous platforms[1], i.e. generate code for more than one target architecture at the same time. Something like this is not very complicated and has been implemented[2,3] by some people, but is not available in LLVM mainstream.

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 03

Hi Justin, 2012/4/3 Justin Holewinski <justin.holewinski at gmail.com> > *Motivation* >> With the broad proliferation of GPU computing, it is very important to >> provide an easy and automatic tool to develop or port the applications to >> GPU for normal developers, especially for those domain experts who want to >> harness the huge computing power of GPU. Polly

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 04

On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es> wrote: > On 04/03/2012 03:13 PM, Hongbin Zheng wrote: > > Hi Yabin, > > > > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you > > can also improve llc/lli or create new tools to support code > > generation for heterogeneous platforms[1], i.e. generate code for

compression ratio

2007 May 20

> yes, and much more so in the encoder. the decoder is already > very fast and approaching a fundamental limit. > > the next release of FLAC will be slightly faster encoding and > decoding. Well, I hope to have a proof-of-concept FLAC-on-CUDA to run on the latest generation of NVidia cards, some time this summer. I hope this will achieve about 10x speed improvement for both

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 04

On 04/03/2012 03:13 PM, Hongbin Zheng wrote: > Hi Yabin, > > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you > can also improve llc/lli or create new tools to support code > generation for heterogeneous platforms[1], i.e. generate code for more > than one target architecture at the same time. Something like this is > not very complicated and has

Filter data

2010 Dec 02

Hello, I understand that this question is probably stupid, but ... I have data (Polity IV index) "country","year","democ","autoc","polity","polity2" "1","Afghanistan ",1800,1,7,-6,-6 "2","Afghanistan ",1801,1,7,-6,-6 "3","Afghanistan

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 02

Hi all, I am a PhD student from Huazhong University of Sci&Tech, China. The following is my GSoC 2012 proposal. Comments are welcome! *Title: Automatic GPGPU Code Generation for LLVM* *Abstract* Very often, manually developing a GPGPU application is a time-consuming, complex, error-prone and iterative process. In this project, I propose to build an automatic GPGPU code generation framework

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni
