similar to: FLAC on GPGPU

Displaying 20 results from an estimated 1000 matches similar to: "FLAC on GPGPU"

2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC has three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order to be less than N. The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also falls back to the unaccelerated C function. I'm not very familiar with asm, so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
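For context, a minimal C sketch of the scalar autocorrelation that these lag_N routines accelerate; the function name and signature here are illustrative, not the exact libFLAC API:

    /* autoc[l] = sum over i of data[i] * data[i - l], for l = 0 .. lag-1.
       The SSE lag_N variants assume the lag is small enough (lpc_order < N)
       that all running sums fit in SSE registers at once. */
    static void compute_autocorrelation(const float data[], unsigned data_len,
                                        unsigned lag, float autoc[])
    {
        for (unsigned l = 0; l < lag; l++) {
            float sum = 0.0f;
            for (unsigned i = l; i < data_len; i++)
                sum += data[i] * data[i - l];
            autoc[l] = sum;
        }
    }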
2004 Sep 10
3
patch
So here is a quick patch solving the problem; now it should be PIC. -- Miroslav Lichvar lichvarm@phoenix.inf.upol.cz -------------- next part --------------
--- lpc_asm.nasm.orig	Wed Jul 18 02:23:40 2001
+++ lpc_asm.nasm	Sat Nov 17 21:09:46 2001
@@ -59,10 +59,10 @@
 ;	ALIGN 16
 cident FLAC__lpc_compute_autocorrelation_asm_ia32
-	;[esp + 24] == autoc[]
-	;[esp + 20] == lag
-	;[esp + 16] ==
2004 Oct 01
1
[PATCH] fix compile errors with asm disabled
The #endifs are mismatched, and my builds were failing because lpc_restore_signal* weren't getting declared. I've also commented the #endifs to make them easier to match. Also, is there any reason the #ifdefs for FLAC__HAS_NASM and FLAC__CPU_IA32 are separate and nested the way they are, and not combined like this?: #if defined(FLAC__CPU_IA32) && defined(FLAC__HAS_NASM) I'm not
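For illustration, a minimal sketch of the two guard styles being contrasted; the prototype is hypothetical, not an actual declaration from the FLAC headers:

    /* nested style, as currently in the tree */
    #ifdef FLAC__CPU_IA32
    #ifdef FLAC__HAS_NASM
    void some_asm_accelerated_routine(void);  /* hypothetical prototype */
    #endif /* FLAC__HAS_NASM */
    #endif /* FLAC__CPU_IA32 */

    /* combined style proposed above */
    #if defined(FLAC__CPU_IA32) && defined(FLAC__HAS_NASM)
    void some_asm_accelerated_routine(void);  /* hypothetical prototype */
    #endif /* FLAC__CPU_IA32 && FLAC__HAS_NASM */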
2016 May 02
2
[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication
Hi Tobias, according to [1], we can expect to reach 90% of the processor's turbo boost peak with a C version of matrix-matrix multiplication similar to the one presented in [1]. In the case of an Intel Core i7-3820 (Sandy Bridge), the theoretical maximum performance of the machine is 28.8 GFLOPS, and hence the expected number is 25.92 GFLOPS. However, in the case of, for example, n = m = 1056 and k = 1024
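As a quick check of the arithmetic: 90% of the quoted 28.8 GFLOPS peak is 0.9 x 28.8 = 25.92 GFLOPS, the expected number above. A matrix-matrix multiplication performs roughly 2*m*n*k floating-point operations, so for m = n = 1056 and k = 1024 that is 2 x 1056 x 1056 x 1024 ≈ 2.28e9 flops per run; dividing that by the measured run time gives the achieved GFLOPS to compare against 25.92.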
2011 Sep 14
2
[Bug 40891] New: OpenCL: Implementing an LLVM backend for GPGPU
https://bugs.freedesktop.org/show_bug.cgi?id=40891
Summary: OpenCL: Implementing an LLVM backend for GPGPU
Product: xorg
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Driver/nouveau
AssignedTo: nouveau at lists.freedesktop.org
2013 May 12
0
[LLVMdev] JOB AD: PathScale's compiler frontend/GPGPU team
== JOB POSTING == PathScale's compiler team is looking for individuals interested in GPGPU, C++, Visual Studio compatibility and compiler frontend (clang) work. Most of the work will be on our clang fork, but anyone interested in working on other parts is always welcome (IDE, optimized math libs, debugger, compiler backend, etc.). Location: Remote (anyone who doesn't want to relocate to
2011 Sep 14
3
[Bug 40890] New: OpenCL: Implement a GPGPU runtime following the CAL specification
https://bugs.freedesktop.org/show_bug.cgi?id=40890
Summary: OpenCL: Implement a GPGPU runtime following the CAL specification
Product: xorg
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Driver/nouveau
AssignedTo:
2009 Feb 04
0
BerkeleyTIP Feb 7 Sat Global Meeting - Ekiga3, Asterisk, KDE, GPGPU, Debian Edu, GStreamer
** Great talks this meeting: (live & on video) ** Ekiga3, Asterisk, GPGPU, GStreamer, Debian Edu, HowTo Present KDE at meetings http://sites.google.com/site/berkeleytip/ Join from anywhere via VOIP conference, with the friendly, educational, productive, BerkeleyTIP people. :) Join the #berkeleytip freenode.net IRC channel for help getting your VOIP working.
2012 Apr 04
0
[LLVMdev] Fwd: GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Oops, forgot to cc the dev-list. Hi Tobi, > > > Yes. And instead of saving the two modules in separate files, we can store > the kernel module as a 'string' in the host module and add the necessary > library calls to load it at run time. This will give a smooth user > experience and requires almost no additional infrastructure. We may lose some co-optimization
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin, > the non-translatable IR with GPU code replaced by appropriate CUDA Driver > API calls. One of the CUDA driver APIs (cuLaunch) needs a PTX asm string as its input. So if I want to provide a one-touch solution and not introduce any changes to tools outside Polly, I must prepare the PTX string before I can generate the correct non-translatable IR part. As you suggest, it may
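For illustration, a hedged host-side sketch of the driver-API sequence being described, using the newer cuModuleLoadData / cuLaunchKernel entry points rather than the legacy cuLaunch; the kernel name "my_kernel" and the parameterless launch are assumptions made only for this example:

    #include <cuda.h>

    /* Load a PTX image held as a string in the host module and launch one
       kernel from it.  Error handling is collapsed for brevity. */
    static int launch_from_ptx(const char *ptx_string)
    {
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* This is the step that consumes the finished PTX text, which is why
           the PTX string must exist before the host-side IR is finalized. */
        if (cuModuleLoadData(&mod, ptx_string) != CUDA_SUCCESS)
            return -1;
        cuModuleGetFunction(&fn, mod, "my_kernel");   /* assumed kernel name */

        cuLaunchKernel(fn, 1, 1, 1,                   /* grid dimensions    */
                       32, 1, 1,                      /* block dimensions   */
                       0, NULL,                       /* shared mem, stream */
                       NULL, NULL);                   /* no kernel params   */
        cuCtxSynchronize();

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }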
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Mon, Apr 2, 2012 at 7:16 AM, Yabin Hu <yabin.hwu at gmail.com> wrote: > Hi all, > > I am a PhD student from Huazhong University of Sci&Tech, China. The > following is my GSoC 2012 proposal. > Comments are welcome! > > *Title: Automatic GPGPU Code Generation for LLVM* > > *Abstract* > Very often, manually developing a GPGPU application is a
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Yabin, instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you could also improve llc/lli or create new tools to support code generation for heterogeneous platforms[1], i.e. generate code for more than one target architecture at the same time. Something like this is not very complicated and has been implemented[2,3] by some people, but it is not available in LLVM mainstream.
2012 Apr 03
2
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin, 2012/4/3 Justin Holewinski <justin.holewinski at gmail.com> > *Motivation* >> With the broad proliferation of GPU computing, it is very important to >> provide an easy and automatic tool to develop or port the applications to >> GPU for normal developers, especially for those domain experts who want to >> harness the huge computing power of GPU. Polly
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw that nvcc does this, but I'm not sure it's valid. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining: if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...
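A hedged reconstruction of that example as compilable CUDA, showing the two shapes being compared (before inlining, both branches call the same foo(); after inlining, each branch contains its own __syncthreads()):

    __device__ void foo() { __syncthreads(); }

    /* Before inlining: both sides of the divergent branch call foo(),
       so the message above argues all threads meet at one barrier. */
    __global__ void before_inlining() {
        if (threadIdx.x % 2 == 0) {
            foo();
        } else {
            foo();
        }
    }

    /* After inlining: the barrier is duplicated, and even and odd threads
       wait at two different __syncthreads() sites. */
    __global__ void after_inlining() {
        if (threadIdx.x % 2 == 0) {
            __syncthreads();
        } else {
            __syncthreads();
        }
    }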
2012 Apr 04
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es> wrote: > On 04/03/2012 03:13 PM, Hongbin Zheng wrote: > > Hi Yabin, > > > > Instead of compile the LLVM IR to PTX asm string in a ScopPass, you > > can also the improve llc/lli or create new tools to support the code > > generation for Heterogeneous platforms[1], i.e. generate code for
2007 May 20
1
compression ratio
> yes, and much more so in the encoder. the decoder is already > very fast and approaching a fundamental limit. > > the next release of FLAC will be slightly faster encoding and > decoding. Well, I hope to have a proof-of-concept FLAC-on-CUDA to run on the latest generation of NVidia cards, some time this summer. I hope this will achieve about 10x speed improvement for both
2012 Apr 04
3
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On 04/03/2012 03:13 PM, Hongbin Zheng wrote: > Hi Yabin, > > Instead of compile the LLVM IR to PTX asm string in a ScopPass, you > can also the improve llc/lli or create new tools to support the code > generation for Heterogeneous platforms[1], i.e. generate code for more > than one target architecture at the same time. Something like this is > not very complicated and had
2010 Dec 02
6
Filter data
Hello, I understand that this question is probably stupid, but ... I have data (Polity IV index):
"country","year","democ","autoc","polity","polity2"
"1","Afghanistan ",1800,1,7,-6,-6
"2","Afghanistan ",1801,1,7,-6,-6
"3","Afghanistan
2012 Apr 02
6
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi all, I am a PhD student from Huazhong University of Sci&Tech, China. The following is my GSoC 2012 proposal. Comments are welcome! *Title: Automatic GPGPU Code Generation for LLVM* *Abstract* Very often, manually developing a GPGPU application is a time-consuming, complex, error-prone and iterative process. In this project, I propose to build an automatic GPGPU code generation framework
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives:
// .globl _Z3foov
.visible .entry _Z3foov(
)
{
.reg .pred %p<2>;
.reg .s32 %r<3>;
mov.u32 %r1, %tid.x;
and.b32 %r2, %r1, 1;
setp.eq.b32 %p1, %r2, 1;
@!%p1 bra BB7_2;
bra.uni