Displaying 20 results from an estimated 1000 matches similar to: "FLAC on GPGPU"
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC has three SSE-accelerated functions, FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order to be less than N.
The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also falls back to the unaccelerated C function.
I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and
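For readers unfamiliar with the routine being accelerated, here is a minimal plain-C sketch of an autocorrelation over a fixed number of lags. It is an illustration only, not libFLAC's source: the name compute_autocorrelation and its signature are hypothetical, and it assumes that "lag" counts the lags 0 through lag - 1, which would explain why a lag_N routine needs lpc_order to be less than N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not libFLAC's actual code).
 * Fills autoc[0..lag-1], where autoc[l] is the sum over i of
 * data[i] * data[i - l]. Assumes 0 < lag <= data_len. */
void compute_autocorrelation(const float *data, size_t data_len,
                             size_t lag, float *autoc)
{
    assert(lag > 0 && lag <= data_len);

    for (size_t l = 0; l < lag; l++) {
        float sum = 0.0f;
        /* Start at i = l so that data[i - l] never reads before the buffer. */
        for (size_t i = l; i < data_len; i++)
            sum += data[i] * data[i - l];
        autoc[l] = sum;
    }
}
```

An SSE variant specialised for a fixed lag count can keep all of the running sums in vector registers, which is presumably why one routine exists per value of N.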
2004 Sep 10
3
patch
So here is quick patch solving the problem, now it should be PIC.
--
Miroslav Lichvar
lichvarm@phoenix.inf.upol.cz
-------------- next part --------------
--- lpc_asm.nasm.orig Wed Jul 18 02:23:40 2001
+++ lpc_asm.nasm Sat Nov 17 21:09:46 2001
@@ -59,10 +59,10 @@
;
ALIGN 16
cident FLAC__lpc_compute_autocorrelation_asm_ia32
- ;[esp + 24] == autoc[]
- ;[esp + 20] == lag
- ;[esp + 16] ==
2004 Oct 01
1
[PATCH] fix compile errors with asm disabled
The #endifs are mismatched, and my builds were failing because
lpc_restore_signal* weren't getting declared.
I've also commented the endifs to make them easier to match.
Also, is there any reason #ifdefs for FLAC__HAS_NASM and FLAC__CPU_IA32 are
separate and nested the way they are and not combined like this?:
#if defined(FLAC__CPU_IA32) && defined(FLAC__HAS_NASM)
I'm not
2016 May 02
2
[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication
Hi Tobias,
according to [1], we can expect 90% of the turbo boost peak of the
processor with a C version of Matrix-Matrix Multiplication that is
similar to the one presented in [1]. In the case of the Intel Core i7-3820
(Sandy Bridge), the theoretical maximum performance of the machine is
28.8 GFLOPS and hence the expected number is 25.92 GFLOPS.
However, in case of, for example, n = m = 1056 and k = 1024
2011 Sep 14
2
[Bug 40891] New: OpenCL: Implementing an LLVM backend for GPGPU
https://bugs.freedesktop.org/show_bug.cgi?id=40891
Summary: OpenCL: Implementing an LLVM backend for GPGPU
Product: xorg
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Driver/nouveau
AssignedTo: nouveau at lists.freedesktop.org
2013 May 12
0
[LLVMdev] JOB AD: PathScale's compiler frontend/GPGPU team
== JOB POSTING ==
PathScale's compiler team is looking for individuals interested in
GPGPU, C++, Visual Studio compatibility and compiler frontend (clang) work.
Most of the work will be on our clang fork, but anyone interested in
working on other parts is always welcome. (IDE, optimized math libs,
debugger, compiler backend.. etc)
Location: Remote (anyone who doesn't want to relocate to
2011 Sep 14
3
[Bug 40890] New: OpenCL: Implement a GPGPU runtime following the CAL specification
https://bugs.freedesktop.org/show_bug.cgi?id=40890
Summary: OpenCL: Implement a GPGPU runtime following the CAL
specification
Product: xorg
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: medium
Component: Driver/nouveau
AssignedTo:
2009 Feb 04
0
BerkeleyTIP Feb 7 Sat Global Meeting - Ekiga3, Asterisk, KDE, GPGPU, Debian Edu, GStreamer
** Great talks this meeting: (live & on video) **
Ekiga3, Asterisk, GPGPU, GStreamer, Debian Edu,
HowTo Present KDE at meetings
http://sites.google.com/site/berkeleytip/
Join from anywhere via VOIP conference,
with the friendly, educational, productive, BerkeleyTIP people. :)
Join the #berkeleytip freenode.net IRC channel for help getting your
VOIP working.
2012 Apr 04
0
[LLVMdev] Fwd: GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Oops, I forgot to cc the dev list.
hi tobi,
>
>
> Yes. And instead of saving the two modules in separate files, we can store
> the kernel module as a 'string' in the host module and add the necessary
> library calls to load it at run time. This will give a smooth user
> experience and requires almost no additional infrastructure.
We may lose some co-optimization
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin,
the non-translatable IR with GPU code replaced by appropriate CUDA Driver
> API calls.
One of the CUDA driver APIs (cuLaunch) needs a PTX asm string as its input. So
if I want to provide a one-touch solution and not introduce any changes
to tools outside Polly, I must prepare the PTX string before I can generate
the correct non-translatable IR part.
As you suggest, it may
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Mon, Apr 2, 2012 at 7:16 AM, Yabin Hu <yabin.hwu at gmail.com> wrote:
> Hi all,
>
> I am a PhD student from Huazhong University of Sci&Tech, China. The
> following is my GSoC 2012 proposal.
> Comments are welcome!
>
> *Title: Automatic GPGPU Code Generation for LLVM*
>
> *Abstract*
> Very often, manually developing a GPGPU application is a
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Yabin,
Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
can also improve llc/lli or create new tools to support code
generation for heterogeneous platforms[1], i.e. generate code for more
than one target architecture at the same time. Something like this is
not very complicated and has been implemented[2,3] by some people, but
is not available in mainline LLVM.
2012 Apr 03
2
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin,
2012/4/3 Justin Holewinski <justin.holewinski at gmail.com>
> *Motivation*
>> With the broad proliferation of GPU computing, it is very important to
>> provide an easy and automatic tool to develop or port the applications to
>> GPU for normal developers, especially for those domain experts who want to
>> harness the huge computing power of GPU. Polly
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin,
Is a compiler allowed to inline a function that calls __syncthreads? I saw
that nvcc does that, but I'm not sure it's valid. For example,
void foo() {
__syncthreads();
}
if (threadIdx.x % 2 == 0) {
...
foo();
} else {
...
foo();
}
Before inlining, all threads meet at one __syncthreads(). After inlining
if (threadIdx.x % 2 == 0) {
...
__syncthreads();
} else {
...
2012 Apr 04
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es> wrote:
> On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> > Hi Yabin,
> >
> > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
> > can also improve llc/lli or create new tools to support code
> > generation for heterogeneous platforms[1], i.e. generate code for
2007 May 20
1
compression ratio
> yes, and much more so in the encoder. the decoder is already
> very fast and approaching a fundamental limit.
>
> the next release of FLAC will be slightly faster encoding and
> decoding.
Well, I hope to have a proof-of-concept FLAC-on-CUDA to run on the
latest generation of NVidia cards, some time this summer. I hope
this will achieve about a 10x speed improvement for both
2012 Apr 04
3
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> Hi Yabin,
>
> Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
> can also improve llc/lli or create new tools to support code
> generation for heterogeneous platforms[1], i.e. generate code for more
> than one target architecture at the same time. Something like this is
> not very complicated and has
2010 Dec 02
6
Filter data
Hello,
I understand that this question is probably stupid, but ...
I have data (polity IV index)
"country","year","democ","autoc","polity","polity2"
"1","Afghanistan ",1800,1,7,-6,-6
"2","Afghanistan ",1801,1,7,-6,-6
"3","Afghanistan
2012 Apr 02
6
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi all,
I am a PhD student from Huazhong University of Sci&Tech, China. The
following is my GSoC 2012 proposal.
Comments are welcome!
*Title: Automatic GPGPU Code Generation for LLVM*
*Abstract*
Very often, manually developing a GPGPU application is a time-consuming,
complex, error-prone and iterative process. In this project, I propose to
build an automatic GPGPU code generation framework
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example.
nvcc sync.cu -arch=sm_35 -ptx
gives
// .globl _Z3foov
.visible .entry _Z3foov(
)
{
.reg .pred %p<2>;
.reg .s32 %r<3>;
mov.u32 %r1, %tid.x;
and.b32 %r2, %r1, 1;
setp.eq.b32 %p1, %r2, 1;
@!%p1 bra BB7_2;
bra.uni