thr3ads.net - search: "gflop"

Displaying 10 results from an estimated 10 matches for "gflop".

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

2010 Nov 03

...endencies cannot issue immediately. The benchmark was intentionally written to avoid this hazard, but LLVM's code generator seems to ignore that when it schedules instructions. When I run this benchmark on my 2.83 GHz CPU, I observe the following performance results:

    1 threads   0.648891 GFLOP/s
    2 threads   1.489049 GFLOP/s
    3 threads   2.209838 GFLOP/s
    4 threads   2.940443 GFLOP/s

When I rewrite the generated assembly by hand to exhibit the same interleaving as in the LLVM IR form

    .
    .
    mulss %xmm8, %xmm10
    mulss %xmm7, %xmm9
    mulss %xmm6,...
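To illustrate the hazard described above, here is a minimal C sketch. It is not the poster's benchmark; `one_chain` and `four_chains` are hypothetical names. In the first loop every multiply depends on the previous result, so throughput is bounded by multiply latency. In the second, four independent chains are interleaved so consecutive multiplies can overlap in the pipeline. Whether that interleaving survives into the emitted `mulss` sequence depends on the compiler's instruction scheduler, which is what the thread reports LLVM getting wrong.

```c
#include <stddef.h>

/* Hypothetical kernel, not the original benchmark.
 * One dependent chain: each multiply must wait for the previous result. */
float one_chain(size_t iters, float k)
{
    float a = 1.0f;
    for (size_t i = 0; i < iters; ++i)
        a *= k;                 /* depends on the previous iteration's a */
    return a;
}

/* Four independent chains interleaved in program order: adjacent
 * multiplies have no data dependency on each other, so the hardware can
 * keep several in flight, provided the compiler keeps them interleaved. */
float four_chains(size_t iters, float k)
{
    float a = 1.0f, b = 1.0f, c = 1.0f, d = 1.0f;
    for (size_t i = 0; i < iters; ++i) {
        a *= k;
        b *= k;
        c *= k;
        d *= k;
    }
    return a + b + c + d;       /* combine so no chain is dead code */
}
```

To turn this into a GFLOP/s figure, one would time each function over a large `iters` and divide the number of multiplies (`iters` or `4 * iters`) by the elapsed seconds and by 1e9. An optimizer may reorder or vectorize these loops, so inspecting the generated assembly is part of the measurement.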

[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication

2016 May 02

Hi Tobias, according to [1], we can expect 90% of the turbo boost peak of the processor with a C version of Matrix-Matrix Multiplication similar to the one presented in [1]. In the case of the Intel Core i7-3820 (Sandy Bridge), the theoretical maximal performance of the machine is 28.8 GFLOPS, and hence the expected number is 25.92 GFLOPS. However, for n = m = 1056 and k = 1024, for example, a code based on the BLIS framework takes 0.088919 seconds and hence achieves 25.68 GFLOPS. I'm not sure whether a C implementation similar to the one presented in [1] can outperform a code based...
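The arithmetic behind these figures can be checked with a short C sketch. It assumes the conventional count of 2*m*n*k floating-point operations (one multiply and one add per inner-product term) for an m x k by k x n product; `gemm_gflops` and `fraction_of_peak` are illustrative names, not part of BLIS.

```c
/* GEMM throughput from problem size and wall time, assuming the usual
 * 2*m*n*k flop count for C(m x n) = A(m x k) * B(k x n). */
double gemm_gflops(long m, long n, long k, double seconds)
{
    double flops = 2.0 * (double)m * (double)n * (double)k;
    return flops / seconds / 1e9;
}

/* Achieved throughput as a fraction of the theoretical peak
 * (both arguments in GFLOPS). */
double fraction_of_peak(double achieved_gflops, double peak_gflops)
{
    return achieved_gflops / peak_gflops;
}
```

With n = m = 1056, k = 1024 and 0.088919 s this gives about 25.68 GFLOPS, roughly 89.2% of the 28.8 GFLOPS peak, just under the 25.92 GFLOPS (90%) target.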

lustre client on arm debian

2012 Oct 31

Hi, has anyone tried to compile the Lustre patchless client on Debian Linux for the ARM architecture? Would that be possible? Thanks in advance.

FLAC on GPGPU

2007 May 02

...block. In addition to this kind of parallelism, grids of blocks (which, unlike threads within the same block, do not share memory) can be used to process several audio frames at once. This is somewhat tricky given the explicitly stream-oriented API and some CUDA peculiarities. With the 330 GFLOPS from the current cards... I'd expect quite a significant acceleration. Does anyone find this interesting? Josh: do you think this would be worth including in the FLAC codebase when implemented? -- boris
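The two levels of parallelism described above can be pictured with plain C loops standing in for the CUDA grid and block. This is a hypothetical sketch, not code from the FLAC codebase: `residual_order1` is an illustrative name, and the per-sample work is FLAC's order-1 fixed predictor residual (x[i] - x[i-1]), picked only because each sample's result is independent of the others within a frame. In a CUDA version, the outer loop would become one block per frame and the inner loop one thread per sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the proposed mapping as plain C loops.
 * Outer loop: stands in for the grid (one block per audio frame).
 * Inner loop: stands in for the threads of a block (one per sample). */
void residual_order1(const int32_t *samples, int32_t *residual,
                     size_t num_frames, size_t frame_len)
{
    for (size_t frame = 0; frame < num_frames; ++frame) {   /* "blockIdx.x" */
        const int32_t *x = samples + frame * frame_len;
        int32_t *e = residual + frame * frame_len;
        for (size_t i = 0; i < frame_len; ++i) {             /* "threadIdx.x" */
            /* The first sample of each frame is kept verbatim as a
             * warm-up sample, so no frame reads another frame's data. */
            e[i] = (i == 0) ? x[0] : x[i] - x[i - 1];
        }
    }
}
```

Because each frame only reads its own samples, frames can be assigned to separate blocks that do not share memory, while the samples within a frame map naturally onto the threads of one block.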

Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success

2013 Nov 19

...what CUDA app I'm running; here is matrixMul, for example:

    matrixMul# ./matrixMul
    [Matrix Multiply Using CUDA] - Starting...
    GPU Device 0: "Quadro 6000" with compute capability 2.0
    MatrixA(320,320), MatrixB(640,320)
    Computing result using CUDA Kernel...
    done
    Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
    Checking computed result for correctness: Result = PASS
    Note: For peak performance, please refer to the matrixMulCUBLAS example.

Anyhoo, does anyone have any idea what I might be able to tweak to avoid this issue? T...
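The printed figures can be cross-checked in C. This sketch assumes `MatrixA(320,320), MatrixB(640,320)` lists (width, height), so the product is 320 x 320 times 320 x 640; the count 2*hA*wA*wB then reproduces `Size= 131072000 Ops`, and 131072000 ops / 0.577 ms is about 227.2 GFlop/s (the printed time is rounded). `matmul_flops` and `matmul_tiled` are hypothetical names, and `matmul_tiled` is only a CPU analogue of a tiled kernel, not the sample's source: each 32 x 32 output tile plays the role of one block, with the tile edge of 32 assumed from 32 x 32 = 1024 threads/block.

```c
#include <stddef.h>

#define TILE 32  /* assumed tile edge: 32 x 32 = 1024 threads/block */

/* Operation count for C(hA x wB) = A(hA x wA) * B(wA x wB):
 * one multiply and one add per inner-product term. */
double matmul_flops(long hA, long wA, long wB)
{
    return 2.0 * (double)hA * (double)wA * (double)wB;
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Row-major tiled product. Each (by, bx) output tile corresponds to one
 * thread block and each (row, col) element to one thread; the kt loop
 * walks the shared dimension one tile at a time, as a kernel staging
 * tiles through shared memory would. */
void matmul_tiled(const float *A, const float *B, float *C,
                  size_t hA, size_t wA, size_t wB)
{
    for (size_t i = 0; i < hA * wB; ++i)
        C[i] = 0.0f;
    for (size_t by = 0; by < hA; by += TILE)
        for (size_t bx = 0; bx < wB; bx += TILE)
            for (size_t kt = 0; kt < wA; kt += TILE)
                for (size_t row = by; row < min_sz(by + TILE, hA); ++row)
                    for (size_t col = bx; col < min_sz(bx + TILE, wB); ++col) {
                        float acc = C[row * wB + col];
                        for (size_t k = kt; k < min_sz(kt + TILE, wA); ++k)
                            acc += A[row * wA + k] * B[k * wB + col];
                        C[row * wB + col] = acc;
                    }
}
```

Tiling like this is what lets a GPU kernel reuse each loaded element of A and B many times. How close it gets to peak is a separate question, which is why the sample points to the matrixMulCUBLAS example.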

Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success!

2013 Nov 18

...r what CUDA app I'm running; here is matrixMul, for example:

    matrixMul# ./matrixMul
    [Matrix Multiply Using CUDA] - Starting...
    GPU Device 0: "Quadro 6000" with compute capability 2.0
    MatrixA(320,320), MatrixB(640,320)
    Computing result using CUDA Kernel...
    done
    Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
    Checking computed result for correctness: Result = PASS
    Note: For peak performance, please refer to the matrixMulCUBLAS example.

Anyhoo, does anyone have any idea what I might be able to tweak to avoid this issue? The...

Design changes are done in Fedora

2015 Jan 07

...tinuum is a smartphone the other end of, exactly? The one in my pocket has multiple general-purpose GHz-class CPU cores, a few specialized coprocessors, several hundred megs of RAM, dozens of gigs of fast local storage, and several high-tech radios. Its raw processing power is on the order of 100 GFLOPS. This is the low end of, what, the Top 500 List from 1998? Except that my phone achieves that parity on a few watts, and doesn't require a staff of acolytes to tend to its needs. This is a device that would make Captain Kirk jealous, but it's just one of a billion. Booooring. We are *so* spoi...

Design changes are done in Fedora

2015 Jan 07

On Tue, Jan 6, 2015 at 5:07 PM, Warren Young <wyml at etr-usa.com> wrote:
>>> There are more JavaScript interpreters in the world than Dalvik, ART,[2] and Java VMs combined. Perhaps we should rewrite everything in JavaScript instead?
>>
>> I'm counting the running/useful instances of actual program code,
>
> I rather doubt you've done anything like

stupid ZFS question - floating point operations

2010 Dec 22

I have a coworker whose primary expertise is in another flavor of Unix. This coworker lists floating point operations as one of ZFS's detriments. I'm not really sure what he means specifically, or where he got this reference from. In an effort to refute what I believe is an error or misunderstanding on his part, I have spent time on Yahoo, Google, the ZFS section of

zfs list improvements?

2009 Jan 06

To improve the performance of scripts that manipulate zfs snapshots, and the zfs snapshot service in particular, there needs to be a way to list all the snapshots for a given object and only the snapshots for that object. There are two RFEs filed that cover this: http://bugs.opensolaris.org/view_bug.do?bug_id=6352014 : 'zfs list' should have an option to only present direct
