search for: rvdg

Displaying 6 results from an estimated 6 matches for "rvdg".

2016 May 28
1
Determination of statements that contain only matrix multiplication
...risingly, the BLIS implementation does not attempt to anticipate the fetch. It schedules the prefetch instruction right before the first load of a given interval. Refs: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm [3] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c
2016 May 20
0
Determination of statements that contain only matrix multiplication
...try to make sure that micro-panel Br is loaded after micro-panel Ar (as required in [1] p. 11). For example, its use helps to reduce the execution time of the attached implementation. Refs: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm [3] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c -- Cheers, Roman Gareev. -------------- next part -------------- A non-text attachment was scrubbed... Name: gemm_C_SIMD.c Type: text/x-csrc Size:...
2016 May 17
4
Determination of statements that contain only matrix multiplication
On 05/17/2016 01:47 PM, Michael Kruse wrote:
> 2016-05-16 19:52 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
>> Hi Tobias,
>>
>> could we use information about memory accesses of a SCoP statement and
>> def-use chains to determine statements, which don’t contain matrix
>> multiplication of the following form?
>
> Assuming s/don't/do you want
2002 Dec 20
0
new optimized BLAS
Dear R-help, Here's a posting to the most recent NA-digest: From: Robert van de Geijn <rvdg at cs.utexas.edu> Date: Fri, 13 Dec 2002 11:15:23 -0600 Subject: Fast BLAS Libraries for Current Architectures Recent research by Kazushige Goto, Visiting Scientist at UT-Austin, has resulted in high-performance BLAS libraries for the Intel (R) Pentium (R) III and 4 processors, the HP/Compaq/DE...
2003 Oct 26
1
FLAME
...Given $A \to QR$, $\begin{pmatrix} A \\ B \end{pmatrix} \to \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix} R_1$ requires a complete new factorization, which is costly. ==================== I guess R-core can perhaps help getting more statistical computation into FLAME by answering these questions. I can forward answers, or you can directly contact rvdg@cs.utexas.edu === Jan de Leeuw; Professor and Chair, UCLA Department of Statistics; Editor: Journal of Multivariate Analysis, Journal of Statistical Software US mail: 8130 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554 phone (310)-825-9550; fax (310)-206-5658; email: deleeuw@stat.u...
2016 May 02
2
[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication
...[5]. That’s why we would probably get more than 0.088919 seconds mentioned above, if the multithreading were disabled (I’ve been using export OMP_THREAD_LIMIT=1 to limit the number of OMP threads. However, I haven’t found a way to avoid usual multithreading). Refs. [1] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm [2] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf [3] - https://github.com/flame/blis/tree/master/kernels/x86_64/sandybridge/3 [4] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c [5] - https://github.com/flame/bli...