Displaying 6 results from an estimated 6 matches for "rvdg".
2016 May 28
Determination of statements that contain only matrix multiplication
...risingly, the
BLIS implementation does not attempt to anticipate the fetch. It
schedules the prefetch instruction right before the first load of a
given interval.
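To make the distinction concrete, here is a hypothetical C sketch (not BLIS code) that contrasts a prefetch issued right before the first load of an interval with one issued a few intervals ahead. INTERVAL, the look-ahead distance `dist`, and both function names are illustrative assumptions. `__builtin_prefetch` is the GCC/Clang builtin and never changes the computed result, only (possibly) the timing.

```c
#include <stddef.h>

/* Assumed interval length in doubles: one 64-byte cache line.
 * Illustrative only; not a value taken from BLIS. */
#define INTERVAL 8

/* Non-anticipating schedule: the prefetch for an interval is issued
 * immediately before the first load from that same interval. */
double sum_prefetch_at_first_load(const double *buf, size_t n)
{
    double acc = 0.0;
    for (size_t start = 0; start < n; start += INTERVAL) {
        size_t end = (n - start < INTERVAL) ? n : start + INTERVAL;
        /* GCC/Clang builtin: (address, rw = 0 for read, locality 0..3). */
        __builtin_prefetch(&buf[start], 0, 3);
        for (size_t i = start; i < end; ++i)
            acc += buf[i];
    }
    return acc;
}

/* Anticipating schedule: prefetch the interval `dist` intervals ahead,
 * so the line has time to arrive before it is needed. */
double sum_prefetch_ahead(const double *buf, size_t n, size_t dist)
{
    double acc = 0.0;
    for (size_t start = 0; start < n; start += INTERVAL) {
        size_t end = (n - start < INTERVAL) ? n : start + INTERVAL;
        size_t ahead = start + dist * INTERVAL;
        if (ahead < n) /* only form addresses inside the buffer */
            __builtin_prefetch(&buf[ahead], 0, 3);
        for (size_t i = start; i < end; ++i)
            acc += buf[i];
    }
    return acc;
}
```

Both functions compute the same sum; whether the early variant is faster depends on the memory latency and on how much work each interval contains.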
> Refs:
>
> [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
> [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
> [3] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c
>
2016 May 20
Determination of statements that contain only matrix multiplication
...try to make sure that micro-panel Br is
loaded after micro-panel Ar (as required in [1] p. 11). For example,
using it helps to reduce the execution time of the attached
implementation.
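One way to read "Br is loaded after Ar" at the level of a single rank-1 update is sketched below in plain C. This is hypothetical and is not the attached gemm_C_SIMD.c. MR = 8 and NR = 4 are assumptions suggested by the "d8x4" in the name of the kernel file [3], and the packed layouts of Ar and Br are also assumed. A C compiler is free to reorder independent loads, so the source order here only expresses intent; a real kernel controls it with intrinsics or assembly.

```c
#include <stddef.h>

/* Assumed register-block sizes (MR x NR). */
#define MR 8
#define NR 4

/* C (MR x NR, column-major, leading dimension ldc) += Ar * Br, where
 * Ar is a packed MR x kc micro-panel stored column by column and
 * Br is a packed kc x NR micro-panel stored row by row. */
void micro_kernel(size_t kc, const double *ar, const double *br,
                  double *c, size_t ldc)
{
    double acc[MR][NR] = {{0.0}};
    for (size_t p = 0; p < kc; ++p) {
        double a[MR], b[NR];
        /* Load the p-th column of Ar first... */
        for (size_t i = 0; i < MR; ++i)
            a[i] = ar[p * MR + i];
        /* ...and only then the p-th row of Br. */
        for (size_t j = 0; j < NR; ++j)
            b[j] = br[p * NR + j];
        /* Rank-1 update of the MR x NR accumulator block. */
        for (size_t i = 0; i < MR; ++i)
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] += a[i] * b[j];
    }
    /* Write the accumulated block back to C. */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[j * ldc + i] += acc[i][j];
}
```

Keeping the whole MR x NR block in a local accumulator mirrors the register blocking of the micro-kernels described in [1] and [2]; C is touched only once per call.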
Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
[2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
[3] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c
--
Cheers, Roman Gareev.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gemm_C_SIMD.c
Type: text/x-csrc
Size:...
2016 May 17
Determination of statements that contain only matrix multiplication
On 05/17/2016 01:47 PM, Michael Kruse wrote:
> 2016-05-16 19:52 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
>> Hi Tobias,
>>
>> could we use information about memory accesses of a SCoP statement and
>> def-use chains to determine statements, which don’t contain matrix
>> multiplication of the following form?
>
> Assuming s/don't/do you want
2002 Dec 20
new optimized BLAS
Dear R-help,
Here's a posting to the most recent NA-digest:
From: Robert van de Geijn <rvdg at cs.utexas.edu>
Date: Fri, 13 Dec 2002 11:15:23 -0600
Subject: Fast BLAS Libraries for Current Architectures
Recent research by Kazushige Goto, Visiting Scientist at UT-Austin,
has resulted in high-performance BLAS libraries for the Intel (R)
Pentium (R) III and 4 processors, the HP/Compaq/DE...
2003 Oct 26
FLAME
...Given $A \rightarrow QR$,
$$\begin{pmatrix} A \\ B \end{pmatrix} \rightarrow \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix} R_1$$
requires a complete new factorization, which is costly.
====================
I guess R-core can perhaps help get more statistical computation
into FLAME by answering these questions. I can forward answers, or you can
contact rvdg@cs.utexas.edu directly.
===
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
Editor: Journal of Multivariate Analysis, Journal of Statistical
Software
US mail: 8130 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: deleeuw@stat.u...
2016 May 02
[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication
...[5]. That’s why we would probably get more than the
0.088919 seconds mentioned above if multithreading were disabled
(I’ve been using export OMP_THREAD_LIMIT=1 to limit the number of OpenMP
threads; however, I haven’t found a way to disable the usual
multithreading).
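For reproducing a single-threaded timing, a harness along these lines may help. It is a sketch, not the code behind the 0.088919-second figure: naive_dgemm, now_seconds and time_dgemm are hypothetical names. The loop nest contains no OpenMP pragmas, so it runs on one thread regardless of OMP_THREAD_LIMIT. It assumes a POSIX system for clock_gettime(CLOCK_MONOTONIC).

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Naive reference dgemm: C += A * B, all n x n, column-major. */
void naive_dgemm(size_t n, const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t p = 0; p < n; ++p)
            for (size_t i = 0; i < n; ++i)
                c[j * n + i] += a[p * n + i] * b[j * n + p];
}

/* Wall-clock seconds from a monotonic clock (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs naive_dgemm `reps` times and returns the best time in seconds
 * (-1.0 if reps <= 0). C keeps accumulating across repetitions. */
double time_dgemm(size_t n, const double *a, const double *b,
                  double *c, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        double t0 = now_seconds();
        naive_dgemm(n, a, b, c);
        double t = now_seconds() - t0;
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}
```

Since an n x n GEMM performs 2n^3 floating-point operations, the achieved rate can be estimated as 2n^3 / t and compared against the processor's peak.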
Refs.
[1] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
[2] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
[3] - https://github.com/flame/blis/tree/master/kernels/x86_64/sandybridge/3
[4] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c
[5] - https://github.com/flame/bli...