search for: blis

Displaying 20 results from an estimated 22 matches for "blis".

2016 May 02
2
[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication
...Matrix Multiplication that is similar to the one presented in [1]. In the case of the Intel Core i7-3820 (Sandy Bridge), the theoretical maximal performance of the machine is 28.8 GFLOPS and hence the expected number is 25.92 GFLOPS. However, for example, with n = m = 1056 and k = 1024, code based on the BLIS framework takes 0.088919 seconds and hence reaches 25.68 GFLOPS. I’m not sure whether a C implementation that is similar to the one presented in [1] can outperform code based on the BLIS framework. What do you think about it? What performance should we necessarily reach? In case of, for example, n = m = 1...
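The figures quoted in this snippet can be checked with a few lines of arithmetic. The sketch below assumes the usual convention of counting 2*m*n*k floating-point operations for a matrix-matrix multiplication (one multiply and one add per inner update); the function name `gemm_gflops` is hypothetical and does not come from the thread.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return GFLOPS for C += A*B, assuming 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


peak = 28.8          # theoretical peak quoted in the message, in GFLOPS
target = 0.9 * peak  # the "90% of peak" goal from the subject line

# Problem size and run time quoted for the BLIS-based code.
measured = gemm_gflops(1056, 1056, 1024, 0.088919)

print(f"target:   {target:.2f} GFLOPS")
print(f"measured: {measured:.2f} GFLOPS ({100 * measured / peak:.1f}% of peak)")
```

Under that assumption the measured rate falls slightly short of the 90% target, which matches the concern raised in the message.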
2016 May 28
1
Determination of statements that contain only matrix multiplication
...s :-) I'm not sure I followed exactly what you wanted to say, but I understand that this is not the priority since you can get 90% of the performance without worrying about prefetching. > I started to consider prefetching, because it’s used in > implementations of gemm micro-kernels of BLIS framework [3]. If I’m > not mistaken, it’s applied to try to make sure that micro-panel Br is > loaded after micro-panel Ar (as required in [1] p. 11). For example, > using it helps to reduce the execution time of the attached > implementation. Interesting. The BLIS implementation pre...
2016 May 20
0
Determination of statements that contain only matrix multiplication
...6-05-19 21:45 GMT+05:00 4lbert C0hen <4lbert.h.c0hen at gmail.com>: > One short note. I would advise against spending time on prefetching for x86. > Recent hardware prefetchers are amazingly good at strided accesses in > single-threaded code. Caution: this is not based on objective/published > data, but on personal experience. > > There are open challenges in multiprocessor prefetching, even for regularly > strided data, but these are probably too ambitious to be tackled > effectively in the time frame of a SoC. There are lots of papers on this > however. > ...
2016 May 17
4
Determination of statements that contain only matrix multiplication
On 05/17/2016 01:47 PM, Michael Kruse wrote: > 2016-05-16 19:52 GMT+02:00 Roman Gareev <gareevroman at gmail.com>: >> Hi Tobias, >> >> could we use information about memory accesses of a SCoP statement and >> def-use chains to determine statements, which don’t contain matrix >> multiplication of the following form? > > Assuming s/don't/do you want
2020 May 27
2
Changing the BLAS from openblas on a F32 box
...pe for disaster for the large GAMs we're trying to > fit. But being able to switch to atlas temporarily is a good > alternative. Note that switching to openblas-openmp (libopenblaso.so) should be thread-safe and will probably get you better performance than Atlas. Also, Fedora packages blis (which provides /lib64/blisblas/libblas.so.3). It seems to be thread-safe and should be more performant than Atlas too. -- Iñaki Úcar
2004 Jun 11
4
Bug#253861: logcheck: Please add support for imapproxy
Package: logcheck Version: 1.2.22a Severity: wishlist There is no support for imapproxy, and it would be a great help if it was added. Following are two sample lines from the syslog: Jun 11 09:36:55 MyHost in.imapproxyd[30845]: LOGOUT: '"MyUser"' from server sd [13] Jun 11 09:37:02 MyHost in.imapproxyd[30846]: LOGIN: '"MyUser"' (xxx.xxx.xxx.xx:yyyyy) on
2020 May 27
0
Changing the BLAS from openblas on a F32 box
..._LOCKING=1. Those other suggestions are really helpful too; I really didn't understand what the difference was (I'm still not clear what the differences are between say openblas-openmp and openblas-openmp64), but I did get R to pass mgcv's thread safe test with both openblas-openmp and blis-openmp, so I have aliased those options for use too. Just using blis ( /lib64/blisblas/libblas.so.3 ) was generating a segfault when running the mgcv test. Really appreciate the help! All the best G On Wed, 27 May 2020 at 14:09, Iñaki Ucar <iucar at fedoraproject.org> wrote: > > On...
2020 May 27
1
Changing the BLAS from openblas on a F32 box
...ther suggestions are really helpful too; I really didn't > understand what the difference was (I'm still not clear what the > differences are between say openblas-openmp and openblas-openmp64), > but I did get R to pass mgcv's thread safe test with both > openblas-openmp and blis-openmp, so I have aliased those options for > use too. Basically, openblas has a number of features that can be enabled or disabled: 64-bit integer support, threading and parallelization of certain parts using openmp (as, e.g., data.table does). With the combination of these features, we end up...
2016 Jun 02
4
[GSoC 2016] Parameters of a target architecture
...taken, we can get the size of a cache line and the width of the largest vector register (which probably helps to determine the second parameter) from TargetTransformInfo.h. I would be very grateful for your comments, feedback and ideas. Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf -- Cheers, Roman Gareev.
2016 Jun 27
2
[GSoC 2016] Implementation of the packing transformation
Dear community, the next step of the "Improvement of vectorization process in Polly" project is to implement the packing transformation described in http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf. I had a discussion with Tobias and we decided that a packing transformation is in many ways a data-layout transformation that requires introducing a new array, copying data into that array, and changing the memory access locations of the compute kernel to reference it. I think that...
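The three steps named in this snippet (introduce a new array, copy data into it, redirect the kernel's accesses) can be illustrated with a deliberately simplified sketch. This is not Polly's implementation and not the blocked micro-panel packing of the cited paper: it assumes flat row-major lists and packs all of B in one go so that the innermost loop reads it contiguously. The name `matmul_packed` is hypothetical.

```python
def matmul_packed(A, B, m, n, k):
    """Compute C = A*B for row-major flat lists, reading B through a packed copy."""
    # Step 1: introduce a new array for the packed data.
    packed_B = [0.0] * (k * n)

    # Step 2: copy B into it, column by column, so that each column of B
    # occupies k consecutive elements of packed_B.
    for j in range(n):
        for p in range(k):
            packed_B[j * k + p] = B[p * n + j]

    # Step 3: the compute kernel references packed_B instead of B, so its
    # innermost loop walks both operands with unit stride.
    C = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i * k + p] * packed_B[j * k + p]
            C[i * n + j] = acc
    return C


# Small usage example: two 2x2 matrices stored row by row.
result = matmul_packed([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2, 2, 2)
print(result)
```

A production version would pack only one cache-sized block at a time, which is where the cost of the extra copy is amortized.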
2007 Sep 24
1
Pls sends your CV in www.allindiajobbank.com website. It is fast and gives U positive response. Pls pass this message to Ur friend also because vacancies are waiting.
Pls sends your CV in www.allindiajobbank.com website. It is fast and gives U positive response. Pls pass this message to Ur friend also because vacancies are waiting. IT SALES EXECUTIVE Job Description :- Lead Generation & Generating customers by Tele-calling Calling prospective customers and explaining them about the services. Lead generation and successful conversion of leads. Education
2020 May 27
2
Changing the BLAS from openblas on a F32 box
Of course, an even simpler trick is to launch R as follows: LD_PRELOAD=/lib64/atlas/libsatlas.so.3 R and then the symbols in libsatlas take precedence over libopenblas. Or a mix between both alternatives, i.e., setting LD_PRELOAD=/path/to/some/link R and then changing that link to point to openblas, atlas... Whatever suits you best. Iñaki On Wed, 27 May 2020 at 11:00, Iñaki Ucar <iucar at
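The same LD_PRELOAD trick can be scripted when R is started from another program. The sketch below is an assumption-laden illustration: the helper names `env_with_preload` and `launch_r` are hypothetical, the library path is the one quoted in the message, and it presumes a Linux system where the dynamic loader honours LD_PRELOAD and an `R` executable is on the PATH.

```python
import os
import subprocess


def env_with_preload(library_path, base_env=None):
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # The loader accepts a colon-separated list; earlier entries win.
    env["LD_PRELOAD"] = library_path if not existing else f"{library_path}:{existing}"
    return env


def launch_r(library_path):
    """Start an interactive R session with the given BLAS library preloaded."""
    return subprocess.call(["R"], env=env_with_preload(library_path))


# Usage (not executed here):
#   launch_r("/lib64/atlas/libsatlas.so.3")
```

Passing a modified copy of the environment keeps the change local to the child process, so the calling program continues to use its own BLAS.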
2007 Feb 22
7
Serializing non-ascii characters
Hello, I'm new to the Prototype Framework. I'm trying to serialize a simple form with a few checkboxes. It seems like non-ascii characters, like Ö, are not encoded properly in some cases. Please take a look: http://troxy.net/serialize.htm Checking both boxes gives the following string: namn=%25C3%25B6rjan&namn=adam ..while it should look like this:
2016 May 20
2
Determination of statements that contain only matrix multiplication
>>>> Maybe it could be a temporary solution. I think that if the checks are >>>> successfully passed and the basic block of the statement has exactly >>>> 14 instructions, the statement contains matrix multiplication and can >>>> be safely optimized with a generation of specific code, which takes >>>> into account information about usage
2017 Sep 04
2
[RFC] Polly Status and Integration
...olyhedral", we generally work on a tree representation of the execution order, which -- even though not an AST -- can be modified in a similar way, such that loop transformations can be applied on a per-node level. We use the latter to implement certain specific transformations, e.g. the GOTO/BLIS matrix multiplication transformation, which is a very specific sequence of transformations that is known to yield "optimal" performance. Best, Tobias > > > > Thanks, > > > > -- Vikram Adve > > > > // Interim Head and Professor, Department of Comput...
2007 Feb 21
8
Element.addMethods() is not a function after upgrade to 1.7.0
After upgrading to prototype 1.5.0 and scriptaculous 1.7.0 my autocompleter.local died... firebug reports that effects.js is to blame calling Element.addMethods(); I can not find out why cause as far as I can see it does exist in prototype.js. Any pointers anyone :( --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups
2002 Jun 26
6
GUI's for teaching
Dear All, There is no advantage of GUI over CLI, IMO. The real issue is the answer to the questions: "What should I do next?" or "What am I allowed to do here?" A "nice" interface, not necessarily GUI, will offer friendly answers: "I was expecting you to do _this_" or "In this situation you are allowed to do _these things_" You see, it's all
2020 Jan 03
10
Writing loop transformations on the right representation is more productive
...--------------------- Optimizations should be applied from very specialized to very general (potentially after some canonicalization). For instance, the first step could be detecting common idioms such as gemm and either replacing them with a BLAS function call or applying well-studied optimizations like BLIS to them. After such an idiom has been detected, no other transformation should be applied to it. Mid-level transformations may try to map entire loop nests to cache- and compute hierarchies (SIMT threads, multiprocessors, offloading, etc) by applying transformations such as tiling, loop intercha...
2020 Jan 11
2
Writing loop transformations on the right representation is more productive
...d be applied from very specialized to very general > > (potentially after some canonicalization). For instance, the first > > step could be detecting common idioms such as gemm and either replacing them > > with a BLAS function call or applying well-studied optimizations > > like BLIS to them. After such an idiom has been detected, no other > > transformation should be applied to it. > > I'm sceptical of such machinery. People usually write bad code (me > included) and trying to match multiple patterns to the same semantics > will be hard, considering ho...
2017 Sep 04
2
llvm-dev Digest, Vol 159, Issue 2
Hal, Tobias, et al. – I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen. I have a couple of questions: 1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving