thr3ads.net - search: "unvectorized"

2006 Sep 11

3

"unvector" ?

Hi ev'rybody, is there a way to pass a vector to a function expecting separate arguments? More specifically, I have a character vector, say u, and I want a single string, but >paste(u) doesn't work, so I would like something like >paste(unvector(u)). I am interested in a solution to the general problem too, as the only one I found is maintaining two versions of the functions I

[R] unvectorized option for outer()

2005 Oct 31

1

> From: Thomas Lumley > > On Sun, 30 Oct 2005, Jonathan Rougier wrote: > > > I'm not sure about this. Perhaps I am a dinosaur, but my feeling is > > that if people are writing functions in R that might be subject to > > simple operations like outer products, then they ought to be writing > > vectorised functions! > > I would agree. How about an

[LLVMdev] Vectorization: Next Steps

2012 Feb 07

4

...l heuristic to handle those cases. That having been said, unroll+(failed vectorize)+rollback is not really any more expensive at compile time than unroll+(failed vectorize) except that the resulting code would run faster (actually it is cheaper to compile because the optimization/compilation of the unvectorized unrolled loop code takes longer than the non-unrolled loop). There might be a clean way of doing this; I'll think about it. Thanks again, Hal > > -Chris -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory

[LLVMdev] Vectorization: Next Steps

2012 Feb 09

0

...e those > cases. That having been said, unroll+(failed vectorize)+rollback is not > really any more expensive at compile time than unroll+(failed vectorize) > except that the resulting code would run faster (actually it is cheaper > to compile because the optimization/compilation of the unvectorized > unrolled loop code takes longer than the non-unrolled loop). There might > be a clean way of doing this; I'll think about it. I don't really understand the issue here, can you elaborate on when this might be a win? I really don't like "speculatively unroll, try to do some...

[LLVMdev] Vectorization: Next Steps

2012 Feb 09

1

...ses. That having been said, unroll+(failed vectorize)+rollback is not > > really any more expensive at compile time than unroll+(failed vectorize) > > except that the resulting code would run faster (actually it is cheaper > > to compile because the optimization/compilation of the unvectorized > > unrolled loop code takes longer than the non-unrolled loop). There might > > be a clean way of doing this; I'll think about it. > > I don't really understand the issue here, can you elaborate on when this might be a win? I really don't like "speculatively un...

[LLVMdev] Vectorization: Next Steps

2012 Feb 10

2

...will vectorize. The reason the heuristic has such a large default value is to prevent cases where it costs more to permute all of the necessary values into and out of the vector registers than is saved by vectorizing. Does the code generated with -bb-vectorize-req-chain-depth=2 run faster than the unvectorized code? The heuristic can certainly be improved, and these kinds of test cases are very important to that improvement process. -Hal On Thu, 2012-02-09 at 13:27 +0100, Carl-Philip Hänsch wrote: > I have a super-simple test case 4x4 matrix * 4-vector which gets > correctly unrolled, but is no...

doSMP package works better than perfect, at least sometimes.

2011 Apr 19

0

...sqrt() calculations in a way one might find in Burns' third circle of Hell. Short loops were used to cause thrashing during processor assigning, and longer ones used to simulate 'harder' or more time-consuming tasks. The processing time of each set of tasks was measured for basic unvectorized for() looping, foreach() %do% looping, and foreach() %dopar% looping, using a 4 core Xeon PC running XP with 3.2 GB RAM. Using 3 'workers', the increase in speed due to iteration with the foreach() %do% construct showed the expected amount of thrashing for small/easy calculations, wit...

[LLVMdev] Vectorization: Next Steps

2012 Feb 13

0

...reason the heuristic has such a large default value is to > prevent cases where it costs more to permute all of the necessary values > into and out of the vector registers than is saved by vectorizing. Does > the code generated with -bb-vectorize-req-chain-depth=2 run faster than > the unvectorized code? > > The heuristic can certainly be improved, and these kinds of test cases > are very important to that improvement process. > > -Hal > > On Thu, 2012-02-09 at 13:27 +0100, Carl-Philip Hänsch wrote: > > I have a super-simple test case 4x4 matrix * 4-vector which ge...

[LLVMdev] Vectorization: Next Steps

2012 Feb 14

0

...he >> > necessary values >> > into and out of the vector registers than is saved by >> > vectorizing. Does >> > the code generated with -bb-vectorize-req-chain-depth=2 run >> > faster than >> > the unvectorized code? >> > >> > The heuristic can certainly be improved, and these kinds of >> > test cases >> > are very important to that improvement process. >> > >> > -Hal >> > >> > On Thu, 2012-02-...

[LLVMdev] Vectorization: Next Steps

2012 Feb 13

2

...prevent cases where it costs more to permute all of the > necessary values > into and out of the vector registers than is saved by > vectorizing. Does > the code generated with -bb-vectorize-req-chain-depth=2 run > faster than > the unvectorized code? > > The heuristic can certainly be improved, and these kinds of > test cases > are very important to that improvement process. > > -Hal > > On Thu, 2012-02-09 at 13:27 +0100, Carl-Philip Hänsch wrote:...

[LLVMdev] Vectorization: Next Steps

2012 Feb 14

2

If you run with -vectorize instead of -bb-vectorize it will schedule the cleanup passes for you. -Hal Sent from my Verizon Wireless Droid -----Original message----- From: "Carl-Philip Hänsch" <cphaensch at googlemail.com> To: Hal Finkel <hfinkel at anl.gov> Cc: llvmdev at cs.uiuc.edu Sent: Tue, Feb 14, 2012 16:10:28 GMT+00:00 Subject: Re: [LLVMdev] Vectorization: Next

[LLVMdev] Vectorization: Next Steps

2012 Feb 13

1

...ses. That having been said, unroll+(failed vectorize)+rollback is not > > really any more expensive at compile time than unroll+(failed vectorize) > > except that the resulting code would run faster (actually it is cheaper > > to compile because the optimization/compilation of the unvectorized > > unrolled loop code takes longer than the non-unrolled loop). There might > > be a clean way of doing this; I'll think about it. > > I don't really understand the issue here, can you elaborate on when this might be a win? I really don't like "speculatively un...

[LLVMdev] Vectorization: Next Steps

2012 Feb 09

0

I have a super-simple test case 4x4 matrix * 4-vector which gets correctly unrolled, but is not vectorized by -bb-vectorize. (I used llvm 3.1svn) I attached the test case so you can see what is going wrong there. 2012/2/3 Hal Finkel <hfinkel at anl.gov> > As some of you may know, I committed my basic-block autovectorization > pass a few days ago. I encourage anyone interested to try

[LLVMdev] Vectorization: Next Steps

2012 Feb 14

0

...more to permute all of the > > necessary values > > into and out of the vector registers than is saved by > > vectorizing. Does > > the code generated with -bb-vectorize-req-chain-depth=2 run > > faster than > > the unvectorized code? > > > > The heuristic can certainly be improved, and these kinds of > > test cases > > are very important to that improvement process. > > > > -Hal > > > > On Thu, 2012-02-09 at 13:27 +0100, Carl-Philip H...

[LLVMdev] Vectorization: Next Steps

2012 Feb 06

0

On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote: > As some of you may know, I committed my basic-block autovectorization > pass a few days ago. I encourage anyone interested to try it out (pass > -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. > Especially in combination with -unroll-allow-partial, I have observed > some significant benchmark speedups, but, I

[LLVMdev] Strange behaviour with x86-64 windows, bad call instruction address

2012 Feb 21

0

...he >> > necessary values >> > into and out of the vector registers than is saved by >> > vectorizing. Does >> > the code generated with -bb-vectorize-req-chain-depth=2 run >> > faster than >> > the unvectorized code? >> > >> > The heuristic can certainly be improved, and these kinds of >> > test cases >> > are very important to that improvement process. >> > >> > -Hal >> > >> > On Thu, 2012-02-...

[LLVMdev] Vectorization: Next Steps

2012 Feb 03

8

As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with -unroll-allow-partial, I have observed some significant benchmark speedups, but, I have also observed some significant slowdowns. I would like to share my

[nbdkit PATCH 0/2] Reduce network overhead with corking

2019 Jun 06

4

Slightly RFC, as I need more time to investigate why Unix sockets appeared to degrade with this patch. But as TCP sockets (over loopback to localhost) and TLS sessions (regardless of underlying Unix or TCP) both showed improvements, this looks like a worthwhile series. Eric Blake (2): server: Add support for corking server: Cork around grouped transmission send()s server/internal.h | 3

enabling interleaved access loop vectorization

2016 Sep 01

2

So turns out it is a full reproducer after all (choosing to vectorize on AVX), good. > The details are in PR29025. Interesting. (So we should carefully insert unconditional branches inside shuffle sequences, eh? ;-) > But if we modify the program by adding "*out++ = 0" right after "*out++ = q;" (thus eliminating the pesky <12 x i8>), we get: Indeed such
