thr3ads.net - search: "interleaving"

Displaying 20 results from an estimated 974 matches for "interleaving".

2005 Nov 15

OggPCM2 : chunked vs interleaved data

Michael Smith wrote:
>Whilst I accept that there are many good uses for chunked data, I
>think the transformation is trivial, particularly given certain
>characteristics of the Ogg container. Remember, the data, if you read
>an ogg stream into memory, is _already_ likely to be non-contiguous,
>due to ogg's structure. It's trivial, and has insignificant additional

2016 Aug 05

enabling interleaved access loop vectorization

...cx # imm = 0x3E0
jne .LBB0_7
The performance I see out of the 3 versions (with a 500K-iteration outer loop):
Scalar: 0m10.320s
Vector (Non-interleaved): 0m8.054s
Vector (Interleaved): 0m3.541s
This is far from being the perfect use case for interleaved access:
1) There's no real interleaving, just one strided gather, so this would be better served by Ashutosh's full "strided access" proposal.
2) It looks like the actual move + shuffle sequence is not better, and even probably worse, than just inserting directly from memory - but it's still worthwhile because of how mu...

2005 Nov 15

OggPCM2 : chunked vs interleaved data

Hi all, The remaining issue to be decided for the OggPCM2 spec is the support of chunked vs interleaved data. Just so that everyone understands what we are talking about, consider a stereo file that gets stored as an OggPCM file. Within an OggPCM packet, the audio samples for the left and right channels can be stored as interleaved where the samples would be: l0, r0, l1, r1, ..... lN, rN
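As a minimal sketch of the conversion under discussion, the function below turns the interleaved layout (l0, r0, l1, r1, ...) into a chunked one (all samples of channel 0, then all of channel 1, and so on). It assumes 16-bit samples and a caller-allocated output buffer of the same size; the name `pcm_deinterleave` is hypothetical and not part of any OggPCM specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved PCM into one contiguous block per channel.
 * in:  frames * channels samples, ordered frame by frame
 *      (l0, r0, l1, r1, ... for stereo).
 * out: same number of samples, ordered channel by channel
 *      (l0 ... lN, then r0 ... rN for stereo).
 * frames is the number of samples per channel. */
void pcm_deinterleave(const int16_t *in, int16_t *out,
                      size_t channels, size_t frames)
{
    for (size_t f = 0; f < frames; ++f)
        for (size_t c = 0; c < channels; ++c)
            out[c * frames + f] = in[f * channels + c];
}
```

The inverse (chunked back to interleaved) swaps the two index expressions, which is the sense in which the transformation is cheap in either direction.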

2016 Aug 05

enabling interleaved access loop vectorization

> ...he 3 versions (with a 500K-iteration outer
> loop):
>
> Scalar: 0m10.320s
> Vector (Non-interleaved): 0m8.054s
> Vector (Interleaved): 0m3.541s
>
> This is far from being the perfect use case for interleaved access:
> 1) There's no real interleaving, just one strided gather, so this would be
> better served by Ashutosh's full "strided access" proposal.
> 2) It looks like the actual move + shuffle sequence is not better, and
> even probably worse, than just inserting directly from memory - but it's
> still wor...

2016 May 26

enabling interleaved access loop vectorization

Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in place. The vectorizer produces a long...
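The "loads + shuffles" scheme can be modelled in plain C: one wide load covers `factor * vf` consecutive elements, and each member of the interleave group is extracted by a shuffle whose mask is `i * factor + k`. This is a scalar model only, with hypothetical helper names; on X86 a single modelled shuffle may lower to several machine instructions, which is the costing problem the message describes.

```c
#include <stddef.h>

#define MAX_VF 64 /* assumption: enough lanes for this illustration */

/* Mask selecting member k of an interleave group with the given factor:
 * mask[i] = i * factor + k, i.e. lanes k, k + factor, k + 2 * factor, ... */
void stride_mask(int *mask, int k, int factor, int vf)
{
    for (int i = 0; i < vf; ++i)
        mask[i] = i * factor + k;
}

/* Scalar model of a single shuffle: out[i] = wide[mask[i]]. */
void apply_shuffle(const float *wide, const int *mask, float *out, int vf)
{
    for (int i = 0; i < vf; ++i)
        out[i] = wide[mask[i]];
}

/* De-interleave one wide load of factor * vf floats into `factor`
 * vectors of vf lanes each; member k is written to members + k * vf.
 * Uses one modelled shuffle per member.
 * Returns 0 on success, -1 if the arguments are out of range. */
int deinterleave_group(const float *wide, float *members, int factor, int vf)
{
    int mask[MAX_VF];
    if (vf <= 0 || vf > MAX_VF || factor <= 0)
        return -1;
    for (int k = 0; k < factor; ++k) {
        stride_mask(mask, k, factor, vf);
        apply_shuffle(wide, mask, members + (size_t)k * vf, vf);
    }
    return 0;
}
```

For factor 2 and vf 4, the two masks are {0, 2, 4, 6} and {1, 3, 5, 7}.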

2006 Mar 06

Interleaving elements of two vectors?

...e an R function which alternates between elements of one vector and the next? In other words, one wants z <- c(x[1], y[1], x[2], y[2], x[3], y[3], x[4], y[4], x[5], y[5]) I couldn't think of a clever and general way to write this. I am aware of gdata::interleave() but it deals with interleaving rows of a data frame, not elems of vectors. -- Ajay Shah http://www.mayin.org/ajayshah ajayshah at mayin.org http://ajayshahblog.blogspot.com <*(:-? - wizard who doesn't know the answer.
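For equal-length vectors, a common R idiom is `c(rbind(x, y))`, which works because R stores matrices column by column. The same operation written as an explicit loop, in C to match the other code fragments here, is sketched below; appending the leftover tail when the lengths differ is an assumption of this sketch, and `interleave2` is a hypothetical name.

```c
#include <stddef.h>

/* Write x[0], y[0], x[1], y[1], ... into z.
 * If one input is longer, its remaining elements are appended in order.
 * z must have room for nx + ny elements; returns the number written. */
size_t interleave2(const double *x, size_t nx,
                   const double *y, size_t ny, double *z)
{
    size_t i = 0, n = 0;
    for (; i < nx && i < ny; ++i) {
        z[n++] = x[i];
        z[n++] = y[i];
    }
    for (size_t j = i; j < nx; ++j)   /* tail of x, if x is longer */
        z[n++] = x[j];
    for (size_t j = i; j < ny; ++j)   /* tail of y, if y is longer */
        z[n++] = y[j];
    return n;
}
```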

2018 Feb 06

[RFC] Make LoopVectorize Aware of SLP Operations

Hello, We would like to propose making LoopVectorize aware of SLP operations, to improve the generated code for loops operating on struct fields or doing complex math. At the moment, LoopVectorize uses interleaving to vectorize loops that operate on values loaded/stored from consecutive addresses: vector loads/stores are generated to combine consecutive loads/stores and then shufflevector is used to de-interleave/interleave the loaded/stored values. At the moment however, we fail to detect cases where th...
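A typical loop of the kind this RFC targets is a complex multiply over an array of structs, where the real and imaginary fields sit next to each other in memory. The sketch below uses a hypothetical `cfloat` type and `cmul` function; the `re`/`im` loads of each operand form an interleave group of factor 2, which LoopVectorize can turn into wide loads followed by de-interleaving shuffles.

```c
#include <stddef.h>

/* Hypothetical complex type: an array of these is laid out as
 * re0, im0, re1, im1, ... in memory. */
typedef struct {
    float re, im;
} cfloat;

/* c[i] = a[i] * b[i] (complex multiplication).
 * Each iteration reads two adjacent fields from a and from b. */
void cmul(const cfloat *a, const cfloat *b, cfloat *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float re = a[i].re * b[i].re - a[i].im * b[i].im;
        float im = a[i].re * b[i].im + a[i].im * b[i].re;
        c[i].re = re;
        c[i].im = im;
    }
}
```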

2016 May 26

enabling interleaved access loop vectorization

On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for

2018 Feb 08

[RFC] Make LoopVectorize Aware of SLP Operations

...can handle complicated cases to something ad-hoc for some simple ones. Because of this, we would have some comments regarding the design that you propose:
> 1. Detect loops containing SLP opportunities (operations on compound
> values)
> 2. Extend the cost model to choose between interleaving or using
> compound values
> 3. Add support for vectorizing compound operations to VPlan
Currently, VPlan is not fully integrated in all the stages of the inner loop vectorizer pipeline. For that reason, part of your implementation (#1 and #2) would happen outside of VPlan and another...

2019 Sep 10

loop vectorizer disabling

...opose that the loop pragma `vectorize(disable)` actually means disabling the vectorizer for that loop. This perhaps sounds really obvious (I hope it does), but currently `vectorize(disable)` sets the vectorization width to 1, and that means the vectorizer will run and could perform other tricks such as interleaving. The main reason to change the behaviour is that it will be closer to what (most) users would expect. I think we reached consensus on changing the behaviour in [4], but since this is changing the behaviour of a user-facing pragma, we would like to know if there are any objections. If people rely on the...
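Under the behaviour described above, `vectorize(disable)` on its own may still let the vectorizer interleave the loop. A sketch that asks Clang to turn off both, assuming a Clang compiler, could look like this; the `#if` guard keeps other compilers from seeing the Clang-specific pragma, and `saxpy_no_vec` is only a placeholder loop.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]; used here only as a placeholder loop body. */
void saxpy_no_vec(float a, const float *x, float *y, size_t n)
{
#if defined(__clang__)
    /* Clang loop hints: neither vectorize nor interleave this loop. */
#pragma clang loop vectorize(disable) interleave(disable)
#endif
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```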

2019 Apr 22

Compress interleaved multi-channels pcm/wav with opus

Hello everyone, I recently tried to compress audio with opus-1.3.1/src/opus_demo.c, which works fine on mono and stereo data. Now I want to compress interleaved 7-channel PCM/WAV (recorded by a microphone array: 6 mics + 1 reference signal) with Opus, but I have not found an interface that compresses multi-channel PCM/WAV. 1. Is there a multi-channel compression interface that can be used in my case? If

2010 Nov 03

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

...fmul float %r1499, %r21
%r1504 = fmul float %r1500, %r24 ; first use of %1500
%r1505 = fmul float %r1501, %r23
%r1506 = fmul float %r1502, %r22
%r1507 = fmul float %r1503, %r21
%r1508 = fmul float %r1504, %r24 ; first use of %1504
. .
The JIT compiler, however, seems to break the interleaving of independent instructions and issues a long sequence of instructions with back-to-back dependencies. It is as if all p1 = .. expressions are collected at once, followed by all p2 = .. expressions, and so forth.
p1 = p1 * a
p1 = p1 * a
. .
p2 = p2 * b
p2 = p2 * b
. .
p3 = p3 * c
p3 = p3...
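At the source level, the pattern the poster wants preserved is several independent dependency chains advanced side by side. The sketch below (hypothetical names) writes three such chains interleaved in one loop body; whether the emitted machine code keeps them interleaved is still up to the compiler's instruction scheduler, which is the issue being reported.

```c
/* Advance three independent product chains in lock-step.
 * p1, p2 and p3 never read each other, so the three multiplies in one
 * iteration have no dependencies between them and can overlap in the
 * pipeline; only each chain's own previous value must be ready. */
void three_chains(double a, double b, double c, int n, double out[3])
{
    double p1 = 1.0, p2 = 1.0, p3 = 1.0;
    for (int i = 0; i < n; ++i) {
        p1 = p1 * a;   /* chain 1 */
        p2 = p2 * b;   /* chain 2, independent of chain 1 */
        p3 = p3 * c;   /* chain 3, independent of chains 1 and 2 */
    }
    out[0] = p1;
    out[1] = p2;
    out[2] = p3;
}
```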

2016 Sep 01

enabling interleaved access loop vectorization

...at google.com>> wrote:
Thanks Ayal!
On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
Hi Michael,
Don’t quite have a full reproducer for you yet. You’re welcome to try and see what’s happening in 32 bit mode when enabling interleaving for the following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”:
void rgb2yik (char * in, char * out, int N)
{
  int j;
  for (j = 0; j < N; ++j) {
    unsigned char r = *in++;
    unsigned char g = *in++;
    unsigned char b = *in++;
    unsigned char y = 0.299*r + 0.587*g + 0....

2016 Nov 28

Loop Vectorize: Testing cost model driven transformations

...o ensure consistency across targets for transformations driven by the cost model. The existing and future target-independent tests would have to be updated to use the new flag. They also may still choose to manually specify vectorization and interleave factors to force vectorization and interleaving regardless of what the cost model would compute using the default TTI. - Adding a command line option to enable/disable each cost model driven transformation we add. The existing and future target-independent tests would have to be updated to explicitly disable each optimization (or in...

2016 Aug 17

enabling interleaved access loop vectorization

Thanks Ayal!
On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
> Hi Michael,
>
> Don’t quite have a full reproducer for you yet. You’re welcome to try and
> see what’s happening in 32 bit mode when enabling interleaving for the
> following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”:
>
> void rgb2yik (char * in, char * out, int N)
> {
> int j;
> for (j = 0; j < N; ++j) {
> unsigned char r = *in++;
> unsigned char g = *in++;...
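A reduced, hypothetical version of this kind of loop is sketched below. It computes only the luma channel, using the standard YIQ/BT.601 weights 0.299, 0.587 and 0.114 (the rest of the quoted function is truncated and is not reconstructed here). Its loads of `r`, `g` and `b` at offsets `3j`, `3j + 1` and `3j + 2` form an interleave group of factor 3. It takes `unsigned char` pointers so the result does not depend on the signedness of plain `char`.

```c
/* Hypothetical luma-only reduction of an RGB-to-YIQ loop.
 * in holds n packed RGB pixels (r0, g0, b0, r1, g1, b1, ...);
 * out receives n luma values. The double-to-unsigned-char
 * conversion truncates toward zero. */
void rgb_to_luma(const unsigned char *in, unsigned char *out, int n)
{
    for (int j = 0; j < n; ++j) {
        unsigned char r = in[3 * j];
        unsigned char g = in[3 * j + 1];
        unsigned char b = in[3 * j + 2];
        out[j] = (unsigned char)(0.299 * r + 0.587 * g + 0.114 * b);
    }
}
```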

2016 Aug 09

enabling interleaved access loop vectorization

Thanks Ayal! I'll take a look at DENBench. As another data point - I tried enabling this on our internal benchmarks. I'm seeing one regression, and it seems to be a regression of the "good" kind - without interleaving we don't vectorize the innermost loop, and with interleaving we do. The vectorized loop is actually significantly faster when benchmarked in isolation, but in this specific instance, the static loop count is unknown, and the dynamic loop count happens to almost always be 1 - and this lives insi...

2005 Nov 13

OggPCM format description, rev 3

> Unfortunately the ALSA API defines a number of formats which are
> in practice extremely rare. In particular, any unsigned int format
> larger than 8 bits. For instance, the only unsigned int type that
> libsndfile supports is unsigned 8 bit.
I expected this, it just seemed like a good starting point to get more than 7 formats on the table. Specifically I wanted to the logarithmic

2005 Nov 15

OggPCM2 : chunked vs interleaved data

On 11/15/05, Erik de Castro Lopo <mle+xiph@mega-nerd.com> wrote:
> Hi all,
>
> The remaining issue to be decided for the OggPCM2 spec is the support
> of chunked vs interleaved data.
I think interleaved is the obvious choice - that's what most audio applications are used to dealing with, it's what we need to feed to audio hardware in the end usually, etc. Whilst I

2002 May 27

Interleaved writes from W2K and NT4

Hi Jerry,
I'm still able to recreate failures with 2.2.4 when interleaving file creation/writing from W2K and NT4 machines to a Samba server. I originally reported this in 2.2.2a, and again in 2.2.3:
http://lists.samba.org/pipermail/samba/2001-December/063396.html
http://lists.samba.org/pipermail/samba/2002-January/063483.html
http://lists.samba.org/pipermail/samba/2002-February/...

2005 Jun 06

Interleave cells with IP over ATM?

Does anyone know if it's possible to interleave two IP packets when using PPPoA and VC-based lines? Can it be done with any PPPoE implementations? The goal is to reduce the delay when a high-priority packet is waiting but a lower-priority (large) packet has already started going out ahead of it. I don't want the overhead of a much smaller MTU, which is the other way
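The delay at stake can be estimated with simple arithmetic. Assuming AAL5 framing (an 8-byte trailer, with the frame padded to a whole number of 48-byte cell payloads, each cell being 53 bytes on the wire) and ignoring PPP encapsulation overhead, the sketch below computes how many cells a packet occupies and how long they take to send. A high-priority packet queued behind it on the same VC can wait up to roughly that long. The function names are hypothetical.

```c
/* ATM cells needed for one AAL5 frame: payload plus the 8-byte AAL5
 * trailer, rounded up to a whole number of 48-byte cell payloads.
 * Assumption: PPP/LLC encapsulation bytes are not counted. */
unsigned aal5_cells(unsigned payload_bytes)
{
    return (payload_bytes + 8u + 47u) / 48u;
}

/* Microseconds needed to put those cells on the wire at line_bps,
 * counting the full 53 bytes (5-byte header + 48-byte payload) per cell. */
double aal5_serialization_us(unsigned payload_bytes, double line_bps)
{
    return aal5_cells(payload_bytes) * 53.0 * 8.0 / line_bps * 1e6;
}
```

For example, a 1500-byte packet needs 32 cells, which take 53 ms at a 256 kbit/s cell rate.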
