similar to: [LLVMdev] loop vectorizer: Unexpected extract/insertelement

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] loop vectorizer: Unexpected extract/insertelement"

2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes being run after it. From Transforms/IPO/PassManagerBuilder.cpp:
    // Add the various vectorization passes and relevant cleanup passes for
    // them since we are no longer in the middle of the main scalar pipeline.
    MPM.add(createLoopVectorizePass(DisableUnrollLoops));
    MPM.add(createInstructionCombiningPass());
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see one. Frank
vector.ph:                                        ; preds = %L5
  %broadcast.splatinsert1 = insertelement <4 x
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
Yes, you need the latest ToT (top-of-tree) version of LLVM, or you can run -loop-vectorize -earlycse -instcombine -simplifycfg. The bitcast is essentially a no-op to satisfy the type system. This is what your example looks like for me:
vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %.lhs = shl i64 %6, 2
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where a single loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains that the loop is too small:
LV: Checking a loop in "main"
LV: Found a loop: L3
LV: Found a loop with a very small trip count. This loop
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
When coming from C, it was probably the loop unroller and the SLP vectorizer that vectorized the code. Potentially I could do the same in the IR. However, the loop body generated in the IR can get very large, so the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should
2013 Nov 11
2
[LLVMdev] loop vectorizer: JIT + AVX segfaults
For what it's worth, I'm also experiencing this same issue. If there is interest I can provide some very simple reproducible test cases, but I was planning on moving to MCJIT this week anyway.
2013 Nov 10
3
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
The loop vectorizer is doing an amazing job so far. Most of the time. I just came across one function that led to unexpected behavior: for this function the loop vectorizer finds a 256-bit vector as the widest vector type for the x86-64 architecture (!). This is strange, as it always found the correct size of 128 bits as the widest type. I isolated the IR of the function to check if this is
2013 Nov 10
0
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
I looked into this further. For the previously sent IR, a vector width of 256 bits is found mistakenly (and reproducibly) on this hardware:
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
For the same IR, the loop vectorizer finds the correct vector width (128 bits) on:
model name : Intel(R) Xeon(R) CPU E5630 @ 2.53GHz
model name : Intel(R) Core(TM) i7 CPU M 640 @
2013 Nov 10
2
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
Hi Frank, I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which does have 256-bit vectors. The other two only support SSE instructions, whose vectors are only 128 bits long.
cheers,
--renato
On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
> I looked more into this. For the previously sent IR the vector width of
> 256 bit is found mistakenly
2013 Dec 06
1
Paging in waves.
I've been working on a subroutine to page groups of phones at once, and I'm having some difficulty. My goal is to have a user call an extension; I record the page they wish to play, then page out that recorded file to the phones in groups.
[sub-masspage]
exten => s,1,NoOP
same => n,Answer
same => n,Set(filename=$PAGE)
same => n,Wait(1)
same =>
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
----- Original Message -----
> Hi Nadav,
>
> that's the whole point of it. I can't in general make the index
> calculation simpler. The example given is the simplest non-trivial
> index function that is needed. It might well be that it's so
> simple that the index calculation in this case can be thrown away
> altogether and, as you say, be replaced by
2013 Nov 11
0
[LLVMdev] loop vectorizer: JIT + AVX segfaults
I changed the code to use the MCJIT engine. As Josh suspected, it's the same issue: the program runs fine on SSE-based machines but segfaults on a CPU with AVX extensions. I've attached the repro case. Should I file a bug report?
P.S. In Bugzilla there is a component 'new-bugs'. Should all new bugs be filed there?
Frank
On 11/11/13 08:45, Josh Klontz wrote:
> For what it's
2005 Sep 03
2
Problem with swig?
Take a look at the generated code from ListBox.cpp:
static VALUE
_wrap_new_wxListBox__SWIG_0(int argc, VALUE *argv, VALUE self) {
    VALUE arg1 ;
    wxWindow *arg2 = (wxWindow *) 0 ;
    wxWindowID arg3 ;
    wxPoint *arg4 = 0 ;
    wxSize *arg5 = 0 ;
    int arg6 ;
    wxString *arg7 ;
    long arg8 ;
    wxValidator *arg9 = 0 ;
    wxString *arg10 = 0 ;
    wxListBox *result;
    wxString
2013 Nov 10
2
[LLVMdev] loop vectorizer: JIT + AVX segfaults
Is it possible that the AVX support in the JIT engine or the x86-64 backend is not mature? I am getting segfaults when switching from a vector length of 4 to 8 in my application. I isolated the failing function, and it still segfaults in a minimal setup. The attached IR implements the following simple function:
void bar(int start, int end, int ignore, bool add, bool addme, float* out, float* in) {
2013 Nov 10
0
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
Hi Renato, you are right! There is 'avx' support: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave
2007 Jan 10
13
[DTrace] how to get socket read size
Hi, I'm trying to write my first DTrace script, and apparently I bit off more than I can chew. I want to track I/O over sockets. I found your socketsize.d, which showed me how to track writes, but I'm at a loss how to track reads. Frankly, I don't see how your write tracker works, because it uses a probe in a function that only takes two arguments, yet you grab the size of the write
2018 Jan 20
2
Can anyone help with a quick app_record.c module improvement and can explain over-riding modules?
On 20 January 2018 at 23:30, Tim S <tim.strommen at gmail.com> wrote:
> I have seen this take over 2 seconds before on a sluggish machine.
Thanks. My host uses an SSD and everything seems pretty quick, but I'll give it a 1-second pause.
> you'd need to pipe that to a Google Speech API tunnel.
> That's probably not something you can hack away at with simple
> Asterisk
2009 Feb 18
4
tracing aio syscalls
Hi all, is there any documentation or an example showing how to interpret arg0 .. arg<n> for the aioread, aiowrite, and aiowait syscalls? The system call name for all three seems to be "kaio". Michael
=== Michael Mueller ==================
Tel. + 49 8171 63600
Fax. + 49 8171 63615
Web: http://www.michael-mueller-it.de
======================================
2012 Oct 24
3
[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
Hi, I don't know if my LLVM IR code is faulty or if I have spotted a bug in the RegisterCoalescing pass, so I'm posting my issue on the ML. The shader and the print-before-all dump are given below. The interesting part is the vreg6/vreg48 reduction. Before RegCoalescing, the machine code is:
// BEFORE LOOP
... Some COPYs....
400B	%vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2
2007 Apr 19
3
[RFC, PATCH 1/5] Paravirt_ops full patching.patch
Add 5-argument handling for paravirt ops patching of PAE functions.
Signed-off-by: Zachary Amsden <zach@vmware.com>
diff -r dbe11208916f include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h	Thu Apr 19 11:40:55 2007 -0700
+++ b/include/asm-i386/paravirt.h	Thu Apr 19 12:04:16 2007 -0700
@@ -308,10 +308,9 @@ unsigned paravirt_patch_insns(void *site
  * return value handling from