Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] loop vectorizer: Unexpected extract/insertelement"
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it:
from Transforms/IPO/PassManagerBuilder.cpp:
// Add the various vectorization passes and relevant cleanup passes for
// them since we are no longer in the middle of the main scalar pipeline.
MPM.add(createLoopVectorizePass(DisableUnrollLoops));
MPM.add(createInstructionCombiningPass());
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot.
Any idea why there are still shufflevector, insertelement, *and* bitcast
(!!) etc. instructions left? The original loop is so clean, a textbook
example I'd say. There is no need to shuffle anything. At least I don't
see it.
Frank
vector.ph: ; preds = %L5
%broadcast.splatinsert1 = insertelement <4 x
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
Yes, you need the latest ToT version of llvm or you run
-loop-vectorize -earlycse -instcombine -simplifycfg
The bitcast essentially is a noop to satisfy the type system.
This is what your example looks like for me:
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%.lhs = shl i64 %6, 2
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the one loop is rewritten as two loops. This
avoids the 'rem' and 'div' instructions in the index calculation (which
give the loop vectorizer a hard time).
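As a rough illustration of the rewrite described above (the actual index function is not shown in this message, so the names and loop shapes here are assumptions), a single flat loop that recovers its two indices with division and remainder visits the same index pairs as two nested loops that need neither operation:

```python
def flat_indices(n_outer, n_inner):
    """One loop: recover (outer, inner) from the flat index via div and rem."""
    return [(i // n_inner, i % n_inner) for i in range(n_outer * n_inner)]


def nested_indices(n_outer, n_inner):
    """Two loops: the same index pairs, with no div or rem in the body."""
    pairs = []
    for outer in range(n_outer):
        for inner in range(n_inner):
            pairs.append((outer, inner))
    return pairs
```

In this sketch the inner loop runs only `n_inner` times per outer iteration, so its trip count is much smaller than that of the flat loop.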
However, with this setup the loop vectorizer complains that the loop is
too small.
LV: Checking a loop in "main"
LV: Found a loop: L3
LV: Found a loop with a very small trip count. This loop
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
In the case when coming from C it was probably the loop unroller and SLP
vectorizer which vectorized the code. Potentially I could do the same in
the IR. However, the loop body that is generated in the IR can get very
large. Thus, the loop unroller will refuse to unroll the loop in a large
number of (important) cases.
Isn't there a way to convince the loop vectorizer that it should
2013 Nov 11
2
[LLVMdev] loop vectorizer: JIT + AVX segfaults
For what it's worth, I'm also experiencing this same issue. If there is
interest I can provide some very simple reproducible test cases, but I was
planning on moving to MCJIT this week anyway.
2013 Nov 10
3
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
The loop vectorizer is doing an amazing job so far. Most of the time.
I just came across one function which led to unexpected behavior:
On this function the loop vectorizer finds a 256 bit vector as the
widest vector type for the x86-64 architecture. (!)
This is strange, as it was always finding the correct size of 128 bit
as the widest type. I isolated the IR of the function to check if this
is
2013 Nov 10
0
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
I looked more into this. For the previously sent IR the vector width of
256 bit is found mistakenly (and reproducibly) on this hardware:
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
For the same IR the loop vectorizer finds the correct vector width (128
bit) on:
model name : Intel(R) Xeon(R) CPU E5630 @ 2.53GHz
model name : Intel(R) Core(TM) i7 CPU M 640 @
2013 Nov 10
2
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
Hi Frank,
I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which
does have 256-bit vectors. The other two only support SSE instructions,
which operate on 128-bit vectors.
cheers,
--renato
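A minimal sketch of the point above, assuming the CPU feature flags are available as a space-separated string such as the `flags` line of `/proc/cpuinfo` on Linux. The helper name `widest_vector_bits` is hypothetical, and this is not how LLVM itself picks the vector width; it only mirrors the AVX versus SSE distinction:

```python
def widest_vector_bits(flags_line: str) -> int:
    """Return the widest SIMD register width (in bits) implied by the flags.

    Only the plain 'avx' and 'sse'/'sse2' flags are considered here.
    """
    flags = set(flags_line.split())
    if "avx" in flags:
        return 256  # AVX registers are 256 bits wide
    if "sse" in flags or "sse2" in flags:
        return 128  # SSE registers are 128 bits wide
    return 0  # no SSE or AVX flag found
```

Calling it with the flags of an AVX-capable machine gives 256, while a machine that reports only SSE flags gives 128.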
On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
> I looked more into this. For the previously sent IR the vector width of
> 256 bit is found mistakenly
2013 Dec 06
1
Paging in waves.
I've been working on writing a subroutine to page groups of phones at once
and I'm having some difficulty.
My goal is to have a user call an extension, I record the page they wish to
play, I then page out that recorded file to the phones in groups.
[sub-masspage]
exten => s,1,NoOP
same => n,Answer
same => n,Set(filename=$PAGE)
same => n,Wait(1)
same =>
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
----- Original Message -----
>
> Hi Nadav,
>
> that's the whole point of it. I can't in general make the index
> calculation simpler. The example given is the simplest non-trivial
> index function that is needed. It might well be that it's that
> simple that the index calculation in this case can be thrown away
> altogether and - as you say - be replaced by
2013 Nov 11
0
[LLVMdev] loop vectorizer: JIT + AVX segfaults
I changed the code to use the MCJIT engine. As Josh suspected
it's the same issue: The program runs fine on SSE based machines,
but SEGFAULTs on a CPU with AVX extensions.
I attach the repro case.
Should I file a bug report? P.S. On bugzilla there is the component
'new-bugs'. Should all new bugs be filed there?
Frank
On 11/11/13 08:45, Josh Klontz wrote:
> For what it's
2005 Sep 03
2
Problem with swig?
Take a look at the generated code from ListBox.cpp:
static VALUE
_wrap_new_wxListBox__SWIG_0(int argc, VALUE *argv, VALUE self) {
VALUE arg1 ;
wxWindow *arg2 = (wxWindow *) 0 ;
wxWindowID arg3 ;
wxPoint *arg4 = 0 ;
wxSize *arg5 = 0 ;
int arg6 ;
wxString *arg7 ;
long arg8 ;
wxValidator *arg9 = 0 ;
wxString *arg10 = 0 ;
wxListBox *result;
wxString
2013 Nov 10
2
[LLVMdev] loop vectorizer: JIT + AVX segfaults
Is it possible that the AVX support in the JIT engine or x86-64 backend
is not mature? I am getting segfaults when switching from a vector
length 4 to 8 in my application. I isolated the barfing function and it
still segfaults in the minimal setup:
The IR attached implements the following simple function:
void bar(int start, int end, int ignore , bool add , bool addme , float*
out, float* in)
{
2013 Nov 10
0
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
Hi Renato,
you are right! There is 'avx' support:
fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave
2007 Jan 10
13
[DTrace] how to get socket read size
Hi
I'm trying to write my first DTrace script, and apparently I bit off a bit
more than I can chew. I want to track I/O over sockets. I found your
socketsize.d, which showed me how to track writes, but I'm at a loss how
to track reads. Frankly, I don't see how your write tracker works,
because it uses a probe in a function that only takes two arguments
but you grab size of write
2018 Jan 20
2
Can anyone help with a quick app_record.c module improvement and can explain over-riding modules?
On 20 January 2018 at 23:30, Tim S <tim.strommen at gmail.com> wrote:
> I have seen this take over 2 seconds before on a sluggish machine.
Thanks - my host uses SSD and everything seems pretty quick, but I'll
give it a 1 second pause.
> you'd need to pipe that to a Google Speech API tunnel.
> That's probably not something you can hack away at with simple
> Asterisk
2009 Feb 18
4
tracing aio syscalls
Hi all,
Is there some documentation or some example on how to interpret the arg0
.. arg<n> for the aioread, aiowrite, aiowait syscalls? The system call
name for all three seems to be "kaio".
Michael
=== Michael Mueller ==================
Tel. + 49 8171 63600
Fax. + 49 8171 63615
Web: http://www.michael-mueller-it.de
======================================
2012 Oct 24
3
[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
Hi,
I don't know if my LLVM IR code is faulty or if I have spotted a bug in the RegisterCoalescing pass, so I'm posting my issue on the ML. The shader and the print-before-all dump are given below.
The interesting part is the vreg6/vreg48 reduction: before RegCoalescing, the machine code is:
// BEFORE LOOP
... Some COPYs....
400B %vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2
2007 Apr 19
3
[RFC, PATCH 1/5] Paravirt_ops full patching.patch
Add 5-argument handling for paravirt ops patching of PAE functions.
Signed-off-by: Zachary Amsden <zach@vmware.com>
diff -r dbe11208916f include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h Thu Apr 19 11:40:55 2007 -0700
+++ b/include/asm-i386/paravirt.h Thu Apr 19 12:04:16 2007 -0700
@@ -308,10 +308,9 @@ unsigned paravirt_patch_insns(void *site
* return value handling from