thr3ads.net - similar to: "[LLVMdev] loop vectorizer: JIT + AVX segfaults"

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] loop vectorizer: JIT + AVX segfaults"

[LLVMdev] loop vectorizer: JIT + AVX segfaults

2013 Nov 11

Do you have a stack trace of the segfault? We have two different code emitters for X86 in LLVM: one used by the normal compiler and MCJIT, and the other used by the legacy JIT. All of the test cases for AVX support go through the first one, so it gets the most attention. We try to keep the legacy JIT in sync with it, but have a history of failing at that. The stack trace of the segfault may

[LLVMdev] loop vectorizer: JIT + AVX segfaults

2013 Nov 11

It's not much.

  (gdb) bt
  #0  0x00007ffff7f6506b in ?? ()
  #1  0x000000000045d01a in main () at main.cc:165

Line 165 is the call to the function that was compiled by the JIT'er. That means JIT'ing the function went well, but the code or the pointer is somehow corrupt. There is no particular reason why I am working with the legacy interface. Would you recommend to use the MCJIT

[LLVMdev] loop vectorizer: JIT + AVX segfaults

2013 Nov 11

I changed the code to use the MCJIT engine. As Josh suspected, it's the same issue: the program runs fine on SSE-based machines, but segfaults on a CPU with AVX extensions. I attach the repro case. Should I file a bug report?

P.S. On bugzilla there is the component 'new-bugs'. Should all new bugs be filed there?

Frank

On 11/11/13 08:45, Josh Klontz wrote:
> For what it's

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

Hi Renato, you are right! There is 'avx' support: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

Hi Frank,

I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which does have 256-bit vectors. The other two only support SSE instructions, which are only 128 bits wide.

cheers,
--renato

On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote:
> I looked more into this. For the previously sent IR the vector width of
> 256 bit is found mistakenly
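The point made in this reply (an 'avx' entry in the CPU flags implies 256-bit vectors, SSE alone implies 128-bit) can be sketched as a small check over a /proc/cpuinfo-style flags line. This is only an illustration: the helper names `has_cpu_flag` and `widest_vector_bits` are hypothetical, and the mapping deliberately considers nothing beyond SSE and AVX. It is not how LLVM itself determines the vector width.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `flag` appears as a whole
// whitespace-separated token in `flags_line`, so "avx" does not match "avx2".
bool has_cpu_flag(const std::string& flags_line, const std::string& flag) {
    std::istringstream tokens(flags_line);
    std::string token;
    while (tokens >> token) {
        if (token == flag) return true;
    }
    return false;
}

// Hypothetical helper: widest SIMD register width in bits implied by the
// flags, under the simplifying assumption that only SSE (128 bit) and
// AVX (256 bit) matter. Returns 0 when neither flag is present.
int widest_vector_bits(const std::string& flags_line) {
    if (has_cpu_flag(flags_line, "avx")) return 256;
    if (has_cpu_flag(flags_line, "sse")) return 128;
    return 0;
}
```

Matching whole tokens rather than substrings matters here because flag names share prefixes (sse, sse2, sse4_1).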

R: one bananna aov() question

2000 Mar 31

Hello world, I'm trying to do an anova on data in data.set, dependent variable is a column named "dep.var", grouping variable is in a column called "indep.var", and is.factor(indep.var) is TRUE... why can't I just do aov(dep.var ~ indep.var, data = data.set)? What have I done to deserve this?! What gives? Am I missing something totally obvious? R-base-1.0.0-1,

Strucchange: Breakpoint slow

2012 Jun 27

Hi to all, I am trying to run breakpoints() on a fairly large sample (>10.000 observations). The process is very slow, any idea on how to speed this up? I have tried the hpc="foreach" parameter, but this didn't work at all when I tried to run it on a smaller sample. breakpoints(x ~ x.l1 + x.l2 + X.l3 + x.l4 + x.l5 + x.l6 + x.l7 + x.l8 + y.l1 + y.l2 + y.l3 + y.l4 + y.l5 + y.l6

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

I looked more into this. For the previously sent IR the vector width of 256 bit is found mistakenly (and reproducibly) on this hardware:

  model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

For the same IR the loop vectorizer finds the correct vector width (128 bit) on:

  model name : Intel(R) Xeon(R) CPU E5630 @ 2.53GHz
  model name : Intel(R) Core(TM) i7 CPU M 640 @

[LLVMdev] loop vectorizer: JIT + AVX segfaults

2013 Nov 11

For what it's worth, I'm also experiencing this same issue. If there is interest I can provide some very simple reproducible test cases, but I was planning on moving to MCJIT this week anyway.

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

The loop vectorizer is doing an amazing job so far. Most of the time. I just came across one function which led to unexpected behavior: on this function the loop vectorizer finds a 256 bit vector as the widest vector type for the x86-64 architecture. (!) This is strange, as it was always finding the correct size of 128 bit as the widest type. I isolated the IR of the function to check if this is

How consistent is predict() syntax?

2007 Apr 13

I have a situation where lagged values of a time-series are used to predict future values. I have packed together the time-series and the lagged values into a data frame:

> str(D)
'data.frame': 191 obs. of 13 variables:
 $ y    : num -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 4.79 ...
 $ y.l1 : num NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 ...
 $ y.l2 : num

[Bug 991] New: Exactly after 24h of uptime system hungs

2014 Dec 12

https://bugzilla.netfilter.org/show_bug.cgi?id=991

Bug ID: 991
Summary: Exactly after 24h of uptime system hungs
Product: netfilter/iptables
Version: unspecified
Hardware: sparc64
OS: Debian GNU/Linux
Status: NEW
Severity: blocker
Priority: P5
Component: ip_tables (kernel)

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

Yes, you need the latest ToT version of llvm, or you run -loop-vectorize -earlycse -instcombine -simplifycfg. The bitcast is essentially a no-op to satisfy the type system. This is what your example looks like for me:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %.lhs = shl i64 %6, 2

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The loop vectorizer relies on cleanup passes to be run after it. From Transforms/IPO/PassManagerBuilder.cpp:

  // Add the various vectorization passes and relevant cleanup passes for
  // them since we are no longer in the middle of the main scalar pipeline.
  MPM.add(createLoopVectorizePass(DisableUnrollLoops));
  MPM.add(createInstructionCombiningPass());

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it.

Frank

vector.ph:                                        ; preds = %L5
  %broadcast.splatinsert1 = insertelement <4 x

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The following IR implements this nested loop:

  for (int i = start ; i < end ; ++i )
    for (int p = 0 ; p < 4 ; ++p )
      a[i*4+p] = b[i*4+p] + c[i*4+p];

define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
  br i1 %arg2, label %L0, label %L1

L0:

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

libFLAC has three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12; this means that during encoding FLAC also uses the unaccelerated C function. I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and

How to get the case value from Machine Instruction

2018 Apr 10

Thanks for your help. Is it possible to get the real case value from the MI? For the case in https://bugs.llvm.org/show_bug.cgi?id=34902, as follows:

#############################
* GCC v7.1 generated assembly
#############################
** Options: -Os -marm -march=armv7-a

foo:
  @ args = 0, pretend = 0, frame = 0
  @ frame_needed = 0, uses_anonymous_args = 0
  sub

Samba 3.0.21a (64 Bit) dumps core when trying to join domain on Solaris 9

2006 Jan 30

Hi, I recently tried to get Samba 3.0.21a running on Solaris 9 several times, using different build environments. The compilers in use were Sun Forte Version 11 and gcc 3.4.2. The binaries were compiled for 64 bit, using CFLAGS="-m64" for gcc for example. I just used configure --prefix=<path>. The core file analysis of the latest build shows that strlen() is called: # mdb core

NHW Project - lower quality settings

2018 Mar 09

Hello, I have re-tested the -l4 high compression setting and it's clear that it lacks precision on degraded, rather blurred images. So I don't know if it is a good idea to base the other lower quality settings (-l5, -l6, ...) on the -l4 setting. I have tested the NHW codec against x265, x264, Daala, WebP, Rududu, DLI and it's clear that at high compression these very good codecs have more
