thr3ads.net - similar to: "[LLVMdev] Codegen for vector float->double cast fails on x86 above SSE3"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Codegen for vector float->double cast fails on x86 above SSE3"

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

(Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None. The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1. I notice that X86::SQRTPD[m|r] appear in

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 23

Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements. Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine

[LLVMdev] Use of movupd instead of movapd for x86

2011 Mar 01

On Feb 28, 2011, at 2:58 AM, Sebastien DELDON-GNB wrote: > Understood for the aligned case, I want to measure performance degradation for unaligned case. > I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 28

Understood for the aligned case, I want to measure performance degradation for unaligned case. I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction it will raise an exception (SEGFAULT), I just want to know if there

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 25

Sebastien DELDON-GNB <sebastien.deldon at st.com> writes: > Hi all, > > Is there a way to force llc to generate movupd instruction instead of movapd for x86 target ? > > I know that movapd is more performant, but I would like to measure degradation when alignment constraints are not met. On modern processors a movupd on aligned data is going to be indistinguishable in
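The aligned versus unaligned distinction discussed in this thread can be sketched at the C level with SSE2 intrinsics. This is only an illustration, not the llc-level answer the poster asked for: `sum2_unaligned` is a hypothetical helper name, and which instruction the compiler actually emits for `_mm_loadu_pd` (movupd, movups, or a folded memory operand) is up to the compiler. A scalar fallback is included so the sketch also builds on targets without SSE2.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: add the two doubles at p[0] and p[1].
   p only needs the natural alignment of double, not 16 bytes. */
double sum2_unaligned(const double *p) {
#if defined(__SSE2__)
    __m128d v  = _mm_loadu_pd(p);        /* unaligned 128-bit load */
    __m128d hi = _mm_unpackhi_pd(v, v);  /* copy the upper lane into lane 0 */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));
#else
    return p[0] + p[1];                  /* scalar fallback, assumption for non-SSE2 builds */
#endif
}
```

Replacing `_mm_loadu_pd` with `_mm_load_pd` asks for an aligned load, which may fault when the pointer is not 16-byte aligned; that is the failure mode described in the thread.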

6327969 cpuid sse3 feature bit not noted on any AMD processor

2006 Oct 31

Author: dmick Repository: /hg/zfs-crypto/gate Revision: 4559d499327b38dfc599c113d49e339f5c0308c3 Log message: 6327969 cpuid sse3 feature bit not noted on any AMD processor Files: update: usr/src/uts/i86pc/os/cpuid.c update: usr/src/uts/intel/sys/x86_archext.h

vctrs: a type system for the tidyverse

2018 Aug 06

Hadley, Looks interesting and like a fun project from what you said in the email (I don't have time right now to dig deep into the readme) A few thoughts. First off, you are using the word "type" throughout this email; You seem to mean class (judging by your Date and factor examples, and the fact you mention S3 dispatch) as opposed to type in the sense of what is returned by

[LLVMdev] Help with setting up a software-float supported architecture target

2013 May 06

For a LLVM code target, I am developing a software-supported floating-point engine. I've come into a few issues in which the DAG conversion is Can anyone help with: I have some C code test like: float x; printf("%f", x); which clang converts to: %0 = load float* @x, align 4 %conv = fpext float %0 to double The issue is for simple Float to Double extension, the DAGCombiner.cpp
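The fpext in the quoted IR comes from C's default argument promotions: a float passed to a variadic function such as printf is widened to double before the call. A minimal sketch, where `format_float` is a hypothetical wrapper introduced only for illustration:

```c
#include <stdio.h>

/* Hypothetical wrapper: format a float with "%f".
   snprintf is variadic, so x is promoted from float to double at the
   call site; "%f" therefore always receives a double. */
int format_float(char *out, size_t n, float x) {
    return snprintf(out, n, "%f", x);
}
```

Calling `format_float(buf, sizeof buf, 1.5f)` writes `1.500000` into `buf`. The float to double widening happens in the caller, so a software floating-point target has to support that extension even when the program only declares float variables.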

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 25

Hi all, Is there a way to force llc to generate movupd instruction instead of movapd for x86 target? I know that movapd is more performant, but I would like to measure degradation when alignment constraints are not met. Best Regards Seb

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 19

Changing the list from cfe-dev to llvm-dev > On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote: > > I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. > > I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic

vctrs: a type system for the tidyverse

2018 Aug 06

Hi all, I wanted to share with you an experimental package that I'm currently working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for vctrs is to think deeply about the output "type" of functions like `c()`, `ifelse()`, and `rbind()`, with an eye to implementing one strategy throughout the tidyverse (i.e. all the functions listed at

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

All, Attached is a patch that does two things. First it makes the use of the current SSE code a run time option through the use of speex_decoder_ctl() and speex_encoder_ctl(). It does this twofold. First there is a modification to the configure.in script which introduces a check based upon platform. It will compile in the sse assembly if you are on an i?86 based platform by making a

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

For the following C code __fp16 x, y, z, w; void foo() { x = y + z; x = x + w; } clang produces IR that extends each operand to float and then truncates to half before assigning to x. Like this define dso_local void @foo() #0 !dbg !18 { %1 = load half, half* @y, align 2, !dbg !21 %2 = fpext half %1 to float, !dbg !21 %3 = load half, half* @z, align 2, !dbg !22 %4 = fpext half %3 to float, !dbg
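Why the intermediate rounding matters can be shown with a small analogue. The sketch below is an assumption-laden stand-in: it uses float and double in place of half and float, because `__fp16` is not available on every compiler, and the function names are hypothetical. It compares rounding to the narrow type after every addition against adding in the wide type and rounding once.

```c
/* Round to float after each addition. The volatile temporaries force the
   value to be stored at float precision even if the compiler evaluates
   expressions in a wider format. */
float sum_round_each_step(float a, float b, float c) {
    volatile float t = a + b;
    volatile float r = t + c;
    return r;
}

/* Add in double and round to float once at the end, which is what
   dropping the intermediate rounding amounts to. */
float sum_round_once(float a, float b, float c) {
    double wide = (double)a + (double)b + (double)c;
    return (float)wide;
}
```

With a = 16777216.0f (2^24) and b = c = 1.0f, the first function returns 16777216.0f, since each partial sum rounds back to 2^24 under round-to-nearest-even, while the second returns 16777218.0f. The two strategies are therefore not interchangeable.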

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

On Aug 31, 2010, at 1:21 PM PDT, Argyrios Kyrtzidis wrote: > > Just to be clear, are you saying that the fact that, after using llc > on the second IR, the produced asm is using MM registers, indicates > a bug ? Yes. It's not immediately obvious whether it's in the opt or llc, though. Chris was doing work involving <2 x float> and may know about this. >

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

2015 Jul 29

When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2
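The legacy SSE form of ADDPS with a memory operand requires a 16-byte aligned address, so the property at stake is whether the address is a multiple of 16. A minimal sketch of that check, with `is_aligned16` as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is 16-byte aligned, 0 otherwise. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}
```

Starting from a 16-byte aligned base, an offset of 12 bytes fails this check while an offset of 16 bytes passes it.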