similar to: [LLVMdev] Codegen for vector float->double cast fails on x86 above SSE3

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Codegen for vector float->double cast fails on x86 above SSE3"

2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
(Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1 I notice that X86::SQRTPD[m|r] appear in
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements. Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine
2011 Mar 01
0
[LLVMdev] Use of movupd instead of movapd for x86
On Feb 28, 2011, at 2:58 AM, Sebastien DELDON-GNB wrote: > Understood for the aligned case, I want to measure performance degradation for unaligned case. > I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction
2011 Feb 28
2
[LLVMdev] Use of movupd instead of movapd for x86
Understood for the aligned case, I want to measure performance degradation for unaligned case. I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction it will raise an exception (SEGFAULT), I just want to know if there
2011 Feb 25
0
[LLVMdev] Use of movupd instead of movapd for x86
Sebastien DELDON-GNB <sebastien.deldon at st.com> writes: > Hi all, > > Is there a way to force llc to generate movupd instruction instead of movapd for x86 target ? > > I know that movapd is more performant, but I would like to measure degradation when alignment constraints are not met. On modern processors a movupd on aligned data is going to be indistinguishable in
2006 Oct 31
0
6327969 cpuid sse3 feature bit not noted on any AMD processor
Author: dmick Repository: /hg/zfs-crypto/gate Revision: 4559d499327b38dfc599c113d49e339f5c0308c3 Log message: 6327969 cpuid sse3 feature bit not noted on any AMD processor Files: update: usr/src/uts/i86pc/os/cpuid.c update: usr/src/uts/intel/sys/x86_archext.h
2018 Aug 06
0
vctrs: a type system for the tidyverse
Hadley, Looks interesting and like a fun project from what you said in the email (I don't have time right now to dig deep into the readme) A few thoughts. First off, you are using the word "type" throughout this email; You seem to mean class (judging by your Date and factor examples, and the fact you mention S3 dispatch) as opposed to type in the sense of what is returned by
2013 May 06
1
[LLVMdev] Help with setting up a software-float supported architecture target
For a LLVM code target, I am developing a software-supported floating-point engine. I've come into a few issues in which the DAG conversion is Can anyone help with: I have some C code test like: float x; printf("%f", x); which clang coverts to: %0 = load float* @x, align 4 %conv = fpext float %0 to double The issue is for simple Float to Double extension, the DAGCombiner.cpp
2011 Feb 25
3
[LLVMdev] Use of movupd instead of movapd for x86
Hi all, Is there a way to force llc to generate movupd instruction instead of movapd for x86 target ? I know that movapd is more performant, but I would like to measure degradation when alignment constraints are not met. Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL:
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Changing the list from cfe-dev to llvm-dev > On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote: > > I’m getting close. I think it may be an issue with an individual intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI. > > I found a comment around the rationale for using a conditional move versus a branch. I believe the predicate logic
2018 Aug 06
2
vctrs: a type system for the tidyverse
Hi all, I wanted to share with you an experimental package that I?m currently working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for vctrs is to think deeply about the output ?type? of functions like `c()`, `ifelse()`, and `rbind()`, with an eye to implementing one strategy throughout the tidyverse (i.e. all the functions listed at
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
All, Attached is a patch that does two things. First it makes the use of the current SSE code a run time option through the use of speex_decoder_ctl() and speex_encoder_ctl It does this twofold. First there is a modification to the configure.in script which introduces a check based upon platform. It will compile in the sse assembly if you are on an i?86 based platform by making a
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
For the following C code __fp16 x, y, z, w; void foo() { x = y + z; x = x + w; } clang produces IR that extends each operand to float and then truncates to half before assigning to x. Like this define dso_local void @foo() #0 !dbg !18 { %1 = load half, half* @y, align 2, !dbg !21 %2 = fpext half %1 to float, !dbg !21 %3 = load half, half* @z, align 2, !dbg !22 %4 = fpext half %3 to float, !dbg
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
On Aug 31, 2010, at 1:21 PMPDT, Argyrios Kyrtzidis wrote: > > Just to be clear, are you saying that the fact that, after using llc > on the second IR, the produced asm is using MM registers, indicates > a bug ? Yes. It's not immediately obvious whether it's in the opt or llc, though. Chris was doing work involving <2 x float> and may know about this. >
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2