thr3ads.net - similar to: "[LLVMdev] Which floating-point comparison?"

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] Which floating-point comparison?"

[LLVMdev] Which floating-point comparison?

2010 Mar 28

On Sun, Mar 28, 2010 at 7:45 AM, Russell Wallace <russell.wallace at gmail.com> wrote:
> I notice llvm provides both ordered and unordered variants of floating-point comparison. Which of these is the right one to use by default? I suppose the two criteria would be, in order of importance:
>
> 1. Which is more efficient (more directly maps to typical hardware)?
You can
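For readers unfamiliar with the distinction raised in this thread, here is a minimal C sketch of what ordered and unordered comparison mean when a NaN is involved. The function names (cmp_oeq, cmp_ueq, cmp_one, cmp_une) are hypothetical and chosen only to mirror the LLVM predicate names; this is an illustration, not code from the thread.

```c
#include <math.h>
#include <stdbool.h>

/* Ordered equal (mirrors `fcmp oeq`): true only if neither operand is NaN
   and the values are equal. C's `==` already behaves this way. */
bool cmp_oeq(double a, double b) { return a == b; }

/* Unordered or equal (mirrors `fcmp ueq`): true if either operand is NaN,
   or if the values are equal. */
bool cmp_ueq(double a, double b) { return isnan(a) || isnan(b) || a == b; }

/* Ordered and not equal (mirrors `fcmp one`): true only if neither operand
   is NaN and the values differ. */
bool cmp_one(double a, double b) { return !isnan(a) && !isnan(b) && a != b; }

/* Unordered or not equal (mirrors `fcmp une`): true if either operand is
   NaN, or if the values differ. C's `!=` already behaves this way. */
bool cmp_une(double a, double b) { return a != b; }
```

With a NaN operand, every ordered predicate returns false and every unordered predicate returns true, so the choice only matters when NaNs can reach the comparison.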

[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm

2017 Mar 01

Hi, We seem to have found a bug in the LLVM 3.8 code generator. We are using MCJIT and have isolated working.ll and broken.ll after middle-end optimizations -- in the block merge128, notice that broken.ll has a fcmp une comparison to zero and a jump based on that branch:
merge128: ; preds = %true71, %false72
%_rtB_724 = load %B_repro_T*, %B_repro_T**

[LLVMdev] x86 REP-prefixed instructions seem to be dropped by instruction decoder?

2012 Aug 13

I think there's a bug somewhere in TableGen for the X86 disassembler emitter. The following test:
$ echo "0xF3 0xA5" | ./bin/llvm-mc -disassemble
.section __TEXT,__text,regular,pure_instructions
movsd
(from llvm trunk)
0xF3 is the REP prefix, so the printed instruction should be 'rep movsd', however all that is printed is 'movsd'. It seems that there

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not detected as available (based from the target triple?) then I don't think

[LLVMdev] x86 REP-prefixed instructions seem to be dropped by instruction decoder?

2012 Aug 14

On 13 August 2012 12:02, Andrew Ruef <awruef at umd.edu> wrote:
> I think there's a bug somewhere in TableGen for the X86 disassembler emitter. The following test:
>
> $ echo "0xF3 0xA5" | ./bin/llvm-mc -disassemble
> .section __TEXT,__text,regular,pure_instructions
> movsd
>
> (from llvm trunk)
>
> 0xF3 is the REP prefix, so the

[LLVMdev] unaligned AVX store gets split into two instructions

2013 Jul 10

I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads on AVX. 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a single instruction (details below). In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which seems to be due to this. Any ideas why this changed? Thanks!
Zach
LLVM Code:
define <4 x double> @vstore(<4 x

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Hi, I've attached 2 .ll files which are supposed to be equivalent but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash: when I run the crashing test in isolation it passes, when I run the full suite it crashes; it boggles the mind. Below I provide the optimized asm that is produced from

[LLVMdev] llvm_fcmp_ord and llvm_fcmp_uno and assembly code generation

2007 Oct 19

Hi, The C backend in llc generates code like:
static inline int llvm_fcmp_ord(double X, double Y) { return X == X && Y == Y; }
static inline int llvm_fcmp_uno(double X, double Y) { return X != X || Y != Y; }
First of all it generates a warning by clang and gcc (with certain flags):
x.cbe.c:130: warning: comparing floating point with == or != is unsafe
Now, C99 provides a macro for this
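The message is cut off before it names the macro, so the following is only a guess at the kind of replacement it has in mind: C99's <math.h> provides isunordered(x, y), which tests for a NaN operand without writing a self-comparison. The helper names fcmp_ord and fcmp_uno are hypothetical and are not the C backend's actual output.

```c
#include <math.h>

/* Assumption: isunordered() is the C99 macro the truncated message refers to.
   It evaluates to nonzero when at least one argument is NaN. */

/* Ordered: neither operand is NaN (same result as X == X && Y == Y). */
int fcmp_ord(double x, double y) { return !isunordered(x, y); }

/* Unordered: at least one operand is NaN (same result as X != X || Y != Y). */
int fcmp_uno(double x, double y) { return isunordered(x, y) != 0; }
```

Because the helpers contain no literal == or != on floating-point values, they should not trigger the float-equality warning quoted above, though that depends on the compiler and flags in use.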

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Using MM registers is wrong unless the user has specifically asked for it, which doesn't seem to be the case here. In the awesome MMX architecture, touching an MM register makes subsequent x87 operations fail unless an EMMS instruction is issued first; none of the compilers here are smart enough to insert EMMS instructions in the right places, so the only safe thing is not to use

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Here are the optimized versions:
$ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o -
[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
%roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;

[LLVMdev] Suboptimal code due to excessive spilling

2012 Mar 28

Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after

unnecessary reload of 8-byte struct on i386

2019 Oct 25

Hello folks, I've recently been looking at the generated code for a few functions in Chromium while investigating crashes, and I came across a curious pattern. A smallish repro case is available at https://godbolt.org/z/Dsu1WI . In that case, the function Assembler::emit_arith receives a struct (Operand) by value and passes it by value to another function. That struct is 8 bytes long, so the

[LLVMdev] Suboptimal code due to excessive spilling

2012 Apr 05

I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: 28 March 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi, I have run into the following strange behavior

[LLVMdev] Odd weak symbol thing on i386

2012 Jan 13

Hi, I'm compiling lldiv.c from the NetBSD standard library. It works on ARM, Mips, Microblaze, ppc, ppc64, and x86_64. On i386 a very strange thing happens. Here's the source:
#include <stdlib.h>
#define __weak_alias(sym) __attribute__ ((weak, alias (#sym)))
lldiv_t lldiv(long long int num, long long int denom) __weak_alias(_lldiv);
lldiv_t _lldiv(long long num, long

[LLVMdev] equivalent IR, different asm

2010 Sep 01

The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash to webkit. I suspect the usage of registers is wrong, can someone take a look ? $ llc opt-pass.ll -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE .align 4, 0x90

[LLVMdev] equivalent IR, different asm

2010 Sep 01

On Sep 1, 2010, at 6:25 AM, Argyrios Kyrtzidis wrote:
> The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash to webkit.
> I suspect the usage of registers is wrong, can someone take a look ?
The difference is that there is a shift right after the multiply, before the divide. In IR, the difference is:
%5 = mul nsw i32 %4, %tmp1

[LLVMdev] Incorrect code generated for arm64

2015 May 04

Hi all, I’ve narrowed down a problem in my code to the following test case:
- - - -
typedef struct {float v[2];} vec2;
typedef struct {float v[3];} vec3;
vec2 getVec2();
vec3 getVec3() {
vec2 myVec = getVec2();
vec3 res;
res.v[0] = myVec.v[0];
res.v[1] = myVec.v[1];
res.v[2] = 1;
return res;
}
- - - -
Compiling this with any level of optimization for arm64 gives incorrect code,

[RFC][llvm-mca] Adding binary support to llvm-mca.

2018 Nov 15

Introduction
-----------------
Currently llvm-mca only accepts assembly code as input. We would like to extend llvm-mca to support object files, allowing users to analyze the performance of binaries. The proposed changes (which involve both clang and llvm) optionally introduce an object file section, but this can be stripped-out if desired. For the llvm-mca binary support feature to be useful, a

[LLVMdev] how to annotate assembler

2012 Mar 02

Hi, In GCC there is one useful option -dp (or -dP for more verbose output) to annotate assembler with instruction patterns, that was used when assembler was generated. For example:
double test(long long s) { return s; }
gcc -S -dp -O0 test.c
test:
.LFB0:
.cfi_startproc
pushq %rbp # 18 *pushdi2_rex64/1 [length = 1]
.cfi_def_cfa_offset 16
movq %rsp, %rbp # 19 *movdi_1_rex64/2

[LLVMdev] instruction scheduling issue

2013 Jan 04

Hi all, I'm trying to insert a function call "llvm_memory_profiling" right before each memory access. The function uses the effective address of the memory access as its single parameter. An example is as follows: the function call at 402a99 has a parameter passed to %rdi at 402a91. One can see that the function call is exactly before the memory access I want to monitor because
