Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Constants generation - proposal"
2013 Jun 25
0
[LLVMdev] Constants generation - proposal
Hi Elena,
> (2) Proposal
> Define one more Code Model, let's say "LargeNearConst", which will allow to put constants in .text.
Isn't that a little heavy-handed? The large model only requires the
less efficient access for symbols we can't control, and in fact x86
still uses pc-relative conditional branches within a function so it
can't pretend to support a single
2013 Jun 26
2
[LLVMdev] Constants generation - proposal
> I think that the improved behavior for consts should be acceptable in the large model. But that's just me.
By default, all constants should be in a special read-only section, and this section may be far from the text section.
I'm working with the JIT model, or with only one object file. The code model in which all constants are near we could call "LargeJIT". (Does it sound better
2013 Jun 25
0
[LLVMdev] Constants generation
That's what I actually did now, locally in the code.
But I still see the "movabsq":
.text
.align 8, 0x90
.LCPI0_0:
.quad 4606281698874543309 # double 0.9
.LCPI0_1:
.quad 4631147119616759172 # double 42.2794408
.LCPI0_2:
.long 1065353216 # float 1
.zero 4
...
movabsq $.LCPI0_1, %rax # encoding: [0x48,0xb8,A,A,A,A,A,A,A,A]
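For contrast, when the constant pool is reachable from the code, the same constant would normally be loaded RIP-relative, roughly like this (operands illustrative, not taken from the output above):
  vmovsd .LCPI0_1(%rip), %xmm0    # pc-relative, fits in a 32-bit displacement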
2013 Jun 25
2
[LLVMdev] Constants generation
Hi again,
Actually, I've just been looking at the existing code and the ARM
solution may be over-complicated for this situation.
You should be able to override EmitConstantPool directly, or possibly
even just override getSectionForConstantKind in
X86LinuxTargetObjectFile (and perhaps others) to return .text.
Tim.
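A rough sketch of that second suggestion, assuming the hook names below match the tree being discussed (exact signatures vary between LLVM versions, so treat this as illustrative rather than a verified patch):
  #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
  // Sketch: route constant-pool entries into .text so they can be
  // reached with RIP-relative addressing from nearby code.
  class X86LinuxTargetObjectFile : public TargetLoweringObjectFileELF {
  public:
    const MCSection *getSectionForConstant(SectionKind Kind) const {
      return getTextSection(); // instead of a read-only data section
    }
  };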
2013 Jun 26
0
[LLVMdev] Constants generation - proposal
>> I think that the improved behavior for consts should be acceptable in the large model. But that's just me.
> By default, all constants should be in a special read-only section, and this section may be far from the text section.
Why should they? The only reason I can think of is to support
execute-only pages, but isn't that the less common use-case? From what
I could tell from
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
You're correct. LLVM does not align the stack to 32-bytes for AVX and
unaligned moves should be used for YMM spills.
I wrote some code to align the stack to 32-bytes when AVX spills are
present; it does break the x86-64 ABI though. If upstream would be
interested in this code, I can arrange with my employer to send a patch to
the mailing list.
-Cameron
On Mar 1, 2012, at 4:09 PM,
2013 Jun 24
2
[LLVMdev] Constants generation
Hi,
I'd like to generate constants inside .text in order to use ip-relative loads, when the code model is "large".
How can I do this?
(I'm on X86_64 linux)
Thank you.
- Elena
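For context, the behaviour being discussed shows up when the large code model is selected on the llc command line, e.g. (input file name hypothetical):
  llc -code-model=large -march=x86-64 test.ll -o test.s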
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
vmovaps -208(%rbp), %ymm9
vmovaps -272(%rbp), %ymm7
vmovaps -304(%rbp), %ymm0
vmovaps -112(%rbp), %ymm0
vmovaps -80(%rbp), %ymm1
vmovaps -112(%rbp), %ymm0
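A quick arithmetic check on why these spills are suspect: 176 mod 32 = 16, so with the stack (and hence %rbp) only guaranteed 16-byte alignment, an address such as -176(%rbp) can end up 16-byte rather than 32-byte aligned; vmovaps of a %ymm register faults on such an address, while vmovups would not.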
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Cameron,
Aligning the stack to 32 bytes when there are auto AVX vector variables
present shouldn't necessarily break the x86-64 ABI, as long as smaller
auto variables remain properly aligned. A similar approach was taken
for i386 in GCC in order to support SSE vectors.
Perhaps you could elaborate where the ABI was violated when your patch
is applied.
HTH
--
Evandro Menezes
2013 Jun 25
0
[LLVMdev] Constants generation
Hi Elena,
> I’d like to generate constants inside .text in order to use ip-relative
> loads, when the code model is “large”.
I don't think this is a sequence the x86 backend supports at the
moment, but it is how ARM handles its constant-pools. The outline is
that you have a pass which looks through a function's constant-pool uses
and emits a pseudo-instruction for each, which is then
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)
define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}
./llc -mattr=+avx
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You,
It means that after
  vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0]   # zmm22 = [8,9,10,11,12,13,14,15]
zmm22 will contain the 64-bit constant values 8, 9, 10, 11, 12, 13, 14, 15, which
are indexes here, not values loaded from those locations, and zmm2 contains the
constant 4000. So
  vpmuludq zmm14, zmm10, zmm2
will multiply the index values by 4000, since for array b the stride is 4000.
zmm14=
2013 Jul 10
4
[LLVMdev] unaligned AVX store gets split into two instructions
I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
on AVX.
3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as
a single instruction (details below).
In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
which seems to be due to this.
Any ideas why this changed? Thanks!
Zach
LLVM Code:
define <4 x double> @vstore(<4 x
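The IR in the message is truncated above; as a generic illustration (hypothetical function, not the poster's code), an unaligned 256-bit store looks like:
  define void @vstore_unaligned(<4 x double>* %p, <4 x double> %v) {
    store <4 x double> %v, <4 x double>* %p, align 8
    ret void
  }
Whether this comes out as a single 256-bit move or is split into two 128-bit operations then depends on the LLVM version and the -mcpu setting.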
2013 Sep 05
1
[LLVMdev] AVX calling convention?
I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually.
I tracked this down to the following. The call site looks like
vmovdqa 24064(%rsp), %ymm0
vmovdqa
2020 Sep 01
2
Vector evolution?
Hi,
Please consider the following loop:
using v4f32 = float __attribute__((__vector_size__(16)));
void fct6(v4f32 *x)
{
  #pragma clang loop vectorize(enable)
  for (int i = 0; i < 256; ++i)
    x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize,slp-vectorize
-Rpass-missed=loop-vectorize,slp-vectorize
2012 Mar 14
2
[LLVMdev] How to set constant pool section?
Hi,
In the document: http://llvm.org/docs/WritingAnLLVMBackend.html
described example like:
SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) {
  Data16bitsDirective = "\t.half\t";
  Data32bitsDirective = "\t.word\t";
  Data64bitsDirective = 0; // .xword is only supported by V9.
  ZeroDirective = "\t.skip\t";
  CommentString = "!";
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture.
You could try “clang -mavx” or “clang -march=corei7-avx” for Ivy Bridge, and “clang -march=core-avx2” or “clang -mavx2” for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f: ## @f
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello,
On the appended, reasonably simple test case that has an fmul/fadd
sequence on <8 x float> vector types, I don't see the x86-64 code
generator (with the CPU set to Haswell or later) turning it into
AVX2 FMA instructions. Here's the snippet in the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
.LBB0_2: # =>This Inner
2012 May 24
0
[LLVMdev] use AVX automatically if present
On Thu, 24 May 2012, Pan, Wei wrote:
> Very likely AVX is not enabled in your llc. This feature was enabled
> just recently (late of April).
I forgot to mention that I am using a recent LLVM 3.1 and that, in principle, my
llc knows about AVX, as I have shown in the second example. But AVX does
not seem to be used by default.
On Thu, 24 May 2012, Henning Thielemann wrote:
> $ llc -o - -mattr
2019 Sep 02
2
AVX2 codegen - question reg. FMA generation
On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> It appears you need 'reassoc' on fmul/fadd:
> https://godbolt.org/z/nuTzx2
Thanks very much, that was it. Either that or providing
-enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since
using FMAs here instead of mul/add appears to be safer (the reverse is
unsafe).
~ Uday
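For reference, a minimal IR sketch of what the fix amounts to (hypothetical function; the reassoc flags follow the suggestion quoted above):
  define <8 x float> @mul_add(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
    %m = fmul reassoc <8 x float> %a, %b
    %s = fadd reassoc <8 x float> %m, %c
    ret <8 x float> %s
  }
With llc -O3 -mcpu=haswell this should then come out as a single vfmadd* instruction.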