Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Constants generation - proposal"
2013 Jun 25
0
[LLVMdev] Constants generation - proposal
Hi Elena,
> (2) Proposal
> Define one more Code Model, let's say "LargeNearConst", which will allow to put constants in .text.
Isn't that a little heavy-handed? The large model only requires the
less efficient access for symbols we can't control, and in fact x86
still uses pc-relative conditional branches within a function so it
can't pretend to support a single
2013 Jun 26
2
[LLVMdev] Constants generation - proposal
> I think that the improved behavior for consts should be acceptable in the large model. But that's just me.
By default, all constants should be in a special read-only section, and this section may be far from the text section.
I'm working with the JIT model, or with only one object file. The code model in which all constants are near we could call "LargeJIT". (Does it sound better
2013 Jun 25
0
[LLVMdev] Constants generation
That's what I actually did now, locally in the code.
But I still see the "movabsq":
.text
.align 8, 0x90
.LCPI0_0:
.quad 4606281698874543309 # double 0.9
.LCPI0_1:
.quad 4631147119616759172 # double 42.2794408
.LCPI0_2:
.long 1065353216 # float 1
.zero 4
...
movabsq $.LCPI0_1, %rax # encoding: [0x48,0xb8,A,A,A,A,A,A,A,A]
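For contrast, when the constant pool is reachable from the code, the same constant would normally be loaded RIP-relative, roughly like this (operands illustrative, not taken from the output above):
  vmovsd .LCPI0_1(%rip), %xmm0    # pc-relative, fits in a 32-bit displacement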
2013 Jun 25
2
[LLVMdev] Constants generation
Hi again,
Actually, I've just been looking at the existing code and the ARM
solution may be over-complicated for this situation.
You should be able to override EmitConstantPool directly, or possibly
even just override getSectionForConstantKind in
X86LinuxTargetObjectFile (and perhaps others) to return .text.
Tim.
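A rough sketch of that second suggestion, assuming the hook names below match the tree being discussed (exact signatures vary between LLVM versions, so treat this as illustrative rather than a verified patch):
  #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
  // Sketch: route constant-pool entries into .text so they can be
  // reached with RIP-relative addressing from nearby code.
  class X86LinuxTargetObjectFile : public TargetLoweringObjectFileELF {
  public:
    const MCSection *getSectionForConstant(SectionKind Kind) const {
      return getTextSection(); // instead of a read-only data section
    }
  };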
2013 Jun 26
0
[LLVMdev] Constants generation - proposal
>> I think that the improved behavior for consts should be acceptable in the large model. But that's just me.
> By default, all constants should be in a special read-only section, and this section may be far from the text section.
Why should they? The only reason I can think of is to support
execute-only pages, but isn't that the less common use-case? From what
I could tell from
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
You're correct. LLVM does not align the stack to 32-bytes for AVX and
unaligned moves should be used for YMM spills.
I wrote some code to align the stack to 32-bytes when AVX spills are
present; it does break the x86-64 ABI though. If upstream would be
interested in this code, I can arrange with my employer to send a patch to
the mailing list.
-Cameron
On Mar 1, 2012, at 4:09 PM,
2013 Jun 24
2
[LLVMdev] Constants generation
Hi,
I'd like to generate constants inside .text in order to use ip-relative loads, when the code model is "large".
How can I do this?
(I'm on X86_64 linux)
Thank you.
- Elena
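For context, the behaviour being discussed shows up when the large code model is selected on the llc command line, e.g. (input file name hypothetical):
  llc -code-model=large -march=x86-64 test.ll -o test.s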
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
vmovaps -208(%rbp), %ymm9
vmovaps -272(%rbp), %ymm7
vmovaps -304(%rbp), %ymm0
vmovaps -112(%rbp), %ymm0
vmovaps -80(%rbp), %ymm1
vmovaps -112(%rbp), %ymm0
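A quick arithmetic check on why these spills are suspect: 176 mod 32 = 16, so with the stack (and hence %rbp) only guaranteed 16-byte alignment, an address such as -176(%rbp) can end up 16-byte rather than 32-byte aligned; vmovaps of a %ymm register faults on such an address, while vmovups would not.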
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Cameron,
Aligning the stack to 32 bytes when there are auto AVX vector variables
present shouldn't necessarily break the x86-64 ABI, as long as smaller
auto variables remain properly aligned. A similar approach was taken
for i386 in GCC in order to support SSE vectors.
Perhaps you could elaborate where the ABI was violated when your patch
is applied.
HTH
--
Evandro Menezes
2013 Jun 25
0
[LLVMdev] Constants generation
Hi Elena,
> I’d like to generate constants inside .text in order to use ip-relative
> loads, when the code model is “large”.
I don't think this is a sequence the x86 backend supports at the
moment, but it is how ARM handles its constant-pools. The outline is
that you have a pass which looks through a function's constant-pool uses
and emits a pseudo-instruction for each, which is then
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)
define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}
./llc -mattr=+avx
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You,
It means that after
  vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0]   # zmm22 = [8,9,10,11,12,13,14,15]
zmm22 will contain the 64-bit constant values 8, 9, 10, 11, 12, 13, 14, 15, which
are indexes here, not values loaded from those locations, and zmm2 contains the
constant 4000. So
  vpmuludq zmm14, zmm10, zmm2
will multiply the index values by 4000, since for array b the stride is 4000.
zmm14=
2013 Jul 10
4
[LLVMdev] unaligned AVX store gets split into two instructions
I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
on AVX.
3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as
a single instruction (details below).
In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
which seems to be due to this.
Any ideas why this changed? Thanks!
Zach
LLVM Code:
define <4 x double> @vstore(<4 x
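The IR in the message is truncated above; as a generic illustration (hypothetical function, not the poster's code), an unaligned 256-bit store looks like:
  define void @vstore_unaligned(<4 x double>* %p, <4 x double> %v) {
    store <4 x double> %v, <4 x double>* %p, align 8
    ret void
  }
Whether this comes out as a single 256-bit move or is split into two 128-bit operations then depends on the LLVM version and the -mcpu setting.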
2013 Sep 05
1
[LLVMdev] AVX calling convention?
I am tracking down an x86-64 code generation problem that has to do with AVX instructions. The symptom is: a function is called, and the upper half of the function argument (which is short16) is zero. This happens only when I compile code with pocl, but not when I use clang and/or llc manually.
I tracked this down to the following. The call site looks like
vmovdqa 24064(%rsp), %ymm0
vmovdqa
2020 Sep 01
2
Vector evolution?
Hi,
Please consider the following loop:
using v4f32 = float __attribute__((__vector_size__(16)));
void fct6(v4f32 *x)
{
  #pragma clang loop vectorize(enable)
  for (int i = 0; i < 256; ++i)
    x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize,slp-vectorize
-Rpass-missed=loop-vectorize,slp-vectorize
2012 Mar 14
2
[LLVMdev] How to set constant pool section?
Hi,
In the document: http://llvm.org/docs/WritingAnLLVMBackend.html
described example like:
SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) {
  Data16bitsDirective = "\t.half\t";
  Data32bitsDirective = "\t.word\t";
  Data64bitsDirective = 0; // .xword is only supported by V9.
  ZeroDirective = "\t.skip\t";
  CommentString = "!";
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture.
You could try “clang -mavx” or “clang -march=corei7-avx” for Ivy Bridge, and “clang -march=core-avx2” or “clang -mavx2” for Haswell.
$ clang -march=core-avx2 -O3 -S -o - test.c
.section __TEXT,__text,regular,pure_instructions
.globl _f
.align 4, 0x90
_f: ## @f
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello,
On the appended, reasonably simple test case that has an fmul/fadd
sequence on <8 x float> vector types, I don't see the x86-64 code
generator (with the CPU set to Haswell or later) turning it into
AVX2 FMA instructions. Here's the snippet in the output it generates:
$ llc -O3 -mcpu=skylake
---------------------
.LBB0_2: # =>This Inner
2012 May 24
0
[LLVMdev] use AVX automatically if present
On Thu, 24 May 2012, Pan, Wei wrote:
> Very likely AVX is not enabled in your llc. This feature was enabled
> just recently (late of April).
I forgot to mention that I am using a recent LLVM 3.1 and that, in principle, my
llc knows about AVX, as I have shown in the second example. But AVX does
not seem to be used by default.
On Thu, 24 May 2012, Henning Thielemann wrote:
> $ llc -o - -mattr
2019 Sep 02
2
AVX2 codegen - question reg. FMA generation
On Mon, 2 Sep 2019 at 16:59, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>
> It appears you need 'reassoc' on fmul/fadd:
> https://godbolt.org/z/nuTzx2
Thanks very much, that was it. Either that or providing
-enable-unsafe-fp-math to llc yielded FMAs. I didn't expect this since
using FMAs here instead of mul/add appears to be safer (the reverse is
unsafe).
~ Uday
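For reference, a minimal IR sketch of what the fix amounts to (hypothetical function; the reassoc flags follow the suggestion quoted above):
  define <8 x float> @mul_add(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
    %m = fmul reassoc <8 x float> %a, %b
    %s = fadd reassoc <8 x float> %m, %c
    ret <8 x float> %s
  }
With llc -O3 -mcpu=haswell this should then come out as a single vfmadd* instruction.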