2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Even if you explicitly specify -stack-alignment=16, the aligned moves are still generated.
It is not an issue related to the ABI.
See my original mail:
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
- Elena
From: Cameron McInally
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp
vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
vmovaps -208(%rbp), %ymm9
vmovaps -272(%rbp), %ymm7
vmovaps -304(%rbp), %ymm0
vmovaps -112(%rbp), %ymm0
vmovaps -80(%rbp), %ymm1
vmovaps -112(%rbp), %ymm0
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena,
You're correct. LLVM does not align the stack to 32 bytes for AVX, and
unaligned moves should be used for YMM spills.
I wrote some code to align the stack to 32 bytes when AVX spills are
present; it does break the x86-64 ABI, though. If upstream would be
interested in this code, I can arrange with my employer to send a patch to
the mailing list.
-Cameron
On Mar 1, 2012, at 4:09 PM,
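A minimal sketch of the distinction at stake, using AVX intrinsics (illustration only, not code from the thread): _mm256_load_ps maps to vmovaps and requires a 32-byte-aligned address, while _mm256_loadu_ps maps to vmovups and accepts any alignment, which is why a YMM spill slot on a 16-byte-aligned stack needs the unaligned form.

#include <immintrin.h>

// Illustration only: aligned vs. unaligned 256-bit loads.
__m256 load_aligned(const float *p)   { return _mm256_load_ps(p);  } // vmovaps: p must be 32-byte aligned
__m256 load_unaligned(const float *p) { return _mm256_loadu_ps(p); } // vmovups: any alignment is fine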
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Cameron,
Aligning the stack to 32 bytes when automatic AVX vector variables are
present shouldn't necessarily break the x86-64 ABI, as long as smaller
automatic variables remain properly aligned. A similar approach was taken
for i386 in GCC in order to support SSE vectors.
Perhaps you could elaborate on where the ABI was violated when your patch
is applied.
HTH
--
Evandro Menezes
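A hedged C++ sketch of the realignment approach Evandro describes (a hypothetical example, not from the thread): one over-aligned local is enough to make the compiler realign the frame in the prologue, while smaller automatic variables keep their normal alignment, so the ABI seen by callees is preserved.

#include <immintrin.h>

// Hypothetical illustration: the alignas(32) slot forces per-function
// stack realignment (e.g. "andq $-32, %rsp" in the prologue on x86-64).
float sum_ends(const float *p) {
    __m256 v = _mm256_loadu_ps(p);  // caller data: assume nothing about alignment
    alignas(32) float tmp[8];       // 32-byte-aligned local
    _mm256_store_ps(tmp, v);        // an aligned store is safe here
    return tmp[0] + tmp[7];
}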
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code:
declare <16 x float> @foo(<16 x float>)

define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd <16 x float> %x, %y
  %call = call <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub <16 x float> %call, %y
  ret <16 x float> %y1
}
./llc -mattr=+avx
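A rough C-level analogue of the IR above, using Clang/GCC vector extensions (a sketch for orientation; the function names are invented): with -mavx, each 512-bit <16 x float> value is split across two YMM registers, and the calling-convention question is which YMM registers survive the call to foo.

typedef float v16f32 __attribute__((vector_size(64)));

v16f32 foo(v16f32);          // external callee, mirroring @foo in the IR

v16f32 test(v16f32 x, v16f32 y) {
    v16f32 x1 = x + y;       // fadd
    v16f32 c = foo(x1);      // call: with -mavx, x1 occupies two YMM registers
    return c - y;            // fsub
}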
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
When the stack is unaligned, LLVM should generate vmovups instead of vmovaps.
- Elena
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger
Sent: Thursday, March 01, 2012 20:31
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000,
2012 Jul 27
3
[LLVMdev] X86 FMA4
> It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding.
You are misunderstanding [no worries, happens to everyone = )]. The timings I listed were for
2018 Apr 17
0
How to create and insert a call MachineInstr?
Hi Tim,
I'm sorry to bother you again. I have run into the problem you mentioned in
your email: how to check which registers are used and avoid clobbering live
registers.
I am working in the function X86InstrInfo::storeRegToStackSlot, which is in
lib/Target/X86/X86InstrInfo.cpp.
I also have an extra question: may I use MOV64mr and two addReg calls to set two
registers as its arguments? I want to use
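For context, a hedged fragment of how an X86 memory store such as MOV64mr is typically built in this file (a sketch under assumptions, not code from the thread; DL, MBB, MI, SrcReg, and isKill stand in for values already in scope inside storeRegToStackSlot). The memory reference is five operands (base, scale, index, displacement, segment) followed by the register to store, so two bare addReg calls are not sufficient; in-tree code normally uses addFrameReference() from X86InstrBuilder.h to fill in the memory operands from a frame index.

// Sketch: emit "movq %SrcReg, -8(%rbp)".
BuildMI(MBB, MI, DL, get(X86::MOV64mr))
    .addReg(X86::RBP)                           // base register
    .addImm(1)                                  // scale
    .addReg(0)                                  // no index register
    .addImm(-8)                                 // displacement
    .addReg(0)                                  // no segment register
    .addReg(SrcReg, getKillRegState(isKill));   // the value being stored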
2012 Mar 01
2
[LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote:
> vmovaps should not access the stack if it is not aligned to 32 bytes
I'm not completely sure I understand your problem. Are you saying that
the generated code assumes 256-bit alignment, your default stack
alignment is 128-bit, and LLVM doesn't adjust it automatically?
Joerg
2008 Apr 16
0
[LLVMdev] Being able to know the jitted code-size before emitting
Comments below.
On Apr 15, 2008, at 4:24 AM, Nicolas Geoffray wrote:
> OK, here's a new patch that adds the infrastructure and the
> implementation for X86, ARM and PPC of GetInstSize and
> GetFunctionSize. Both functions are virtual functions defined in
> TargetInstrInfo.h.
>
> For X86, I moved some convenience functions from X86CodeEmitter to
> X86InstrInfo.
>
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael,
Thanks for the legwork!
It appears that the stats you listed are for movaps [SSE], not vmovaps
[AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256),
since they are both AVX instructions. Although, yes, I agree that this is
not clear from Agner's report. Please correct me if I am misunderstanding.
As I am sure you are aware, we cannot use SSE (movaps)
2020 Sep 01
2
Vector evolution?
Hi,
Please consider the following loop:
using v4f32 = float __attribute__((__vector_size__(16)));

void fct6(v4f32 *x)
{
#pragma clang loop vectorize(enable)
    for (int i = 0; i < 256; ++i)
        x[i] = 7 * x[i];
}
After compiling it with:
clang++ -O3 -march=native -mtune=native \
    -Rpass=loop-vectorize,slp-vectorize \
    -Rpass-missed=loop-vectorize,slp-vectorize
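For orientation, a hand-written intrinsics version of roughly what the loop vectorizer is expected to produce for fct6 on an AVX target (an illustration under that assumption, not actual compiler output):

#include <immintrin.h>

// fct6 touches 256 v4f32 elements, i.e. 1024 floats in total.
void fct6_by_hand(float *x) {
    const __m256 seven = _mm256_set1_ps(7.0f);
    for (int i = 0; i < 1024; i += 8)
        _mm256_storeu_ps(x + i, _mm256_mul_ps(seven, _mm256_loadu_ps(x + i)));
}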
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
Hi Frank,
I recommend trying trunk LLVM. AVX-512 development has been very active recently.
-Hal
----- Original Message -----
> From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "LLVM Dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, June 29, 2016 2:41:39 PM
> Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
Hi Hal!
Thanks, but unfortunately it didn't help. The exact same assembler
instructions are generated by both 3.8 (from yesterday) and trunk (from today).
So this really looks like a bug.
Best,
Frank
On 06/29/2016 03:48 PM, Hal Finkel wrote:
> Hi Frank,
>
> I recommend trying trunk LLVM. AVX-512 development has been very active recently.
>
> -Hal
>
> ----- Original
2020 Aug 31
2
Vectorization of math function failed?
Hi,
After reading https://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls
I decided to write the following C++ program:
#include <cmath>

using v4f32 = float __attribute__((__vector_size__(16)));

v4f32 fct1(v4f32 x)
{
    v4f32 y;
    y[0] = std::sin(x[0]);
    y[1] = std::sin(x[1]);
    y[2] = std::sin(x[2]);
    y[3] = std::sin(x[3]);
    return y;
}

v4f32 fct2(v4f32 x)
{
    v4f32 y;
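By way of comparison, a sketch of the scalar-loop form that the loop vectorizer can map onto vector math-library calls when one is selected (for example with clang's -fveclib= option); this is an assumed rewrite for illustration, not code from the thread:

#include <cmath>

// With e.g. -O3 -fveclib=libmvec this loop may become calls to a
// vectorized sinf; the element-at-a-time fct1 above gives the loop
// vectorizer nothing to work with.
void sin_array(float *y, const float *x, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = std::sin(x[i]);
}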
2009 Jan 26
0
[LLVMdev] Can TargetInstrInfo::storeRegToStackSlot use temp/virtual regs?
On Jan 23, 2009, at 3:28 AM, Mondada Gabriele wrote:
> Hi,
> I'm implementing storeRegToStackSlot() and, in order to store some
> specific registers (floating point regs and address regs) I've to
> copy them to more standard regs and copy these last ones to the slot.
> I tried to generate instructions that use physical registers, but by
> doing that I overwrote
2017 Feb 21
3
RFC: Setting MachineInstr flags through storeRegToStackSlot
> -----Original Message-----
> From: mbraun at apple.com [mailto:mbraun at apple.com]
> Sent: Friday, February 17, 2017 3:15 PM
> To: Alex Bradbury
> Cc: llvm-dev; Adrian Prantl; Eric Christopher; Robinson, Paul
> Subject: Re: [llvm-dev] RFC: Setting MachineInstr flags through
> storeRegToStackSlot
>
> Can someone familiar with debug info comment on whether it matters
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi!
When compiling the attached module with the JIT engine on an Intel KNL, I
see wrong code getting emitted. I attach a complete reproducer program
which shows the bug in LLVM 3.8. It loads and JIT-compiles the module
and prints the assembler. I stumbled on this since the result of an
actual calculation was wrong. So it's not only the text version of the
assembler; also the machine
2019 Nov 05
2
InlineSpiller - hoists leave virtual registers without live intervals
On Mon, Nov 4, 2019 at 12:18 PM Quentin Colombet <qcolombet at apple.com>
wrote:
> Hi Alex,
>
> Thanks for reporting this.
> Wei worked on the hoisting optimization.
>
> @Wei, could you work with Alex to see what the problem is.
>
> Cheers,
> -Quentin
>
> > On Nov 3, 2019, at 5:20 AM, via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
> >
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps etc. for loading/storing from memory.
vmovaps load: 1 load µop, latency 3, reciprocal throughput 0.5.
vmovaps store: 1 store µop plus 1 load µop for the address calculation, latency 3, reciprocal throughput 1.
He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a