similar to: [LLVMdev] LLVMdev Digest, Vol 93, Issue 3

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] LLVMdev Digest, Vol 93, Issue 3"

2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Even if you explicitly specify –stack-alignment=16 the aligned movs are still generated. It is not an issue related to ABI. See my original mail: ./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 - Elena From: Cameron McInally
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 vmovaps -208(%rbp), %ymm9 vmovaps -272(%rbp), %ymm7 vmovaps -304(%rbp), %ymm0 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm1 vmovaps -112(%rbp), %ymm0
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena, You're correct. LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM,
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Cameron, Aligning the stack to 32 bytes when there are auto AVX vector variables present shouldn't necessarily break the x86-64 ABI, as long as smaller auto variables remain properly aligned. A similar approach was taken for i386 in GCC in order to support SSE vectors. Perhaps you could elaborate where the ABI was violated when your patch is applied. HTH -- Evandro Menezes
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code: declare <16 x float> @foo(<16 x float>) define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind { entry: %x1 = fadd <16 x float> %x, %y %call = call <16 x float> @foo(<16 x float> %x1) nounwind %y1 = fsub <16 x float> %call, %y ret <16 x float> %y1 } ./llc -mattr=+avx
2012 Mar 01
0
[LLVMdev] Stack alignment on X86 AVX seems incorrect
When stack is unaligned, LLVM should generate vmovups instead of vmovaps. - Elena -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger Sent: Thursday, March 01, 2012 20:31 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect On Thu, Mar 01, 2012 at 06:16:46PM +0000,
2012 Jul 27
3
[LLVMdev] X86 FMA4
> It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. You are misunderstanding [no worries, happens to everyone = )]. The timings I listed were for
2018 Apr 17
0
How to create and insert a call MachineInstr?
Hi Tim, I'm sorry to bother you again. Since I have met the problem, how to check used registers and avoid clobbering live registers, which you mentioned in the email. I am working in the function X86InstrInfo::storeRegToStackSlot, which is in lib/Target/X86/X86InstrInfo.cpp. And I have an extra problem, may I use MOV64mr and two addReg to set two registers as its arguments? I want to use
2012 Mar 01
2
[LLVMdev] Stack alignment on X86 AVX seems incorrect
On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote: > vmovaps should not access stack if it is not aligned to 32 I'm not completely sure I understand your problem. Are you saying that the generated code assumes 256bit alignment, your default stack alignment is 128bit and LLVM doesn't adjust it automatically? Joerg
2008 Apr 16
0
[LLVMdev] Being able to know the jitted code-size before emitting
Comments below. On Apr 15, 2008, at 4:24 AM, Nicolas Geoffray wrote: > OK, here's a new patch that adds the infrastructure and the > implementation for X86, ARM and PPC of GetInstSize and > GetFunctionSize. Both functions are virtual functions defined in > TargetInstrInfo.h. > > For X86, I moved some commodity functions from X86CodeEmitter to > X86InstrInfo. >
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael, Thanks for the legwork! It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. As I am sure you are aware, we cannot use SSE (movaps)
2020 Sep 01
2
Vector evolution?
Hi, Please consider the following loop: using v4f32 = float __attribute__((__vector_size__(16))); void fct6(v4f32 *x) { #pragma clang loop vectorize(enable) for (int i = 0; i < 256; ++i) x[i] = 7 * x[i]; } After compiling it with: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original
2020 Aug 31
2
Vectorization of math function failed?
Hi, After reading https://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls I decided to write the following C++ program: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { v4f32 y; y[0] = std::sin(x[0]); y[1] = std::sin(x[1]); y[2] = std::sin(x[2]); y[3] = std::sin(x[3]); return y; } v4f32 fct2(v4f32 x) { v4f32 y;
2009 Jan 26
0
[LLVMdev] Can TargetInstrInfo::storeRegToStackSlot use temp/virtual regs?
On Jan 23, 2009, at 3:28 AM, Mondada Gabriele wrote: > Hi, > I'm implementing storeRegToStackSlot() and, in order to store some > specific registers (floating point regs and address regs) I've to > copy them to more standard regs and copy these last ones to the slot. > I tried to generate instructions that use physical registers, but by > doing that I overwrote
2017 Feb 21
3
RFC: Setting MachineInstr flags through storeRegToStackSlot
> -----Original Message----- > From: mbraun at apple.com [mailto:mbraun at apple.com] > Sent: Friday, February 17, 2017 3:15 PM > To: Alex Bradbury > Cc: llvm-dev; Adrian Prantl; Eric Christopher; Robinson, Paul > Subject: Re: [llvm-dev] RFC: Setting MachineInstr flags through > storeRegToStackSlot > > Can someone familiar with debug info comment on whether it matters
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine
2019 Nov 05
2
InlineSpiller - hoists leave virtual registers without live intervals
On Mon, Nov 4, 2019 at 12:18 PM Quentin Colombet <qcolombet at apple.com> wrote: > Hi Alex, > > Thanks for reporting this. > Wei worked on the hoisting optimization. > > @Wei, could you work with Alex to see what is the problem. > > Cheers, > -Quentin > > > On Nov 3, 2019, at 5:20 AM, via llvm-dev <llvm-dev at lists.llvm.org> > wrote: > >
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory. vmovaps - load takes 1 load mu op, 3 latency, with a reciprocal throughput of 0.5. vmovaps - store takes 1 store mu op, 1 load mu op for address calculation, 3 latency, with a reciprocal throughput of 1. He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a