thr3ads.net - similar to: "[LLVMdev] ARM assembler bug on LLVM 3.5"

2014 Sep 22

[LLVMdev] ARM assembler bug on LLVM 3.5

On Sun, 21 Sep 2014, Renato Golin wrote:
> On 20 September 2014 15:19, Mikulas Patocka
> <mikulas at artax.karlin.mff.cuni.cz> wrote:
> > The problem is this - you either compile this program with
> > -mcpu=cortex-a9, then clang reports error on the sdiv instruction because
> > cortex a9 doesn't have sdiv. Or - you compile the program with
> >

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Hi all,
Thanks for the info. A few observations from my side:
LLVM:
cortex-a8 vfpv3: no vmla or vfma instruction emitted
cortex-a8 vfpv4: no vmla or vfma instruction emitted (this is invalid though, as cortex-a8 does not have vfpv4)
cortex-a8 vfpv4 with ffp-contract=fast: vfma instruction emitted (this seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma instructions

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

Hi, I was going through the LLVM instruction code generation for ARM. I came across VMLA instruction hazards (floating-point multiply and accumulate). I was comparing assembly code emitted by LLVM and GCC, where I saw that GCC was happily using the VMLA instruction for floating point while LLVM never used it; instead it used a pair of VMUL and VADD instructions. I wanted to know if there is

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Just to clarify: gcc 4.8.1 generates that fma at -O2; no FP relaxation or other flags specified.
On Wed, Dec 18, 2013 at 6:02 PM, Kay Tiong Khoo <kkhoo at perfwizard.com> wrote:
> Thanks for the explanation, Tim!
>
> gcc 4.8.1 *does* generate an fma for your code example for an x86 target
> that supports fma. I'd bet that the HW vendors' compilers do the same, but
>

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Thanks for the explanation, Tim! gcc 4.8.1 *does* generate an fma for your code example for an x86 target that supports fma. I'd bet that the HW vendors' compilers do the same, but I don't have any of those installed at the moment to test that theory. So this is a bug in those compilers? Do you know how they justify it? I see section 6.5 "Expressions" in the C standard, and
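The question of whether a compiler may contract a*b + c into a single fma comes down to rounding: a fused multiply-add rounds once, while a separate multiply and add round twice. The following is a minimal sketch, not taken from the thread. The helper names `fused_mul_add` and `separate_mul_add` and the input values are chosen purely for illustration, and exact rational arithmetic stands in for a hardware fma instruction.

```python
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


def separate_mul_add(a: float, b: float, c: float) -> float:
    """Multiply and round, then add and round again (two roundings)."""
    return a * b + c


# Illustrative inputs: the exact product 1 - 2**-54 is not representable
# as a double and rounds to 1.0 when the multiply is done on its own.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fused_mul_add(a, b, c)
separate = separate_mul_add(a, b, c)
print(fused, separate)
```

With these inputs the fused form keeps the small residual -2**-54, while the separate form returns 0.0 because the product is rounded to 1.0 before the add. The two forms are therefore not interchangeable bit for bit, which is why contraction is tied to options such as ffp-contract.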

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Renato,
It's definitely not A15. Could it be that the NEON units for Cortex-A9 support it but it isn't documented/recommended? And as mentioned before, the code is working!
Seb
> -----Original Message-----
> From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of
> Renato Golin
> Sent: Friday, November 09, 2012 6:27 PM
> To: Sebastien DELDON-GNB
>

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Bastien,
Weird: gcc is generating fma for my platform (ST-Ericsson NovaThor with Linaro), and the code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question?
Seb
From: JF Bastien [mailto:jfb at google.com]
Sent: Friday, November 09, 2012 5:36 PM
To: Sebastien DELDON-GNB
Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu
Subject:

2013 Nov 07

[LLVMdev] [PATCH] Do not generate nopl instruction on CPUs that don't support it.

On Tue, 5 Nov 2013, Rafael Espíndola wrote:
> Please include a testcase with the patch.
I'm sending a testcase here. Compile it with "clang -O2 -march=k6-2 -c loop.c"
> gas uses " nopl 0x0(%eax)" for k6_2. Are you sure it is a gas bug?
Yes, it is a gas bug. I should report it to the binutils maintainers.
Mikulas
> On 3 November 2013 13:50, Mikulas Patocka
>

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

> I was going through Code of LLVM instruction code generation for ARM. I came
> across VMLA instruction hazards (Floating point multiply and accumulate). I
> was comparing assembly code emitted by LLVM and GCC, where i saw that GCC
> was happily using VMLA instruction for floating point while LLVM never used
> it, instead it used a pair of VMUL and VADD instruction.
It looks like

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Sebastien,
ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have crammed a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, maybe?
On 9 November 2012 16:51, Sebastien DELDON-GNB

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

> cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this
> seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma
> instructions generated will be invalid )
If I'm understanding correctly, you've specifically told it this Cortex-A8 *does* come with vfpv4. Those kinds of odd combinations can be useful sometimes (if only for tests), so I'm not

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA?
On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> Hi Renato,
>
> It's definitively not A15. Can this be the case that NEON units for
> cortex-A9 support it but isn't documented/recommended ?
> And as mentioned before code is working !
>
> Seb
>

2013 Nov 03

[LLVMdev] [PATCH] Do not generate nopl instruction on CPUs that don't support it.

Hi,
This patch fixes a code generation bug: 586-class CPUs don't support the nopl instruction, and some 686-class CPUs don't support it either. I created bug 17792 for that. BTW, I think you should also optimize padding on these CPUs: instead of a stream of 0x90 nops, you should generate variants of the "lea (%esi), %esi" instruction, like gcc does. This patch disables generation of

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

On 18 December 2013 09:42, Tim Northover <t.p.northover at gmail.com> wrote:
> That means one strictly newer than
> cortex-a8: cortex-a7 (don't ask), cortex-a9, cortex-a12, cortex-a15 or
> krait I believe.
Hi Tim,
Cortex A8 and A9 use VFPv3. A7, A12 and A15 use VFPv4.
cheers,
--renato
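The core-to-FPU mapping given in this reply can be written down as a small lookup table. This is only a sketch: the names `CORE_FPU` and `has_vfma` are hypothetical, the table lists only the cores named in the reply, and the rule that VFMA requires VFPv4 is an assumption drawn from the surrounding VMLA/VFMA discussion rather than from any compiler source.

```python
# FPU generation per core, exactly as stated in the reply above.
CORE_FPU = {
    "cortex-a8": "vfpv3",
    "cortex-a9": "vfpv3",
    "cortex-a7": "vfpv4",
    "cortex-a12": "vfpv4",
    "cortex-a15": "vfpv4",
}


def has_vfma(core: str) -> bool:
    """Return True if the core's FPU is expected to provide VFMA.

    Assumption: fused multiply-add (VFMA) is available from VFPv4 onwards.
    Raises KeyError for cores that are not in the table.
    """
    return CORE_FPU[core] == "vfpv4"


for core in sorted(CORE_FPU):
    print(core, CORE_FPU[core], has_vfma(core))
```

Under that assumption, a check like this flags the Cortex-A9 case discussed in the fmac thread: VFMA would not be expected there unless the FPU is overridden explicitly.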

2013 Nov 12

[LLVMdev] [PATCH] Do not generate nopl instruction on CPUs that don't support it.

On 7 November 2013 18:31, Mikulas Patocka <mikulas at artax.karlin.mff.cuni.cz> wrote:
>
>
> On Tue, 5 Nov 2013, Rafael Espíndola wrote:
>
>> Please include a testcase with the patch.
>
> I'm sending testcase here. Compile it with
> "clang -O2 -march=k6-2 -c loop.c"
The test should be in the patch itself. It can use llvm-mc to check how the nops are

2013 Nov 05

[LLVMdev] [PATCH] Do not generate nopl instruction on CPUs that don't support it.

Please include a testcase with the patch.
gas uses " nopl 0x0(%eax)" for k6_2. Are you sure it is a gas bug?
On 3 November 2013 13:50, Mikulas Patocka <mikulas at artax.karlin.mff.cuni.cz> wrote:
> Hi
>
> This patch fixes code generation bug - 586-class CPUs don't support the
> nopl instruction and some 686-class CPUs don't support it too.
>
> I

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

>> Test case name :
>> llvm/projects/test-suite/SingleSource/Benchmarks/Misc/matmul_f64_4x4.c -
>> This is a 4x4 matrix multiplication, we can make small changes to make it a
>> 3x3 matrix multiplication for making things simple to understand .
>>
>
> This is one very specific case. How does that behave on all other cases?
> Normally, every big improvement comes with

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

On 19 December 2013 11:16, suyog sarda <sardask01 at gmail.com> wrote:
> Test case name :
> llvm/projects/test-suite/SingleSource/Benchmarks/Misc/matmul_f64_4x4.c -
> This is a 4x4 matrix multiplication, we can make small changes to make it a
> 3x3 matrix multiplication for making things simple to understand .
>
This is one very specific case. How does that behave on all

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

On Thu, Dec 19, 2013 at 4:36 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 19 December 2013 08:50, suyog sarda <sardask01 at gmail.com> wrote:
>
>> It may seem that total number of cycles are more or less same for single
>> vmla and vmul+vadd. However, when vmul+vadd combination is used instead of
>> vmla, then intermediate results will be generated

2009 Dec 01

[LLVMdev] thumb2 has divide instructions

I'm working with a Cortex-M3 core which is v7-M profile, and it has udiv and sdiv.
bagel
Jim Grosbach wrote:
> Hello,
>
> As I understand it, the divide instructions are only available on the
> v7-R profile of the v7 architecture. Is that incorrect?
>
> -Jim
