thr3ads.net - similar to: "AVX512 instruction generated when JIT compiling for an avx2 architecture"

AVX512 instruction generated when JIT compiling for an avx2 architecture

2016 Jun 23

2

AVX512 instruction generated when JIT compiling for an avx2 architecture

On 06/23/2016 12:56 PM, Craig Topper wrote: > Can you check what value "getHostCPUName" returned? getHostCPUName() = skylake > > On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > With LLVM 3.8 the JIT compiler engine generates an AVX512 > instruction although I

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

2

avx512 JIT backend generates wrong code on <4 x float>

Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 17

2

Vector trunc code generation difference between llvm-3.9 and 4.0

Correction in the C snippet: typedef signed short v8i16_t __attribute__((ext_vector_type(8))); v8i16_t foo (v8i16_t a, int n) { return a >> n; } Best regards Saurabh On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote: > Hello, > > We are investigating a difference in code generation for vector splat > instructions between llvm-3.9

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Feb 18

2

Vector trunc code generation difference between llvm-3.9 and 4.0

Thanks Sanjay. Interestingly for me, disable-llvm-optmzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed? Best regards Saurabh On 17 February 2017 at 19:03, Sanjay Patel <spatel at rotateright.com> wrote: > I think this is caused by a front-end change (cc'ing clang-dev) because >

Vector trunc code generation difference between llvm-3.9 and 4.0

2017 Mar 08

2

Vector trunc code generation difference between llvm-3.9 and 4.0

The regression for the reported case should be avoided after: https://reviews.llvm.org/rL297232 https://reviews.llvm.org/rL297242 https://reviews.llvm.org/rL297280 It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited. On Sat, Feb 18, 2017 at 9:11 AM, Sanjay Patel <spatel at rotateright.com> wrote: > Yes, there is an

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

0

avx512 JIT backend generates wrong code on <4 x float>

Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

1

avx512 JIT backend generates wrong code on <4 x float>

Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

2

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing

AVX 512 Assembly Code Generation issues

2017 Jun 21

2

AVX 512 Assembly Code Generation issues

when i generate code with 72 loop iterations. the compiler generates code with using avx512 zmm operations 4 times (16x4=64) and remaining 8 iterations are handled by routine mov operations with EAX register. wouldn't it be better if it uses ymm for remaining 8 iterations as it does when iteration count is between 8 and 15. same for xmm and so on. please correct me if i am wrong. Thank

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

0

[LLVMdev] SIMD for sdiv <2 x i64>

------------------------------------ IR ------------------------------------------------------------------ if.then.i.i.i.i.i.i: ; preds = %if.then4 %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

1

[LLVMdev] SIMD for sdiv <2 x i64>

This snippet of IR is interesting: %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24> %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64> %BCS39_D = bitcast <2 x i64> %zextS39_D to i128 %mskS39_D = icmp ne i128 %BCS39_D, 0 br i1 %mskS39_D,

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

1

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

Change the assembly code to use only relative references of symbols for the kernel to be PIE compatible. Position Independent Executable (PIE) support will allow to extended the KASLR randomization range below the -2G memory limit. Signed-off-by: Thomas Garnier <thgarnie at google.com> --- arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++++++----- arch/x86/crypto/aesni-intel_asm.S

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

x86: PIE support and option to extend KASLR randomization

These patches make the changes necessary to build the kernel as Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. It allows to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler changes, PIE support and KASLR in general. Thanks to

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

x86: PIE support and option to extend KASLR randomization

These patches make the changes necessary to build the kernel as Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. It allows to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler changes, PIE support and KASLR in general. Thanks to

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

2018 May 23

33

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v3: - Update on message to describe longer term PIE goal. - Minor change on ftrace if condition. - Changed code using xchgq. - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace

[LLVMdev] i1* function argument on x86-64

2015 Jul 27

3

[LLVMdev] i1* function argument on x86-64

I am running into a problem with 'i1*' as a function's argument which seems to have appeared since I switched to LLVM 3.6 (but can have other source, of course). If I look at the assembler that the MCJIT generates for an x86-64 target I see that the array 'i1*' is taken as a sequence of 1 bit wide elements. (I guess that's correct). However, I used to call the function

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes: - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

similar to: AVX512 instruction generated when JIT compiling for an avx2 architecture