thr3ads.net - search: "vpxors"

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

1

Change the assembly code to use only relative references to symbols so that the kernel is PIE compatible. Position Independent Executable (PIE) support will allow extending the KASLR randomization range below the -2G memory limit. Signed-off-by: Thomas Garnier <thgarnie at google.com> --- arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++++++----- arch/x86/crypto/aesni-intel_asm.S

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

4

...ting a 256-bit vpmaddwd concatenated with a > 256-bit zero vector going into a 512-bit vpaddd before type legalization. > The 512-bit concat and vpaddd get split during type legalization, and the > high half of the add gets constant folded away. I'm guessing we probably > finished with 4 vpxors before the loop but MachineCSE (or some other pass?) > combined two of them when it figured out the loop didn't modify them. > > >> >> Thanks again, >> Hal >> >> >> Thanks, >> ~Craig >> >> >> -- >> Hal Finkel >> Lea...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

4

...backend into generating a 256-bit vpmaddwd concatenated with a 256-bit zero vector going into a 512-bit vpaddd before type legalization. The 512-bit concat and vpaddd get split during type legalization, and the high half of the add gets constant folded away. I'm guessing we probably finished with 4 vpxors before the loop but MachineCSE (or some other pass?) combined two of them when it figured out the loop didn't modify them. > > Thanks again, > Hal > > > Thanks, > ~Craig > > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leaders...

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

Changes: - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce

x86: PIE support and option to extend KASLR randomization

2017 Oct 04

28

These patches make the changes necessary to build the kernel as a Position Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below the top 2G of the virtual address space. This makes it possible to optionally extend the KASLR randomization range from 1G to 3G. Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler changes, PIE support and KASLR in general. Thanks to

[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

2018 May 23

33

Changes: - patch v3: - Update on message to describe longer term PIE goal. - Minor change on ftrace if condition. - Changed code using xchgq. - patch v2: - Adapt patch to work post KPTI and compiler changes - Redo all performance testing with latest configs and compilers - Simplify mov macro on PIE (MOVABS now) - Reduce GOT footprint - patch v1: - Simplify ftrace

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

Changes: - patch v1: - Simplify ftrace implementation. - Use gcc mstack-protector-guard-reg=%gs with PIE when possible. - rfc v3: - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process. - Move the start of the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

2

For example, I have the following IR code,
for.cond.preheader:                               ; preds = %if.end18
  %mul = mul i32 %12, %3
  %cmp21128 = icmp sgt i32 %mul, 0
  br i1 %cmp21128, label %for.body.preheader, label %return
for.body.preheader:                               ; preds = %for.cond.preheader
  %19 = mul i32 %12, %3
  %20 = add i32 %19, -1
  %21 = zext i32 %20 to i64
  %22 =

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

0

------------------------------------ IR ------------------------------------
if.then.i.i.i.i.i.i:                              ; preds = %if.then4
  %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64>
  %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3>
  %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

1

This snippet of IR is interesting:
  %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24>
  %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat
  %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64>
  %BCS39_D = bitcast <2 x i64> %zextS39_D to i128
  %mskS39_D = icmp ne i128 %BCS39_D, 0
  br i1 %mskS39_D,

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

2

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there are no alternative AVX/2 instructions for int64? The same thing

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

3

Hello all, This code https://godbolt.org/g/tTyxpf is a dot product reduction loop multiplying sign-extended 16-bit values to produce a 32-bit accumulated result. The x86 backend is currently not able to optimize it as well as gcc and icc. The IR we are getting from the loop vectorizer has several v8i32 adds and muls inside the loop. These are fed by v8i16 loads and sexts from v8i16 to v8i32. The
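For readers who cannot follow the link, a loop of the kind described might look roughly like the sketch below. This is not the code behind the godbolt URL, which is not reproduced in this listing. The function name `dot_product_i16` and its signature are assumptions made for illustration. Each 16-bit element is sign-extended to 32 bits before the multiply, and the products are summed into a 32-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not the code from the linked thread.
 * Computes the dot product of two int16_t arrays with a 32-bit
 * accumulator. The casts make the sign extension explicit.
 * The caller is assumed to keep the running sum within int32_t range. */
int32_t dot_product_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* 16-bit x 16-bit signed product always fits in 32 bits. */
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

On x86, vpmaddwd multiplies pairs of signed 16-bit elements and adds adjacent products into 32-bit lanes, which is why it is a natural target for this loop shape.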

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 24

2

Hi, Is LLVM able to generate code for the following instruction? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM-3.4.2. It seems to generate really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 10

13

On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly

[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.

2013 Oct 15

0

I should mention a couple of useful self-explanatory LLVM flags for triage: -enable-misched=false -verify-misched -Andy On Oct 15, 2013, at 4:43 PM, Eric Christopher <echristo at gmail.com> wrote: > Grats on the work, a long time coming! > > Beware the incoming register allocation bugs ;) > > -eric > > On Tue, Oct 15, 2013 at 4:33 PM, Andrew Trick <atrick at
