Displaying 20 results from an estimated 41 matches for "xmm15".
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
...RY(camellia_ctr_16way)
subq $(16 * 16), %rsp;
movq %rsp, %rax;
- vmovdqa .Lbswap128_mask, %xmm14;
+ vmovdqa .Lbswap128_mask(%rip), %xmm14;
/* load IV and byteswap */
vmovdqu (%rcx), %xmm0;
@@ -1065,7 +1065,7 @@ ENTRY(camellia_ctr_16way)
/* inpack16_pre: */
vmovq (key_table)(CTX), %xmm15;
- vpshufb .Lpack_bswap, %xmm15, %xmm15;
+ vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
vpxor %xmm0, %xmm15, %xmm0;
vpxor %xmm1, %xmm15, %xmm1;
vpxor %xmm2, %xmm15, %xmm2;
@@ -1133,7 +1133,7 @@ camellia_xts_crypt_16way:
subq $(16 * 16), %rsp;
movq %rsp, %rax;
- vmovdqa .Lxts_gf128mul_and...
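The same absolute-to-RIP-relative change can be seen in plain C. A minimal sketch (hypothetical names, not part of the patch): built with -fpie, the access below compiles to a RIP-relative load such as "movl mask(%rip), %eax", which is the transformation the patch applies by hand to the crypto assembly.
    /* Hypothetical sketch, not from the patch: a file-scope global
     * accessed from a function. With -fpie the compiler emits a
     * RIP-relative load; without PIE it may use an absolute address,
     * which is what the vmovdqa fix above removes. */
    unsigned int mask = 0xffu;

    unsigned int load_mask(void)
    {
            return mask;    /* -fpie: movl mask(%rip), %eax */
    }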
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...ltq
vmovq %rax, %xmm4
vmovq %xmm2, %rax
cltq
vmovq %rax, %xmm5
vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
vpcmpgtq %xmm3, %xmm4, %xmm3
vptest %xmm3, %xmm3
je .LBB10_66
# BB#5: # %for.body.preheader
vpaddq %xmm15, %xmm2, %xmm3
vpand %xmm15, %xmm3, %xmm3
vpaddq .LCPI10_1(%rip), %xmm3, %xmm8
vpand .LCPI10_5(%rip), %xmm8, %xmm5
vpxor %xmm4, %xmm4, %xmm4
vpcmpeqq %xmm4, %xmm5, %xmm6
vptest %xmm6, %xmm6
jne .LBB10_9
It turned out that the vector one is way more complicated...
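For reference, a loop along these lines (a hypothetical reconstruction, not the poster's source) is the kind of input that makes the vectorizer emit guard code like the above: the vpcmpgtq/vptest pair is a runtime check executed before falling into the vector preheader (for.body.preheader).
    /* Hypothetical reconstruction for illustration only: a simple loop
     * whose vectorization requires runtime guards before the vector body. */
    void scale(int *a, const int *b, int n)
    {
            for (int i = 0; i < n; i++)
                    a[i] = b[i] * 3;
    }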
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
...ymm registers are call-clobbered on non-Windows platforms.
This thread has lots of interesting information: http://software.intel.com/en-us/forums/showthread.php?t=59291
I wasn't able to find a formal Win64 ABI spec, but according to http://www.agner.org/optimize/calling_conventions.pdf, xmm6-xmm15 are callee-saved on Win64, but the high bits of ymm6-ymm15 are not.
That's not currently correctly modelled in LLVM. To fix it, create a pseudo-register YMMHI_CLOBBER that aliases ymm6-ymm15. Then add YMMHI_CLOBBER to the registers clobbered by WINCALL64*.
/jakob
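A small sketch of the consequence (assumed code, not from the thread; callee() is hypothetical): a __m256 value kept live across a call on Win64 can keep its low half in a callee-saved xmm6-xmm15 register, but its upper 128 bits must be spilled, because the ymm high halves are call-clobbered.
    #include <immintrin.h>

    __m256 callee(__m256 x);        /* hypothetical external function */

    /* Sketch: v is live across callee(). On Win64 only xmm6-xmm15 are
     * preserved, not the ymm upper halves, so the compiler must spill
     * v's high 128 bits (or all of v) around the call. */
    __m256 keep_live(__m256 v, __m256 w)
    {
            __m256 r = callee(w);
            return _mm256_add_ps(r, v);
    }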
2010 Oct 20
2
[LLVMdev] llvm register reload/spilling around calls
...nstructions are all prefixed
> by:
>
> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>
> This is the fixed list of call-clobbered registers. It should really
> be controlled by the calling convention of the called function
> instead.
>
> The WINCALL* instructions only exist because of this.
Ahh I see now. I hacked this up and indeed the code looks much...
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
...being generated, the
source IR on both systems is the same.
Try compiling the attached module:
llc -O3 -filetype=asm -o BAD.s BAD.ll
Under Linux, the resulting assembly file shows that only registers up to
xmm5 are used, while the same command under Windows generates assembly that
uses all registers up to xmm15 (on the same 64-bit Intel Q9550).
At the same time, the Linux assembly shows lots and lots of spills and
reloads.
Although I did not check whether the code generated by the JIT is the
same or comparable, the fact that this occurs with the static llc seems
to prove that there is a major problem here...
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
...oth systems is the same.
> Try compiling the attached module:
>
> llc -O3 -filetype=asm -o BAD.s BAD.ll
>
> Under Linux, the resulting assembly file shows that only registers up to
> xmm5 are used, while the same command under Windows generates assembly that
> uses all registers up to xmm15 (on the same 64-bit Intel Q9550).
> At the same time, the Linux assembly shows lots and lots of spills and
> reloads.
The Win64 calling convention defines XMM6..XMM15 as callee saved, so their values can remain live across the calls. On Linux all XMM registers are call-clobbered so any live...
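A minimal sketch of that difference (hypothetical code, not the attached BAD.ll; opaque() is assumed): both s and t are live across the call, so under the SysV ABI they must be spilled and reloaded, while Win64 can keep them in callee-saved registers.
    void opaque(void);      /* hypothetical external call */

    /* Sketch: s and t live across opaque(). SysV: all xmm registers are
     * call-clobbered, so both values are spilled to the stack. Win64:
     * they can stay in the callee-saved registers xmm6-xmm15. */
    double live_across_call(double a, double b)
    {
            double s = a * b;
            double t = a + b;
            opaque();
            return s / t;
    }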
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...fixed
>> by:
>>
>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>
>> This is the fixed list of call-clobbered registers. It should really
>> be controlled by the calling convention of the called function
>> instead.
>>
>> The WINCALL* instructions only exist because of this.
> Ahh I see now. I hacked this up a...
2010 Oct 20
1
[LLVMdev] llvm register reload/spilling around calls
...>>>
>>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>>> XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10,
>>> XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
>>>
>>> This is the fixed list of call-clobbered registers. It should really
>>> be controlled by the calling convention of the called function
>>> instead.
>>>
>>> The WINCALL* instructions only exist because of this.
>> Ahh I s...
2016 Nov 23
4
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
             ...[VEX]   OPCODE   ModR/M   [SIB]   [DISP]    [IMM]
# of bytes:  0,2,3      1        1        0,1     0,1,2,4   0,1
Note that the EVEX prefix requires 4 bytes, whereas the VEX prefix takes at most 3 bytes.
Consequently, for the SKX architecture, many instructions that use only the lower registers XMM0-XMM15 or YMM0-YMM15 can be encoded in either the EVEX or the VEX format. For such cases, using the VEX encoding yields a code size reduction of ~2 bytes per instruction, even when compiling with the AVX512F/AVX512VL features enabled.
For example, "vmovss %xmm0, 32(%rsp,%rax,4)" has the following...
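As a sketch of when the shorter encoding applies (illustrative code, compile flags assumed): the store below touches only xmm0 and uses no AVX-512-only feature, so even when built with -mavx512f -mavx512vl it can be emitted with the 2- or 3-byte VEX prefix instead of the 4-byte EVEX prefix.
    #include <immintrin.h>

    /* Sketch: only low registers (xmm0-xmm15) and an AVX-encodable
     * operation are used, so the VEX form is legal and ~2 bytes
     * shorter than the EVEX form. */
    void store_lane(float *dst, __m128 v, long i)
    {
            _mm_store_ss(dst + i, v);   /* vmovss %xmm0, (%rdi,%rdx,4) */
    }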
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of <2 x i32> type.
I am running it on a Haswell processor with LLVM 3.4.2. It seems that it
generates really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
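The complicated sequence comes from type legalization: the <2 x i32> multiply is widened, and a 64-bit lane multiply has no single SSE/AVX instruction on Haswell, so it is built from 32x32->64 multiplies. A scalar sketch of what each lane computes (illustrative only, not LLVM's actual code):
    #include <stdint.h>

    /* Illustrative sketch: the low 64 bits of a 64x64 product assembled
     * from 32x32->64 pieces, which is what the vpmuludq / vpsrlq /
     * vpsllq / vpaddq sequence does across the vector lanes. */
    uint64_t mul64_from_mul32(uint64_t a, uint64_t b)
    {
            uint64_t lo_lo = (a & 0xffffffffu) * (b & 0xffffffffu); /* vpmuludq */
            uint64_t lo_hi = (a & 0xffffffffu) * (b >> 32);
            uint64_t hi_lo = (a >> 32) * (b & 0xffffffffu);
            return lo_lo + ((lo_hi + hi_lo) << 32);                 /* vpsllq + vpaddq */
    }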
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
...by:
let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1,
MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
This is the fixed list of call-clobbered registers. It should really be controlled by the calling convention of the called function instead.
The WINCALL* instructions only exist because of this.
One problem is that calling conventions are handled while building the selection DAG, and t...
2017 Jul 28
3
Purpose of various register classes in X86 target
Hello Matthias,
On 28 July 2017 at 04:13, Matthias Braun <mbraun at apple.com> wrote:
> It's not that hard in principle:
> - A register class is a set of registers.
> - Virtual Registers have a register class assigned.
> - If you have register constraints (like x86 8bit operations only work on
> al,ah,etc.) then you have to create a new register class to express that.
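A toy model of those three points (illustrative only, not LLVM's actual representation): a register class as a set of registers, with constrained operands using a smaller class.
    #include <stdint.h>

    /* Toy model: a register class is a bit-set of physical registers.
     * A constrained operand (e.g. an x86 8-bit operation limited to
     * al/ah/bl/bh/cl/ch/dl/dh) uses a smaller class. */
    enum { AL, AH, BL, BH, CL, CH, DL, DH, SIL, DIL, NUM_REGS };

    typedef uint32_t RegClass;  /* bit i set => register i allowed */

    static const RegClass GR8      = (1u << NUM_REGS) - 1;   /* all 8-bit regs */
    static const RegClass GR8_ABCD = (1u << AL) | (1u << AH) | (1u << BL) |
                                     (1u << BH) | (1u << CL) | (1u << CH) |
                                     (1u << DL) | (1u << DH); /* constrained set */

    /* A value is usable by an operand if its class is contained in the
     * class the operand requires. */
    static inline int class_contains(RegClass required, RegClass have)
    {
            return (have & ~required) == 0;
    }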
2010 Oct 20
3
[LLVMdev] llvm register reload/spilling around calls
Thanks for giving it a look!
On 19.10.2010 23:21, Jakob Stoklund Olesen wrote:
> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:
>
>> So I saw that the code is doing lots of register
>> spilling/reloading. Now I understand that due to calling
>> conventions, there's not really a way to avoid this - I tried using
>> coldcc but apparently the backend
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
...">, DwarfRegNum<[28, -2, -2]>;
def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;
// YMM Registers, used by AVX instructions
let SubRegIndices = [sub_xmm, sub_xmmb] in {
def YMM0: RegisterWithSubRegs<"ymm0", [XMM0, XMM0b]>, DwarfRegNum<[17, 21, 21]>;
def YMM1: RegisterWithSubReg...
2008 Sep 03
2
[LLVMdev] Codegen/Register allocation question.
...M4<imp-def,dead>,
%XMM5<imp-def,dead>, %XMM6<imp-def,dead>, %XMM7<imp-def,dead>,
%XMM8<imp-def,dead>, %XMM9<imp-def,dead>, %XMM10<imp-def,dead>,
%XMM11<imp-def,dead>, %XMM12<imp-def,dead>, %XMM13<imp-def,dead>,
%XMM14<imp-def,dead>, %XMM15<imp-def,dead>, %EFLAGS<imp-def,dead>,
%EAX<imp-def>, %ECX<imp-def,dead>, %EDI<imp-def,dead>,
%EDX<imp-def,dead>, %ESI<imp-def,dead>
108 ADJCALLSTACKUP 0, 0, %ESP<imp-def>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
116 %reg1029<def,de...
2018 Mar 13
32
[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace implementation.
- Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
- Use --emit-relocs instead of -pie to reduce
2017 Oct 04
28
x86: PIE support and option to extend KASLR randomization
These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which allows the KASLR
randomization range to be optionally extended from 1G to 3G.
Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general. Thanks to
2018 May 23
33
[PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization
Changes:
- patch v3:
- Update on message to describe longer term PIE goal.
- Minor change on ftrace if condition.
- Changed code using xchgq.
- patch v2:
- Adapt patch to work post KPTI and compiler changes
- Redo all performance testing with latest configs and compilers
- Simplify mov macro on PIE (MOVABS now)
- Reduce GOT footprint
- patch v1:
- Simplify ftrace