thr3ads.net - llvm dev - [llvm-dev] Unnecessary spill/fill issue [May 2016]


Jason via llvm-dev

2016-May-06 17:44 UTC

[llvm-dev] Unnecessary spill/fill issue

Hi, I am using MCJIT in LLVM 3.6 to JIT kernels to x86 AVX2. I've noticed
some inefficient use of the stack around constant vectors. In one example,
I have code that computes a series of constant vectors at compile time.
Each vector has a single use. In the final asm, I see a series of spills at
the top of the function that store all the constant vectors to the stack
immediately, and each use then reads its operand from a stack slot:

Lots of these at the top of the function:

movabsq $.LCPI0_212, %rbx
vmovaps (%rbx), %ymm0
vmovaps %ymm0, 2816(%rsp)       # 32-byte Spill

Later on, each use references the stack pointer:

vpaddd 2816(%rsp), %ymm4, %ymm1 # 32-byte Folded Reload

It seems the spill to the stack is unnecessary. In one particularly bad
kernel, I have 128 8-wide constant vectors (32 bytes each), so there is 4KB
of stack use just for these constants. I think a better approach could be to
load the constant vector pointers as needed and fold the load into the use:

movabsq $.LCPI0_212, %rbx
vpaddd (%rbx), %ymm4, %ymm1
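
The figures above can be checked with a short sketch. This is illustrative arithmetic only, not part of the original post: the function and constant names are hypothetical, and the per-constant instruction counts are simply read off the two asm snippets quoted in this message (three setup instructions plus one use, versus two instructions).

```python
# Illustrative arithmetic only; these names are hypothetical, not LLVM APIs.

LANES = 8          # "8-wide" vectors, as stated in the post
LANE_BYTES = 4     # assumption: 32-bit lanes, so one vector fills a 32-byte ymm


def spill_stack_bytes(num_vectors: int) -> int:
    """Stack bytes used if every constant vector gets its own spill slot."""
    return num_vectors * LANES * LANE_BYTES


# Instruction counts per constant, read off the snippets in the post:
# spilled form:  movabsq + vmovaps load + vmovaps spill + vpaddd (folded reload)
# proposed form: movabsq + vpaddd with a memory operand
SPILLED_INSTRS_PER_CONSTANT = 4
PROPOSED_INSTRS_PER_CONSTANT = 2


def instruction_savings(num_vectors: int) -> int:
    """Instructions saved if every constant used the proposed form."""
    return num_vectors * (SPILLED_INSTRS_PER_CONSTANT - PROPOSED_INSTRS_PER_CONSTANT)


if __name__ == "__main__":
    print(spill_stack_bytes(128))    # 4096 bytes, i.e. the 4KB quoted above
    print(instruction_savings(128))  # 256
```

The sketch counts instructions only; it says nothing about latency or about whether the allocator could keep some constants in registers.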


Thanks,
Jason

Jason via llvm-dev

2016-May-09 21:41 UTC


Does anyone have any insight into this problem? Is there a way to minimize
excessive spill/fill for this kind of scenario?
Thanks,
Jason



Sanjay Patel via llvm-dev

2016-May-09 22:05 UTC


It sounds bad, but I can't tell from the posted info how to diagnose it.

Can you post a (possibly reduced) example to demonstrate what you're
seeing? A bug report would be even better, so we can track whether there are
multiple problems:
https://llvm.org/bugs/


Quentin Colombet via llvm-dev

2016-May-09 22:09 UTC


Hi Jason,

I am guessing the problem is that we do not recognize the sequence as
rematerializable, because we do not load LCPI0_212 directly into a ymm
register: the address goes through a GPR first. One way to fix that is to use
a pseudo instruction that loads the constant straight into a ymm register
(while also defining a dead GPR, so that the pseudo can be expanded later),
and then teach the folding code how to deal with that pseudo.

Another option is to make the rematerialization smarter, but that is more
complicated :).
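
As a rough illustration of the reasoning above, here is a toy model in Python. It is a deliberately simplified sketch, not LLVM's actual rematerialization logic: the `Instr` class, the rule in `is_trivially_remat`, and the pseudo opcode name `LOAD_CONST_YMM_PSEUDO` are all hypothetical. The assumed rule is that a value is cheap to recompute at its use only if the defining instruction reads no other virtual register.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    """Toy machine instruction: registers written and virtual registers read."""
    opcode: str
    defs: Tuple[str, ...]
    uses: Tuple[str, ...]  # constants and symbols are not listed here


def is_trivially_remat(instr: Instr) -> bool:
    """Assumed toy rule: rematerializable only if it reads no virtual register."""
    return len(instr.uses) == 0


# Sequence from the thread: the address goes into a GPR, then the vector is
# loaded through that GPR, so the load depends on another register.
mov_addr = Instr("movabsq $.LCPI0_212", defs=("gpr0",), uses=())
load_vec = Instr("vmovaps (gpr0)", defs=("ymm0",), uses=("gpr0",))

# Hypothetical pseudo along the lines suggested above: it names the constant
# directly and defines a dead GPR as scratch for its later expansion.
pseudo = Instr("LOAD_CONST_YMM_PSEUDO $.LCPI0_212",
               defs=("ymm0", "gpr_dead"), uses=())

if __name__ == "__main__":
    print(is_trivially_remat(load_vec))  # False: depends on gpr0
    print(is_trivially_remat(pseudo))    # True: depends only on the constant
```

In this model the two-instruction sequence fails the check at the vector load, while a single pseudo that names the constant passes it, which is the gap the suggested pseudo instruction is meant to close.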

Cheers,
-Quentin
