thr3ads.net - llvm dev - [LLVMdev] Poor code generation for odd sized vectors [Sep 2011]


Erik de Castro Lopo

2011-Sep-27 11:33 UTC

[LLVMdev] Poor code generation for odd sized vectors

Hi all,

I'm compiling LLVM IR code like this on x86-64:

  define linkonce ccc <16 x float> @vector_add_float(<16 x float> %a.78, <16 x float> %a.79) align 8
  {
  entry:
    %result.80 = fadd <16 x float> %a.78, %a.79
    ret <16 x float> %result.80
  }

This works really well when the vector length (16 in the above) is
an integer multiple of the SSE vector register width (4), resulting
in the following assembler code:

    vector_add_float:                       # @vector_add_float
    .Leh_func_begin0:
    # BB#0:                                 # %entry
        addps   %xmm4, %xmm0
        addps   %xmm5, %xmm1
        addps   %xmm6, %xmm2
        addps   %xmm7, %xmm3
        ret

However, when the vector length is increased to, say, 18, the generated
code is rather poor: it is code that could easily be improved by hand.
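To make "improved by hand" concrete, here is a minimal C sketch, assuming x86-64 with SSE and the standard `<xmmintrin.h>` intrinsics, of an 18-element float add. The first 16 elements fill four SSE registers exactly (one `addps` each, as in the 16-element case); the 2 leftover elements are packed into the low half of a fifth register with the upper lanes zeroed, so they cost one more vector add instead of separate scalar operations. The function name `add18` and this particular tail handling are illustrative assumptions, not what LLVM emits and not code from the original post.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical hand-written equivalent of an <18 x float> fadd:
 * r[i] = a[i] + b[i] for i = 0..17. */
void add18(float *r, const float *a, const float *b)
{
    assert(r && a && b);

    /* Elements 0..15: four full 4-lane adds, one per SSE register. */
    for (int i = 0; i < 16; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(r + i, _mm_add_ps(va, vb));
    }

    /* Elements 16..17: build {x16, x17, 0, 0} from two scalar loads
     * (_mm_load_ss zeroes the upper lanes), then do one more 4-lane add.
     * Lanes 2 and 3 compute 0 + 0 and are simply never stored. */
    __m128 ta = _mm_unpacklo_ps(_mm_load_ss(a + 16), _mm_load_ss(a + 17));
    __m128 tb = _mm_unpacklo_ps(_mm_load_ss(b + 16), _mm_load_ss(b + 17));
    __m128 sum = _mm_add_ps(ta, tb);

    /* Store lane 0, then broadcast lane 1 into lane 0 and store it. */
    _mm_store_ss(r + 16, sum);
    _mm_store_ss(r + 17, _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 1, 1, 1)));
}
```

Usage: with `float a[18], b[18], r[18]`, calling `add18(r, a, b)` leaves `a[i] + b[i]` in `r[i]` for every `i`. Padding the 2-element tail out to a full register is the same basic idea as vector widening: operate on a wider vector and ignore the extra lanes.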

Is this a known issue? Should LLVM be doing better? Should I raise a
bug?

Cheers,
Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Duncan Sands

2011-Sep-27 12:08 UTC

Hi Erik, this is a bug, please open a bug report.  The LLVM type legalizer has a
bunch of code to handle this (vector widening), but unfortunately the argument
lowering code seems to have scalarized the function arguments before getting
to the type legalizer.  Compare with the code produced for the following:

define void @vector_add_float(<18 x float> *%rp, <18 x float> *%a.78p, <18 x float> *%a.79p) {
   %a.78 = load <18 x float> *%a.78p
   %a.79 = load <18 x float> *%a.79p
   %result.80 = fadd <18 x float> %a.78, %a.79
   store <18 x float> %result.80, <18 x float> *%rp
   ret void
}

Ciao, Duncan.

Erik de Castro Lopo

2011-Sep-27 12:23 UTC

Duncan Sands wrote:
> Hi Erik, this is a bug, please open a bugreport.
Thanks. Bug report here:

    http://llvm.org/bugs/show_bug.cgi?id=11023

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
