Erik de Castro Lopo
2011-Sep-27 11:33 UTC
[LLVMdev] Poor code generation for odd sized vectors
Hi all,

I'm compiling LLVM IR code like this on x86-64:

    define linkonce ccc <16 x float> @vector_add_float(<16 x float> %a.78, <16 x float> %a.79) align 8
    {
    entry:
      %result.80 = fadd <16 x float> %a.78, %a.79
      ret <16 x float> %result.80
    }

This works really well when the vector length (16 in the above) is an integer multiple of the SSE vector register width (4 floats), resulting in the following assembler code:

    vector_add_float:                       # @vector_add_float
    .Leh_func_begin0:
    # BB#0:                                 # %entry
            addps   %xmm4, %xmm0
            addps   %xmm5, %xmm1
            addps   %xmm6, %xmm2
            addps   %xmm7, %xmm3
            ret

However, when the vector length is increased to, say, 18, the generated code is rather poor, or rather, it is code that could easily be improved by hand.

Is this a known issue? Should LLVM be doing better? Should I raise a bug?

Cheers,
Erik

-- 
Erik de Castro Lopo
http://www.mega-nerd.com/
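To make the register arithmetic concrete: with 4-float SSE registers, a 16-float add splits into exactly four 128-bit adds, while an 18-float add leaves a 2-element remainder. The C sketch below shows one shape the "improved by hand" code could take; it is an illustration, not what LLVM emits. It assumes GCC or Clang vector extensions (`__attribute__((vector_size(16)))`), and the function name `vec_add_split` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: 4 floats = one 128-bit SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical hand-split add of n floats: n / 4 full 4-wide adds
   (each expected to lower to one addps on x86-64 with SSE), followed
   by a scalar loop over the n % 4 leftover elements. */
void vec_add_split(float *r, const float *a, const float *b, size_t n)
{
    assert(r != NULL && a != NULL && b != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vr;
        memcpy(&va, a + i, sizeof va);  /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                   /* element-wise add of all 4 lanes */
        memcpy(r + i, &vr, sizeof vr);
    }
    for (; i < n; i++)                  /* tail: 0 elements for n = 16, 2 for n = 18 */
        r[i] = a[i] + b[i];
}
```

With n = 16 the tail loop does no work, matching the four `addps` in the listing above; with n = 18 it handles the last two elements. Whether each 4-wide add really becomes a single `addps` depends on the compiler and flags, so the generated assembly is worth checking.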
Hi Erik, this is a bug, please open a bug report. The LLVM type legalizer has a bunch of code to handle this (vector widening), but unfortunately the argument lowering code seems to have scalarized the function arguments before they reach the type legalizer. Compare with the code produced for the following:

    define void @vector_add_float(<18 x float> *%rp, <18 x float> *%a.78p, <18 x float> *%a.79p) {
      %a.78 = load <18 x float> *%a.78p
      %a.79 = load <18 x float> *%a.79p
      %result.80 = fadd <18 x float> %a.78, %a.79
      store <18 x float> %result.80, <18 x float> *%rp
      ret void
    }

Ciao,

Duncan.
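Vector widening, which Duncan mentions, roughly means giving an awkwardly sized vector extra lanes so that every operation runs at full register width. The C sketch below pictures that idea under stated assumptions: it pads to the next multiple of 4 (LLVM's legalizer may pick a different widened type, for example a power of two), computes the padding lanes, and then discards them. It takes its operands through pointers, like Duncan's load/fadd/store IR. `vec_add_widened` and `MAX_LANES` are hypothetical names, and GCC/Clang vector extensions are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: 4 floats = one 128-bit SSE register. */
typedef float v4sf __attribute__((vector_size(16)));

#define MAX_LANES 64  /* hypothetical upper bound for the padded buffers */

/* Conceptual picture of widening: copy the n-element operands into
   zero-filled buffers padded up to a multiple of 4, run only full
   4-wide adds, then store back just the n live lanes. */
void vec_add_widened(float *rp, const float *ap, const float *bp, size_t n)
{
    size_t padded = (n + 3) & ~(size_t)3;  /* round n up to a multiple of 4 */
    assert(padded <= MAX_LANES);
    float a[MAX_LANES] = {0}, b[MAX_LANES] = {0}, r[MAX_LANES];
    memcpy(a, ap, n * sizeof *ap);         /* "load" the live lanes */
    memcpy(b, bp, n * sizeof *bp);
    for (size_t i = 0; i < padded; i += 4) {  /* 5 full adds for n = 18 */
        v4sf va, vb, vr;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;
        memcpy(r + i, &vr, sizeof vr);
    }
    memcpy(rp, r, n * sizeof *rp);         /* "store", dropping padding lanes */
}
```

Compared with splitting off a scalar tail, widening keeps every operation at full width at the cost of some wasted lanes; which trade-off LLVM actually makes for a given type is decided by its legalizer, not by this sketch.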
Erik de Castro Lopo
2011-Sep-27 12:23 UTC
Duncan Sands wrote:
> Hi Erik, this is a bug, please open a bug report.

Thanks. Bug report here:

    http://llvm.org/bugs/show_bug.cgi?id=11023

Erik