I'm trying the following code on X86:
define <3 x i32> @retThree() {
  ret <3 x i32> <i32 1, i32 2, i32 3>
}
expecting it to load the first three lanes of %xmm0 (when a vector of
four is returned, %xmm0 is used). But the generated assembly seems to
use return by hidden pointer instead, even though the constant-pool
entry has been allocated with padding, seemingly in preparation for
this:
.LCPI1_0:                       # constant pool <4 x i32>
        .long 1                 # 0x1
        .long 2                 # 0x2
        .long 3                 # 0x3
        .zero 4
        .text
Shouldn't this data be loaded as-is into %xmm0, just as it is in the
case of a vector of four?:
retFour:                        # @retFour
.Leh_func_begin1:
# BB#0:
        movaps  .LCPI1_0, %xmm0
        ret
This would of course leave it to the caller to ignore the fourth lane.
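
For reference, the four-element function I'm comparing against has
roughly this shape (a sketch; the name retFour matches the assembly
above, but the constant values are just an example):

define <4 x i32> @retFour() {
  ; the whole constant fits in one XMM register, so it comes back in %xmm0
  ret <4 x i32> <i32 1, i32 2, i32 3, i32 4>
}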
Debugging the code generation, I notice that the v3i32 is widened to
v4i32, but by the time X86TargetLowering::CanLowerReturn is called, the
v3i32 seems to have been split up into three MVT::i32s. If I try a
function that returns a vector of two or four elements, CanLowerReturn
gets an MVT::v2i32 or MVT::v4i32, respectively, and return by pointer
is not used.
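
For example, the two-element variant looks roughly like this (a sketch;
the name retTwo and the constants are just illustrative):

define <2 x i32> @retTwo() {
  ; CanLowerReturn sees a single MVT::v2i32 here and no hidden pointer is used
  ret <2 x i32> <i32 1, i32 2>
}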
Where is the v3i32, widened to v4i32, split up into (three?) separate i32s?
And BTW, I see similar behaviour on the SPU back end.
thanks,
kalle