> <32 x float> takes up 8 SSE registers; you're likely running into
> issues with register pressure. Does it work better if you use
> something smaller like <4 x float>?
>
> Besides that, I don't see any obvious issues.
>
> -Eli

You are right, yes. The code runs faster with <4 x float> types, but it is still a bit slower than the scalar version.

Stéphane Letz
On Sat, May 29, 2010 at 1:23 AM, Stéphane Letz <letz at grame.fr> wrote:
>> <32 x float> takes up 8 SSE registers; you're likely running into
>> issues with register pressure. Does it work better if you use
>> something smaller like <4 x float>?
>>
>> Besides that, I don't see any obvious issues.
>>
>> -Eli
>
> You are right, yes. The code runs faster with <4 x float> types, but it is still a bit slower than the scalar version.
>
> Stéphane Letz

Huh, that's strange... umm, possibly a stupid question, but are both versions doing the same amount of work? (It doesn't look like the vector version adjusts the loop count in the given code.)

Besides that, I can't think of any reason why the vector version would be slower, except possibly memory bandwidth issues.

-Eli
On 29 May 2010 at 10:40, Eli Friedman wrote:
> On Sat, May 29, 2010 at 1:23 AM, Stéphane Letz <letz at grame.fr> wrote:
>>> <32 x float> takes up 8 SSE registers; you're likely running into
>>> issues with register pressure. Does it work better if you use
>>> something smaller like <4 x float>?
>>>
>>> Besides that, I don't see any obvious issues.
>>>
>>> -Eli
>>
>> You are right, yes. The code runs faster with <4 x float> types, but it is still a bit slower than the scalar version.
>>
>> Stéphane Letz
>
> Huh, that's strange... umm, possibly a stupid question, but are both
> versions doing the same amount of work? (It doesn't look like the
> vector version adjusts the loop count in the given code).

Yes, right. The loop count was incorrect; it is fixed now.

> Besides that, I can't think of any reason why the vector version would
> be slower except possibly memory bandwidth issues.
>
> -Eli

Now the two versions start to be comparable. Thanks.

Stéphane Letz
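
For readers following the loop-count point: the code under discussion is not shown in this thread, so the following is only a rough sketch in portable C of what "adjusting the loop count" could look like. The function names (scale_scalar, scale_vec4), the multiply-by-gain operation, and the LANES constant are all hypothetical, chosen for illustration. The idea is that a loop processing four floats per iteration should run about n / 4 times. If it still runs n times, it does four times the work of the scalar loop and the timings are not comparable.

```c
#include <stddef.h>

/* Assumed width of the <4 x float> type mentioned in the thread. */
#define LANES 4

/* Scalar reference (hypothetical): one element per iteration, n iterations. */
static void scale_scalar(float *out, const float *in, float gain, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * gain;
    }
}

/* Vector-style version (hypothetical): each outer iteration handles LANES
   elements, so the outer loop runs n / LANES times rather than n times.
   Returns the number of outer (vector) iterations actually executed. */
static size_t scale_vec4(float *out, const float *in, float gain, size_t n) {
    size_t trips = 0;
    size_t i = 0;

    /* Main loop: step by LANES, stopping before a partial group. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            out[i + lane] = in[i + lane] * gain;
        }
        trips++;
    }

    /* Scalar tail for the n % LANES leftover elements. */
    for (; i < n; i++) {
        out[i] = in[i] * gain;
    }
    return trips;
}
```

The inner lane loop only stands in for a single vector operation; a compiler or hand-written IR would use a real <4 x float> multiply there. The part relevant to the thread is the step size and bound of the outer loop.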