Jean-Marc Valin
2006-Jan-06 02:08 UTC
[Speex-dev] Re: sigsegv in _mm_load_ups (linux/gcc 3.x)
> I've seen the exact same in my version (mingw on win32), and the problem
> was that the stack was misaligned when entering the function, so the temp
> registers weren't at 16-byte boundaries.

That's a possibility. It's easy to check by printing the address of the variables. I know that gcc 3.3 had some alignment issues with __m128 that were supposed to be fixed in version 3.4 and above.

Jean-Marc
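A minimal sketch of the address check Jean-Marc suggests. The helper names `misalignment16` and `report_local_alignment` are hypothetical, and a `_Alignas(16) float[4]` stands in for an `__m128` temporary so the snippet also compiles on non-x86 targets. A nonzero offset means the compiler believed the stack was 16-byte aligned on entry when it was not.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offset of p past the previous 16-byte boundary; 0 means aligned. */
static unsigned misalignment16(const void *p) {
    return (unsigned)((uintptr_t)p & 15u);
}

/* Hypothetical probe: call it from the code path that crashes.
 * tmp requests 16-byte alignment just like an __m128 local would. */
static unsigned report_local_alignment(const char *where) {
    _Alignas(16) float tmp[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    unsigned off = misalignment16(tmp);
    printf("%s: tmp at %p, offset mod 16 = %u\n", where, (void *)tmp, off);
    return off;
}
```

Calling `report_local_alignment("encoder thread")` from inside the failing thread and from `main` should show whether only the thread's stack is off.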
Thorvald Natvig
2006-Jan-06 03:18 UTC
[Speex-dev] Re: sigsegv in _mm_load_ups (linux/gcc 3.x)
>> I've seen the exact same in my version (mingw on win32), and the problem
>> was that the stack was misaligned when entering the function, so the temp
>> registers weren't at 16-byte boundaries.
>
> That's a possibility. It's easy to check by printing the address of the
> variables. I know that gcc 3.3 had some alignment issues with __m128 that
> were supposed to be fixed in version 3.4 and above.

I just checked it in the debugger, and this was with gcc 3.4.4 (mingw)... and the addresses were not properly aligned :( From a bit of googling, this seems to be a thread problem: gcc only preserves whatever 16-byte alignment the stack already has, so if the thread's start function is entered with a misaligned stack, that misalignment is kept throughout the execution.
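One way to stop the misalignment from propagating is to realign the stack once, at the thread's entry point. The sketch below assumes GCC's x86 `force_align_arg_pointer` function attribute, which was added in later GCC releases (not the 3.4 series discussed here) and is also accepted by Clang; `-mstackrealign` applies the same treatment to every function. The thread routine name `codec_thread_start` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* force_align_arg_pointer makes the compiler realign the stack on entry
 * to the marked function. Only meaningful on x86; a no-op elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical thread start routine with a pthread_create-compatible
 * signature. Every function called from here inherits the realigned stack. */
REALIGN_STACK static void *codec_thread_start(void *arg) {
    int *aligned_ok = arg;
    _Alignas(16) float scratch[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    *aligned_ok = is_aligned16(scratch);
    /* ... run the SSE-using encoder/decoder loop here ... */
    return NULL;
}
```

Marking only the entry point is enough because, as described above, gcc keeps whatever alignment the stack has on entry for the rest of the call chain.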
Thorvald,

At 03:18 AM 1/6/2006, Thorvald Natvig wrote:
> I just checked it in the debugger, and this was with gcc 3.4.4 (mingw)...
> And the addresses were not properly aligned :( From a bit of googling,
> this seems to be a thread problem, as the gcc just maintains 16-byte
> alignment of the stack -- if the start function of the thread had
> misaligned stack, the misalignment will be kept throughout the execution.

Thanks! This is exactly the issue, along with the fact that gcc seems to emit a movaps instruction that copies into unaligned memory right after loading the xmm register. No matter how we arrange the code, it always emits the movaps (probably because it needs more than 8 registers); I am sure this code would work fine on a machine with 16 xmm registers. For now we will just use the C code for the unaligned _10 call. This could probably also be fixed by hand-coding this particular chunk of assembler.

Tom
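A sketch of the "fall back to the C code" approach, using a hypothetical dot product (`dot`, `dot_c`, `dot_sse` are illustrative names, not Speex functions). The SSE path runs only when its alignment preconditions hold, and it writes its accumulator out with `_mm_storeu_ps` so that temporary needs no 16-byte alignment. This handles unaligned data; if the stack itself is misaligned, compiler-generated `movaps` spills of `__m128` values can still fault, which is why the stack also needs realigning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Portable C path: correct for any alignment and any length. */
static float dot_c(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#if defined(__SSE__)
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* SSE path: _mm_load_ps faults on unaligned addresses, so a and b must be
 * 16-byte aligned and n a multiple of 4. */
static float dot_sse(const float *a, const float *b, size_t n) {
    assert(is_aligned16(a) && is_aligned16(b) && n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
    _mm_storeu_ps(lanes, acc); /* unaligned store: lanes may sit anywhere */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
#endif

/* Hypothetical dispatcher: SSE only when its preconditions hold, C otherwise. */
static float dot(const float *a, const float *b, size_t n) {
#if defined(__SSE__)
    if (is_aligned16(a) && is_aligned16(b) && n % 4 == 0)
        return dot_sse(a, b, n);
#endif
    return dot_c(a, b, n);
}
```

The two paths sum in different orders, so results can differ in the last bits for general inputs; that is usually acceptable for a codec but worth knowing when comparing outputs.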