Hi all, I believe I've found a bug in the encoding of the movd instruction on x64. Here's some IR code to reproduce it: external global i8*, align 1 ; <i8**>:0 [#uses=1] external global i8*, align 16 ; <i8**>:1 [#uses=1] declare void @abort() define internal void @2() { %1 = load i8** @0, align 1 ; <i8*> [#uses=1] %2 = load i8** @1, align 16 ; <i8*> [#uses=1] %3 = bitcast i8* %2 to <8 x i8>* ; <<8 x i8>*> [#uses=1] %4 = load <8 x i8>* %3, align 1 ; <<8 x i8>> [#uses=1] %5 = bitcast <8 x i8> %4 to i64 ; <i64> [#uses=1] %6 = trunc i64 %5 to i32 ; <i32> [#uses=1] %7 = bitcast i8* %1 to i32* ; <i32*> [#uses=1] store i32 %6, i32* %7, align 1 ret void } It generates the following code: mov rax,1E417A8h mov rax,qword ptr [rax] mov rcx,1E417B0h mov rcx,qword ptr [rcx] movq mm0,mmword ptr [rcx] movq rax,mm1 mov dword ptr [rax],ecx Note the last movq. What was probably intended to be generated was "movd ecx, mm0". LLVM mistakenly sets the 'wide' bit of the REX prefix to 1, turning movd into movq. Also, reg and r/m encoding has been swapped. I have not found a fix for this yet, but the MMX_MOVD64* definitions and patterns looks suspicious. Note that on x86-32 it produces correct code (though not optimal either; it doesn't use movd). Also, notice that the last two instructions above should ideally just be a single movd to memory, instead of first writing to a GP register. In the same breath, I believe the encoding of MMX_MOVDQ2Qrr is incorrect. I've been able to fix the first two definitions by using MRMSrcReg (and set hasNoSideEffects). I'm not sure about the third definition though, is this for 3DNow! And should it use MRMSrcReg as well? Thanks, Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090709/f85af5c0/attachment.html>
On Thu, Jul 9, 2009 at 8:44 AM, Nicolas Capens<nicolas at capens.net> wrote:> I believe I’ve found a bug in the encoding of the movd instruction on x64. > Here’s some IR code to reproduce it:[snip> Note the last movq. What was probably intended to be generated was “movd > ecx, mm0”. LLVM mistakenly sets the ‘wide’ bit of the REX prefix to 1, > turning movd into movq. Also, reg and r/m encoding has been swapped. I have > not found a fix for this yet, but the MMX_MOVD64* definitions and patterns > looks suspicious.Thanks for the testcase; fixed in r75142.> Note that on x86-32 it produces correct code (though not optimal either; it > doesn’t use movd). Also, notice that the last two instructions above should > ideally just be a single movd to memory, instead of first writing to a GP > register.The suboptimal code on x86-32 is because there aren't patterns for i32 extract from <2 x i32> yet (so it gets expanded through memory). The suboptimal code on x86-64 is due to a missing dagcombine; it doesn't know that extracting an i64 and truncating it to an i32 is equivalent to extracting an i32.> In the same breath, I believe the encoding of MMX_MOVDQ2Qrr is incorrect. > I’ve been able to fix the first two definitions by using MRMSrcReg (and set > hasNoSideEffects). I’m not sure about the third definition though, is this > for 3DNow! And should it use MRMSrcReg as well?Also fixed in r75142. Note that MMX_MOVQ2FR64rr is actually exactly the same instruction as MMX_MOVQ2DQrr; they're separated because it's a bit more straightforward to write that way (FR64 and VR128 are both XMM registers, but TableGen isn't aware of the precise relationship). -Eli
Intel syntax is extremely confusing. I often can't figure out when to use movd or movq. Evan On Jul 9, 2009, at 10:12 AM, Eli Friedman wrote:> On Thu, Jul 9, 2009 at 8:44 AM, Nicolas Capens<nicolas at capens.net> > wrote: >> I believe I’ve found a bug in the encoding of the movd instruction >> on x64. >> Here’s some IR code to reproduce it: > [snip >> Note the last movq. What was probably intended to be generated was >> “movd >> ecx, mm0”. LLVM mistakenly sets the ‘wide’ bit of the REX prefix to >> 1, >> turning movd into movq. Also, reg and r/m encoding has been >> swapped. I have >> not found a fix for this yet, but the MMX_MOVD64* definitions and >> patterns >> looks suspicious. > > Thanks for the testcase; fixed in r75142. > >> Note that on x86-32 it produces correct code (though not optimal >> either; it >> doesn’t use movd). Also, notice that the last two instructions >> above should >> ideally just be a single movd to memory, instead of first writing >> to a GP >> register. > > The suboptimal code on x86-32 is because there aren't patterns for i32 > extract from <2 x i32> yet (so it gets expanded through memory). The > suboptimal code on x86-64 is due to a missing dagcombine; it doesn't > know that extracting an i64 and truncating it to an i32 is equivalent > to extracting an i32. > >> In the same breath, I believe the encoding of MMX_MOVDQ2Qrr is >> incorrect. >> I’ve been able to fix the first two definitions by using MRMSrcReg >> (and set >> hasNoSideEffects). I’m not sure about the third definition though, >> is this >> for 3DNow! And should it use MRMSrcReg as well? > > Also fixed in r75142. Note that MMX_MOVQ2FR64rr is actually exactly > the same instruction as MMX_MOVQ2DQrr; they're separated because it's > a bit more straightforward to write that way (FR64 and VR128 are both > XMM registers, but TableGen isn't aware of the precise relationship). > > -Eli > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev