Hi all,

I was wondering how to use variations of intrinsic functions that take a memory operand. Take for example the SSE4.1 pmovsxbd instruction. One variant takes two XMM registers, while another has a 32-bit memory location as source operand. The latter is quite interesting if you know you're reading from memory anyway, and if the data is not 16-byte aligned. It looks like LLVM's Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source operand though. So how do I use the variant taking a memory operand?

Thanks a bunch,

Nicolas Capens
I tried adding the following to IntrinsicsX86.td:

    def int_x86_sse41_pmovsxbd_m : GCCBuiltin<"__builtin_ia32_pmovsxbd128_m">,
        Intrinsic<[llvm_v4i32_ty, llvm_ptr_ty], [IntrReadMem]>;

But while I now have an Intrinsic::x86_sse41_pmovsxbd_m that I can use for 'calling' the intrinsic, I'm getting a "cannot yet select" assert.

Any clues highly appreciated.

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Nicolas Capens
Sent: Friday, 01 August, 2008 09:11
To: 'LLVM Developers Mailing List'
Subject: [LLVMdev] Using intrinsics with memory operands

> I was wondering how to use variations of intrinsic functions that take a
> memory operand. [...]
On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote:
> I was wondering how to use variations of intrinsic functions that take a
> memory operand.

Often, for intrinsics where it matters, there's a variant of the intrinsic that takes a pointer operand that you can use, although it looks like there isn't one here.

> Take for example the SSE4.1 pmovsxbd instruction. One variant takes two XMM
> registers, while another has a 32-bit memory location as source operand. The
> latter is quite interesting if you know you're reading from memory anyway,
> and if it's not 16-byte aligned. It looks like LLVM's
> Intrinsic::x86_sse41_pmovsxbd expects a v16i8 as source operand though. So
> how do I achieve using the variant taking a memory operand?

A load+insertelement+pmovsx sequence should codegen into a single instruction: the pattern-matching magic should kick in and fold the load into the pmovsx. That doesn't seem to be working, though, even for a simple example like the following:

    target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
    target triple = "i386-pc-linux-gnu"

    define <4 x i32> @a(i32* %x) nounwind {
    entry:
        load i32* %x, align 4                           ; <i32>:0 [#uses=1]
        insertelement <4 x i32> undef, i32 %0, i32 0    ; <<4 x i32>>:1 [#uses=1]
        bitcast <4 x i32> %1 to <16 x i8>               ; <<16 x i8>>:2 [#uses=1]
        tail call <4 x i32> @llvm.x86.sse41.pmovsxbd( <16 x i8> %2 ) nounwind readnone    ; <<4 x i32>>:3 [#uses=1]
        ret <4 x i32> %3
    }

    declare <4 x i32> @llvm.x86.sse41.pmovsxbd(<16 x i8>) nounwind readnone

I think the issue is that the pattern for the memory operand of pmovsxbd isn't flexible enough to see through the scalar_to_vector step.

-Eli
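To make it concrete why folding the load is legitimate, here is a rough scalar model in C++ of the two forms of pmovsxbd and of the load+insertelement+bitcast sequence from the example. This is only a sketch for illustration: the names `pmovsxbd_reg`, `pmovsxbd_mem`, and `load_insert_bitcast` are made up, the undef lanes are modeled as zero, and none of this is LLVM code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<std::uint8_t, 16>;  // models <16 x i8>
using Lanes4  = std::array<std::int32_t, 4>;   // models <4 x i32>

// Register form: sign-extend the low four bytes of a 16-byte vector.
Lanes4 pmovsxbd_reg(const Bytes16& src) {
    Lanes4 out{};
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::int8_t>(src[i]);
    return out;
}

// Memory form: read exactly four bytes, with no alignment requirement.
Lanes4 pmovsxbd_mem(const void* p) {
    std::uint8_t b[4];
    std::memcpy(b, p, sizeof b);
    Lanes4 out{};
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::int8_t>(b[i]);
    return out;
}

// Models load i32 + insertelement into lane 0 + bitcast to <16 x i8>.
// Only bytes 0..3 are defined; the undef lanes are zero here (an assumption).
Bytes16 load_insert_bitcast(const void* p) {
    Bytes16 v{};
    std::memcpy(v.data(), p, 4);
    return v;
}
```

Since the register form only ever looks at bytes 0..3, `pmovsxbd_reg(load_insert_bitcast(p))` and `pmovsxbd_mem(p)` give the same result for any pointer `p`, which is the equivalence the instruction selector would need to recognize.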
Eli is correct. This is a deficiency in the matching code. We don't want variants of intrinsics which take memory operands. We often have to add code matching scalar_to_vector and/or bit_convert explicitly. Perhaps we should have tablegen produce matching code that checks for these nodes.

Evan

On Aug 1, 2008, at 6:20 AM, Eli Friedman wrote:

> I think the issue is that the pattern for the memory operand of
> pmovsxbd isn't flexible enough to see through the scalar_to_vector
> step.
>
> -Eli
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
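As a rough illustration of what "matching code that checks for these nodes" might look like, here is a toy C++ sketch of a matcher that looks through scalar_to_vector and bit_convert wrappers before testing for a load. Everything in it (`Op`, `Node`, `make`, `peelWrappers`, `canFoldLoad`) is hypothetical and invented for this example; it is not LLVM's SelectionDAG API and not what tablegen actually emits.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node kinds for a toy DAG (not LLVM's SelectionDAG classes).
enum class Op { Load, ScalarToVector, BitConvert, Other };

struct Node {
    Op op;
    std::shared_ptr<Node> operand;  // one operand is enough for this sketch
};

// Convenience constructor for toy nodes.
std::shared_ptr<Node> make(Op op, std::shared_ptr<Node> operand = nullptr) {
    return std::make_shared<Node>(Node{op, std::move(operand)});
}

// Skip any chain of bit_convert / scalar_to_vector wrappers and return
// the first node underneath them (or nullptr if the chain runs out).
const Node* peelWrappers(const Node* n) {
    while (n && (n->op == Op::BitConvert || n->op == Op::ScalarToVector))
        n = n->operand.get();
    return n;
}

// True if the source operand is a load once the wrappers are ignored.
bool canFoldLoad(const Node* src) {
    const Node* inner = peelWrappers(src);
    return inner != nullptr && inner->op == Op::Load;
}
```

With this, `canFoldLoad` accepts a bare load as well as bit_convert(scalar_to_vector(load)), the shape produced by the example IR. A real matcher would also have to check things this sketch ignores, such as whether the load has other users and whether the loaded width matches what the instruction reads.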