search for: lowertobts

Displaying 14 results from an estimated 14 matches for "lowertobts".

Did you mean: lowertobt
2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
I'm tracking down an X86 code generation malfeasance regarding BT (bit test) and I have some questions. This IR *matches* and then *X86TargetLowering::LowerToBT **is called:* %and = and i64 %shl, %val * ; (val & (1 << index)) != 0 ; *bit test with a *register* index This IR *does not match* and so *X86TargetLowering::LowerToBT **is not called:* %and = lshr i64 %val, 25
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Sure. Attached is the file but here are the functions. The first uses a fixed bit offset. The second has a indexed bit offset. Compiling with llc -O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi, %rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is is pretty much what Clang had generated as IR. shrq $25, %rdi andq $1, %rdi LLVM should be able to
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
> On Jan 22, 2015, at 1:22 PM, Fiona Glaser <fglaser at apple.com> wrote: > > According to Agner’s docs, many CPUs have slower BT than TEST; Haswell has only 0.5 inverse throughput as opposed to 0.25, Atom has 1 instead of 0.5, and Silvermont can’t even dual-issue BT (it locks both ALUs). So while BT does seem have a shorter instruction encoding than TEST for TEST reg, imm32 where
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
...ays used for bits 0-31 and BT was always used for bits 32-63. Since the BT instruction is 16b smaller than TEST for the bits 8-31 case, 32b vs 48b, and not irredeemably slower, it makes sense to use BT in cases where size matters. Similar logic is possible for BTC and BTS. However, LowerToBTC and LowerToBTS would need to be written and used and that's a larger patch. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150124/9f79c1dc/attachment.html> -------------- next part -------------- A non-text attachme...
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
...s 32-63. > > > > Since the BT instruction is 16b smaller than TEST for the bits 8-31 > case, 32b vs 48b, and not irredeemably slower, it makes sense to use BT in > cases where size matters. > > > > Similar logic is possible for BTC and BTS. However, LowerToBTC and > LowerToBTS would need to be written and used and that's a larger patch. > > <pat><tst64.c>_______________________________________________ > > llvm-commits mailing list > > llvm-commits at cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits > >...
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
That’s not how partial-flags update stalls work. There is no independent tracking of individual bits in EFLAGS. This means that BT + CMOVNZ has a false dependency on whatever instruction wrote to EFLAGS before BT and requires an extra µop vis-a-vis TEST + CMOVNZ or SHR + AND. Please do not use BT. It is a performance hazard. If you don’t believe me for some reason, here’s the relevant quote
2015 Jan 23
3
[LLVMdev] X86TarIgetLowering::LowerToBT
I’ll be happy to run it for you. Do you want Intel64, x86 or both? The Intel compiler doesn’t have a –Oz option. It has –Os and –O[123]. Also, FWIW, one of the Intel compiler experts on BT will comment on this thread, and on our rules for BT usage later this afternoon. Kevin B. Smith From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Sears Sent:
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Is that even a valid instruction? I thought TEST only took 32-bit immediates. Fiona > On Jan 22, 2015, at 2:48 PM, Chris Sears <chris.sears at gmail.com> wrote: > > The problem is that REX TEST reg,#(1<<37) is 10 bytes vs 5 bytes for REX BT reg,37. > That's a large space penalty to pay for a possible partial update stall. > > So the idea of generating BT for
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes). Fiona > On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote: > > My bad on that. So that's what the comment meant. > That means BT is pretty much
2015 Jan 23
2
[LLVMdev] X86TarIgetLowering::LowerToBT
> icc generates testq for 0-30 and btq for 31-63. > That seems like a small bug in the bit 31 case. You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Which BTQ? There are three flavors. BTQ reg/reg BTQ reg/mem BTQ reg/imm I can imagine that the reg/reg and especially the reg/mem versions would be slow. However the shrq/and versions *with the same operands* would be slow as well. There's even a compiler comment about the reg/mem version saying "this is for disassembly only". But I doubt BTQ reg/imm would be microcoded. -- Ite
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote: > The status quo is: > > a) 40b REX+BT instruction for the 64b case > b) 48b TEST for the 32b case > c) unless it's small TEST > > > You are currently paying a 16b penalty for TEST vs BT in the 32b case. > That may be worth testing the -Os flag. > You'll want -Oz here, Os
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
I suspect that this is because the mask in your example is the result of a variable shift, which (a) has it’s own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t
2015 Feb 03
2
[LLVMdev] RFC: Constant Hoisting
I've had a bug/pessimization which I've tracked down for 1 bit bitmasks: if (((xx) & (1ULL << (40)))) return 1; if (!((yy) & (1ULL << (40)))) ... The second time Constant Hoisting sees the value (1<<40) it wraps it up with a bitcast. That value then gets hoisted. However, the first (1<<40) is not bitcast and gets recognized as a BT. The second