When targeting modern processors, it will usually be best to use “BT reg, imm”
for testing bits 32-63. The alternative implementations will all use multiple
instructions, will encode larger, and will usually run slower. For testing bits
0-31, “TEST reg, imm” is preferred unless you are looking to minimize code size
at the expense of performance, in which case you would still want to use BT in
the cases where it encodes smaller. As Fiona pointed out, there are some
processors where TEST has slightly better low level performance properties than
BT.
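To make the constant-mask case concrete, here is a minimal C sketch of the two ranges discussed above. The function names are hypothetical, and which instruction a compiler actually selects depends on the compiler, its version, and the optimization flags; the comments describe the encoding constraints, not guaranteed output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: tests bit 40, a bit in the 32-63 range.
 * The mask 1 << 40 does not fit the sign-extended 32-bit immediate
 * that a 64-bit TEST accepts, so without BT the mask has to be
 * materialized in a register first (or the value shifted), which is
 * where the extra instructions come from. */
bool test_bit40(uint64_t x) {
    return (x & (UINT64_C(1) << 40)) != 0;
}

/* Hypothetical helper: tests bit 5, a bit in the 0-31 range.
 * Here the mask fits an immediate, so a single TEST can do the job. */
bool test_bit5(uint64_t x) {
    return (x & (UINT64_C(1) << 5)) != 0;
}
```

Comparing the generated assembly for these two functions at different optimization levels is one way to see which form a given compiler picks.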
Regarding the partial EFLAGS write, modern OOO processors independently rename
the carry flag, et al, so this is no longer a problem. I would have to check
with the processor architects to figure out the exact processor generation where
this problem was first fixed, but it was roughly a decade ago. Steve’s Agner
Fog quote, “BT, BTC, BTR, and BTS change the carry flag but leave the other
flags unchanged. This causes a false dependence on the previous value of the
flags and costs an extra μop. Use TEST, AND, OR and XOR instead of these
instructions.”, was in reference to the Pentium 4.
FWIW, the Intel compiler itself doesn’t quite get all the “test bit” sequences
right either. We currently use a 64-bit TEST for testing bits 0-30, but we ought
to be using a 32-bit TEST to avoid the REX byte. Similarly, for testing bit 31,
we should use a 32-bit TEST rather than the “BT reg, 31” that we currently
generate. I intend to get those cases fixed.
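The low-bit cases can be sketched in C by narrowing the value before masking, which is semantically what a 32-bit TEST does. The function names are hypothetical, and the comments describe encoding facts rather than what any particular compiler is guaranteed to emit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: tests bit 5 of a 64-bit value using only the
 * low 32 bits. A 32-bit TEST needs no REX.W prefix, so it can encode
 * one byte smaller when the register is one of the eight legacy
 * registers (r8-r15 still need a REX prefix). */
bool test_bit5_low32(uint64_t x) {
    return ((uint32_t)x & (UINT32_C(1) << 5)) != 0;
}

/* Hypothetical helper: tests bit 31. A 64-bit TEST sign-extends its
 * 32-bit immediate, so 0x80000000 would become 0xFFFFFFFF80000000 and
 * test bits 31-63 together. Narrowing to 32 bits avoids that. */
bool test_bit31_low32(uint64_t x) {
    return ((uint32_t)x & UINT32_C(0x80000000)) != 0;
}
```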
Also, this thread hasn’t focused on the “BT reg, reg” and “BT[CSR] reg, reg”
instruction forms, but these are good instructions to use where possible.
The only instructions in the BT family that you really want to avoid at all
costs are the memory forms. You never want to generate those. The
multi-instruction expansions will almost always be faster.
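For reference, the variable-index operations that correspond to the register forms can be written in C roughly as follows. This is a sketch with hypothetical function names; whether a compiler maps each one to BT, BTS, BTR, or BTC depends on the compiler and flags. The last function shows the shape of a multi-instruction expansion for a bit array in memory: load the containing word, then do a register bit test. It uses an unsigned index for simplicity, which is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register forms with a 64-bit operand take the bit index modulo 64;
 * the "n & 63" mirrors that and keeps the C shift well defined. */
bool bit_test(uint64_t x, unsigned n) {
    return (x >> (n & 63)) & 1;
}

uint64_t bit_set(uint64_t x, unsigned n) {
    return x | (UINT64_C(1) << (n & 63));
}

uint64_t bit_reset(uint64_t x, unsigned n) {
    return x & ~(UINT64_C(1) << (n & 63));
}

uint64_t bit_complement(uint64_t x, unsigned n) {
    return x ^ (UINT64_C(1) << (n & 63));
}

/* Bit array in memory: select the word explicitly, load it, and test
 * the bit in a register instead of relying on a memory-operand BT. */
bool bit_array_test(const uint64_t *words, size_t i) {
    uint64_t w = words[i / 64];
    return (w >> (i % 64)) & 1;
}
```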
David Kreitzer
IA-32/Intel64 Code Generation
Intel Compilers
From: Smith, Kevin B
Sent: Friday, January 23, 2015 1:03 PM
To: Chris Sears; Stephen Canon
Cc: LLVM Developers Mailing List; Kreitzer, David L
Subject: RE: [LLVMdev] X86TargetLowering::LowerToBT
I’ll be happy to run it for you. Do you want Intel64, x86 or both? The Intel
compiler doesn’t have a -Oz option. It has -Os and -O[123].
Also, FWIW, one of the Intel compiler experts on BT will comment on this thread,
and on our rules for BT usage later this afternoon.
Kevin B. Smith
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
On Behalf Of Chris Sears
Sent: Friday, January 23, 2015 9:37 AM
To: Stephen Canon
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] X86TargetLowering::LowerToBT
Constant mask case.
Sanjay, could you run this through the Intel compiler with the appropriate
flags?
They have an -O2 but I couldn't find an equivalent -Oz.
For LLVM, it generates BTQ for testing bits 32-63.