thr3ads.net - llvm dev - [LLVMdev] X86TargetLowering::LowerToBT [Jan 2015]

Pete Cooper

2015-Jan-22 22:00 UTC

[LLVMdev] X86TargetLowering::LowerToBT

> On Jan 22, 2015, at 1:22 PM, Fiona Glaser <fglaser at apple.com> wrote:
> 
> According to Agner’s docs, many CPUs have slower BT than TEST; Haswell has only 0.5 inverse throughput as opposed to 0.25, Atom has 1 instead of 0.5, and Silvermont can’t even dual-issue BT (it locks both ALUs). So while BT does seem to have a shorter instruction encoding than TEST for TEST reg, imm32 where imm32 has one bit set, it might not be the best idea to always change TEST reg, 0x1000 to BT reg, 12…

Sounds like we should use BT with -Os, but TEST otherwise.  This is probably a
common enough instruction that it might make a good impact on code size.

Pete

> 
> Fiona
> 
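As an illustration of the "imm32 has one bit set" condition above, here is a minimal C sketch that checks whether a TEST mask has exactly one set bit and, if so, returns the bit index a BT would use. The function name `single_bit_index` is hypothetical and is not taken from LLVM.

```c
#include <stdint.h>

/* Hypothetical helper, not an LLVM function.
   Returns the index of the only set bit in mask, or -1 when mask does not
   have exactly one bit set (so a TEST with this mask has no BT equivalent). */
int single_bit_index(uint32_t mask)
{
    /* mask & (mask - 1) clears the lowest set bit; the result is zero
       only when at most one bit was set. */
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;

    int index = 0;
    while ((mask >> index) != 1u)
        index++;
    return index;
}
```

With this sketch, `single_bit_index(0x1000)` yields 12, matching the TEST reg, 0x1000 to BT reg, 12 rewrite discussed above. Whether the rewrite is worthwhile is a separate question, which the rest of the thread addresses.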
>> On Jan 22, 2015, at 1:17 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> 
>> 
>> 
>>> Begin forwarded message:
>>> 
>>> Date: January 18, 2015 at 10:57:33 PM PST
>>> Subject: Re: [LLVMdev] X86TargetLowering::LowerToBT
>>> From: Chris Sears <chris.sears at gmail.com>
>>> To: Mehdi Amini <mehdi.amini at apple.com>
>>> Cc: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>>> 
>>> Sure. Attached is the file but here are the functions. The first
uses a fixed bit offset. The second has an indexed bit offset. Compiling with llc
-O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi,
%rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is pretty
much what Clang had generated as IR.
>>> 
>>> shrq	$25, %rdi
>>> andq $1, %rdi
>>> 
>>> LLVM should be able to replace these two with a single X86_64
instruction: btq reg,25
>>> The generated code is correct in both cases. It just isn't
optimized in the immediate operand case.
>>> 
>>> unsigned long long IsBitSetA(unsigned long long val)
>>> {
>>>     return (val & (1ULL<<25)) != 0ULL;
>>> }
>>> 
>>> unsigned long long IsBitSetB(unsigned long long val, int index)
>>> {
>>>     return (val & (1ULL<<index)) != 0ULL;
>>> }
>>> 
>>> 
>>> On Sun, Jan 18, 2015 at 10:02 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> Hi,
>>> 
>>> Can you provide a reproducible example? In particular, I feel your
first IR sample is incomplete.
>>> Could you also make more explicit how the generated code is wrong?
>>> 
>>> You can give a C file if you are sure that it is reproducible with
the current clang.
>>> 
>>> Thanks,
>>> 
>>> Mehdi
>>> 
>>>> On Jan 18, 2015, at 5:13 PM, Chris Sears <chris.sears at gmail.com> wrote:
>>>> 
>>>> I'm tracking down an X86 code generation malfeasance
regarding BT (bit test) and I have some questions.
>>>> 
>>>> This IR matches and then X86TargetLowering::LowerToBT is
called:
>>>> 
>>>> %and = and i64 %shl, %val      ; (val & (1 << index)) != 0     ; bit test with a register index
>>>> 
>>>> This IR does not match and so X86TargetLowering::LowerToBT is
not called:
>>>> 
>>>> %and = lshr i64 %val, 25          ; (val & (1 << 25)) != 0          ; bit test with an immediate index
>>>> %conv = and i64 %and, 1
>>>> 
>>>> Let's back that up a bit. Clang emits this IR. These
expressions start out life in C as an AND with a left-shifted masking bit, and are
then converted into IR as right-shifted values ANDed with a masking bit.
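To make the equivalence of the two forms concrete, here is a minimal C sketch. The function names are hypothetical; the first mirrors the mask form written in the C source in this thread, and the second mirrors the shift-then-AND shape of the IR.

```c
#include <assert.h>

/* Mask form: AND with a left-shifted masking bit, then compare with zero. */
unsigned long long bit_set_mask_form(unsigned long long val, int index)
{
    assert(index >= 0 && index < 64);  /* shifting by 64 or more is undefined in C */
    return (val & (1ULL << index)) != 0ULL;
}

/* Shift form: right-shift the value, then AND with 1 (the lshr/and shape). */
unsigned long long bit_set_shift_form(unsigned long long val, int index)
{
    assert(index >= 0 && index < 64);
    return (val >> index) & 1ULL;
}
```

Both functions return 1 when the selected bit is set and 0 otherwise, which is why rewriting one form into the other preserves behaviour while changing the pattern that instruction selection sees.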
>>>> 
>>>> This IR then remains untouched until Expand ISel
Pseudo-instructions in llc (-O3). At that point, LowerToBT is called on the
REGISTER version and substitutes in a BT reg,reg instruction:
>>>> 
>>>> btq	%rsi, %rdi                          ## <MCInst #312 BT64rr
>>>> 
>>>> The IMMEDIATE version doesn't match the pattern and so
LowerToBT is not called.
>>>> 
>>>> Question: This is during pseudo instruction expansion. How
could LowerToBT's caller have enough context to match the immediate IR
version? In fact, lli isn't calling LowerToBT so it isn't matching. But
isn't this really a peephole optimization issue?
>>>> 
>>>> LLVM has a generic peephole optimizer,
CodeGen/PeepholeOptimizer.cpp which has exactly one subclass in
NVPTXTargetMachine.cpp.
>>>> 
>>>> But isn't it better to deal with X86 LowerToBT in a
PeepholeOptimizer subclass where you have a small window of instructions rather
than during pseudo instruction expansion where you have really one instruction?
PeepholeOptimizer doesn't seem to be getting much attention and certainly no
attention at the subclass level.
>>>> 
>>>> Bluntly, expansion is about expansion. Peephole optimization is
the opposite.
>>>> 
>>>> Question: Regardless, why is LowerToBT not being called for the
IMMEDIATE version? I suppose you could look at the preceding instruction in the
DAG. That seems a bit hacky.
>>>> 
>>>> Another approach using LowerToBT would be to match lshr reg/imm
first and then if the following instruction was an and reg,1 replace both with a
BT. It doesn't look like LowerToBT as is can do that right now since it is
matching the and instruction.
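The "match the shift first, then check the following AND" idea can be sketched on a toy instruction list. Everything below is an assumption made for illustration: the `Insn` record, the opcode names, and `match_shift_and_bit_test` are hypothetical and do not correspond to LLVM's SelectionDAG, MachineInstr, or PeepholeOptimizer interfaces.

```c
#include <stddef.h>

/* Toy opcodes; hypothetical, not LLVM's. */
typedef enum { OP_SHR_IMM, OP_AND_IMM, OP_OTHER } Opcode;

typedef struct {
    Opcode op;
    int reg;                 /* register updated in place */
    unsigned long long imm;  /* immediate operand */
} Insn;

/* If insns[i] and insns[i + 1] have the shape "shr reg, k; and reg, 1",
   return k, the bit index being tested. Otherwise return -1. */
int match_shift_and_bit_test(const Insn *insns, size_t count, size_t i)
{
    if (count < 2 || i > count - 2)
        return -1;

    const Insn *shr = &insns[i];
    const Insn *and1 = &insns[i + 1];

    if (shr->op != OP_SHR_IMM || and1->op != OP_AND_IMM)
        return -1;
    if (shr->reg != and1->reg || and1->imm != 1ULL || shr->imm > 63ULL)
        return -1;

    return (int)shr->imm;
}
```

The sketch only recognises the pair. It does not perform a replacement, because the shift/AND pair leaves a 0 or 1 in the register while BT leaves its answer in the carry flag, so a real rewrite would also have to account for how the result is consumed.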
>>>> 
>>>> SDValue X86TargetLowering::LowerToBT(SDValue And, ISD::CondCode
CC, SDLoc dl, SelectionDAG &DAG) const { ... }
>>>> 
>>>> But I think this is better done in a subclass of
CodeGen/PeepholeOptimizer.cpp.
>>>> 
>>>> thanks.
>>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Ite Ursi
>> <tst.c>
>> 
> 

Chris Sears

2015-Jan-22 22:05 UTC

[LLVMdev] X86TargetLowering::LowerToBT

I think the partial update issue isn't really a valid concern; see Agner Fog,
p. 142. I don't think LLVM is going to emit this fragment.

; Example 10.7. Partial register access
bt eax,2 ; modifies carry flag but not zero flag
cmovbe eax,ebx ; reads both carry flag and zero flag

In cases like this, you may consider whether it is a programming error or a
deliberate testing of two different conditions with a single instruction.

Stephen Canon

2015-Jan-22 22:16 UTC

[LLVMdev] X86TargetLowering::LowerToBT

That’s not how partial-flags update stalls work.  There is no independent
tracking of individual bits in EFLAGS.  This means that BT + CMOVNZ has a false
dependency on whatever instruction wrote to EFLAGS before BT and requires an
extra µop vis-a-vis TEST + CMOVNZ or SHR + AND.

Please do not use BT.  It is a performance hazard.  If you don’t believe me for
some reason, here’s the relevant quote from Agner:

"BT, BTC, BTR, and BTS change the carry flag but leave the other flags
unchanged. This causes a false dependence on the previous value of the flags and
costs an extra μop. Use TEST, AND, OR and XOR instead of these instructions."

– Steve
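
The dependence described above can be illustrated with a toy model of the flag results in C. This is only a sketch of which flags each instruction writes, under the assumption that just CF and ZF matter here; it says nothing about µop counts or how any particular CPU tracks EFLAGS. All names (`Flags`, `model_bt`, `model_test`, `cmovbe_taken`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the two flags discussed in the thread are modeled. */
typedef struct {
    bool cf;  /* carry flag */
    bool zf;  /* zero flag */
} Flags;

/* BT: CF receives the selected bit, ZF is carried over from the incoming
   flags, so the outgoing flags depend on whatever wrote the flags before. */
Flags model_bt(Flags in, uint32_t val, unsigned index)
{
    Flags out = in;                        /* ZF preserved from `in` */
    out.cf = (val >> (index & 31u)) & 1u;  /* 32-bit BT uses the index modulo 32 */
    return out;
}

/* TEST: ZF reflects (val & mask) == 0 and CF is cleared, so the outgoing
   flags do not depend on the incoming ones. */
Flags model_test(Flags in, uint32_t val, uint32_t mask)
{
    (void)in;  /* deliberately unused: no dependence on the old flags */
    Flags out = { .cf = false, .zf = (val & mask) == 0 };
    return out;
}

/* CMOVBE moves when CF or ZF is set. */
bool cmovbe_taken(Flags f)
{
    return f.cf || f.zf;
}
```

In this model, `cmovbe_taken(model_bt(old, 0, 2))` returns whatever `old.zf` was, which is the stale-flag behaviour in the bt/cmovbe fragment quoted below, while the result of `model_test` is the same for any incoming flags.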
> On Jan 22, 2015, at 5:05 PM, Chris Sears <chris.sears at gmail.com>
wrote:
> 
> I think the partial update issue isn't really a valid concern; see Agner
Fog, p. 142. I don't think LLVM is going to emit this fragment.
> 
> ; Example 10.7. Partial register access
> bt eax,2 ; modifies carry flag but not zero flag
> cmovbe eax,ebx ; reads both carry flag and zero flag
> 
> In cases like this, you may consider whether it is a programming error or a
deliberate testing of two different conditions with a single instruction.
