thr3ads.net - llvm dev - [LLVMdev] X86TargetLowering::LowerToBT [Jan 2015]


Chris Sears

2015-Jan-19 06:57 UTC

[LLVMdev] X86TargetLowering::LowerToBT

Sure. Attached is the file, but here are the functions. The first uses a
fixed bit offset. The second has an indexed bit offset. Compiling with llc
-O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi,
%rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is
pretty much what Clang had generated as IR.

shrq $25, %rdi
andq $1, %rdi


LLVM should be able to replace these two with a single X86_64 instruction:
btq reg,25
The generated code is correct in both cases. It just isn't optimized in the
immediate operand case.

unsigned long long IsBitSetA(unsigned long long val)
{
    return (val & (1ULL<<25)) != 0ULL;
}

unsigned long long IsBitSetB(unsigned long long val, int index)
{
    return (val & (1ULL<<index)) != 0ULL;
}
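
As a sanity check on the claim that both forms are correct, here is a minimal sketch in C comparing the source-level expression (mask with a left-shifted bit) against the shift-then-and form that shows up in the IR for the constant case. The helper names (is_bit_set_mask, is_bit_set_shift, check_equivalence) and the sample values are mine, not from tst.c, and the index is restricted to 0..63 because shifting a 64-bit value by 64 or more is undefined in C.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Source form from the message: and with a left-shifted masking bit. */
static uint64_t is_bit_set_mask(uint64_t val, unsigned index)
{
    return (val & (1ULL << index)) != 0ULL;
}

/* Form seen in the IR for the constant case: shift right, then and with 1. */
static uint64_t is_bit_set_shift(uint64_t val, unsigned index)
{
    return (val >> index) & 1ULL;
}

/* Compare both forms on a few hypothetical sample values, for every
 * valid bit index 0..63. Returns the number of mismatches found. */
static int check_equivalence(void)
{
    static const uint64_t samples[] = {
        0ULL, 1ULL, 1ULL << 25, ~(1ULL << 25), ~0ULL, 0x123456789abcdef0ULL
    };
    int mismatches = 0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        for (unsigned index = 0; index < 64; index++) {
            if (is_bit_set_mask(samples[i], index) !=
                is_bit_set_shift(samples[i], index)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Calling check_equivalence() from a small main() should report zero mismatches. It only says the two forms compute the same value, nothing about which instruction sequence is faster.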


On Sun, Jan 18, 2015 at 10:02 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
> Hi,
>
> Can you provide a reproducible example? I feel especially your first IR
> sample is incomplete.
> Can you also make more explicit how the generated code is wrong?
>
> You can give a C file if you are sure that it is reproducible with the
> current clang.
>
> Thanks,
>
> Mehdi
>
> On Jan 18, 2015, at 5:13 PM, Chris Sears <chris.sears at gmail.com> wrote:
>
> I'm tracking down an X86 code generation malfeasance regarding BT (bit
> test) and I have some questions.
>
> This IR *matches* and then *X86TargetLowering::LowerToBT* is called:
>
> %and = and i64 %shl, %val        ; (val & (1 << index)) != 0
>                                  ; bit test with a *register* index
>
>
> This IR *does not match* and so *X86TargetLowering::LowerToBT* is not
> called:
>
> %and = lshr i64 %val, 25         ; (val & (1 << 25)) != 0
>                                  ; bit test with an *immediate* index
> %conv = and i64 %and, 1
>
>
> Let's back that up a bit. Clang emits this IR. These expressions start out
> life in C as *and with a left shifted masking bit*, and are then
> converted into IR as *right shifted values anded with a masking bit*.
>
> This IR then remains untouched until *Expand ISel Pseudo-instructions* in
> llc (-O3). At that point, *LowerToBT* is called on the REGISTER version
> and substitutes in a BT reg,reg instruction:
>
> btq %rsi, %rdi                          ## <MCInst #312 BT64rr
>
>
> The IMMEDIATE version doesn't match the pattern and so *LowerToBT* is not
> called.
>
> *Question*: This is during *pseudo instruction expansion*. How could
> *LowerToBT*'s caller have enough context to match the immediate IR
> version? In fact, lli isn't calling *LowerToBT* so it isn't matching. But
> isn't this really a *peephole optimization* issue?
>
> LLVM has a generic peephole optimizer, *CodeGen/PeepholeOptimizer.cpp*, which has
> exactly one subclass in *NVPTXTargetMachine.cpp*.
>
> But isn't it better to deal with X86 *LowerToBT* in a *PeepholeOptimizer* subclass
> where you have a small window of instructions rather than during pseudo
> instruction expansion where you have really one instruction?
> *PeepholeOptimizer* doesn't seem to be getting much attention and
> certainly no attention at the subclass level.
>
> Bluntly, expansion is about expansion. Peephole optimization is the
> opposite.
>
> *Question*: Regardless, why is *LowerToBT* not being called for the
> IMMEDIATE version? I suppose you could look at the preceding instruction in
> the DAG. That seems a bit hacky.
>
> Another approach using *LowerToBT* would be to match *lshr reg/imm* first
> and then if the *following* instruction was an *and reg,1*, replace both
> with a BT. It doesn't look like *LowerToBT* as is can do that right now
> since it is matching the *and* instruction.
>
> SDValue X86TargetLowering::LowerToBT(*SDValue And*, ISD::CondCode CC,
> SDLoc dl, SelectionDAG &DAG) const { ... }
>
>
> But I think this is better done in a subclass of
> *CodeGen/PeepholeOptimizer.cpp*.
>
> thanks.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>

-- 
Ite Ursi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tst.c
Type: text/x-csrc
Size: 207 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150118/9927a6c4/attachment.c>

David Jones

2015-Jan-19 11:23 UTC


[LLVMdev] X86TargetLowering::LowerToBT

Do we want to use btq?

On many x86-64 processors, shrq/andq is "hard-coded", but btq will execute
in microcode, and will likely perform worse.



Chris Sears

2015-Jan-19 15:05 UTC


[LLVMdev] X86TargetLowering::LowerToBT

Which BTQ? There are three flavors.

BTQ reg/reg
BTQ reg/mem
BTQ reg/imm

I can imagine that the reg/reg and especially the reg/mem versions would be
slow. However the shrq/and versions *with the same operands* would be slow
as well. There's even a compiler comment about the reg/mem version saying
"this is for disassembly only".

But I doubt BTQ reg/imm would be microcoded.


-- 
Ite Ursi
