search for: lowertobt

Displaying 14 results from an estimated 14 matches for "lowertobt".

2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
I'm tracking down an X86 code generation malfeasance regarding BT (bit test) and I have some questions. This IR matches, and then X86TargetLowering::LowerToBT is called: %and = and i64 %shl, %val ; (val & (1 << index)) != 0 ; bit test with a register index This IR does not match, and so X86TargetLowering::LowerToBT is not called: %and = lshr i64 %val, 25 ; (val & (1 << 25)) != 0 ; bit tes...
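To illustrate the two source-level patterns quoted in this snippet, the following minimal sketch checks that the mask-and-compare form and the shift-and-mask form give the same answer. The function names are hypothetical, and the 64-bit width is an assumption taken from the `i64` in the quoted IR; this is not code from the thread.

```python
MASK64 = (1 << 64) - 1  # model an i64 value as a non-negative Python int


def bit_test_mask(val: int, index: int) -> bool:
    """(val & (1 << index)) != 0, the form quoted with a register index."""
    return ((val & MASK64) & (1 << index)) != 0


def bit_test_shift(val: int, index: int) -> bool:
    """((val >> index) & 1) != 0, the lshr form quoted for the constant index 25."""
    return (((val & MASK64) >> index) & 1) != 0
```

Both helpers agree for every index from 0 to 63, which is why either IR shape is a candidate for a single bit-test instruction.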
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
...> Thanks, > > Mehdi > > On Jan 18, 2015, at 5:13 PM, Chris Sears <chris.sears at gmail.com> wrote: > > I'm tracking down an X86 code generation malfeasance regarding BT (bit > test) and I have some questions. > > This IR matches, and then X86TargetLowering::LowerToBT is called: > > %and = and i64 %shl, %val ; (val & (1 << index)) != 0 ; bit > test with a register index > > > This IR does not match, and so X86TargetLowering::LowerToBT is not > called: > > %and = lshr i64 %val, 25 ; (val & (1...
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
...15, at 1:17 PM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: >> >> >> >>> Begin forwarded message: >>> >>> Date: January 18, 2015 at 10:57:33 PM PST >>> Subject: Re: [LLVMdev] X86TargetLowering::LowerToBT >>> From: Chris Sears <chris.sears at gmail.com <mailto:chris.sears at gmail.com>> >>> To: Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> >>> Cc: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu <mailto:llvmdev a...
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
This is a patch to X86TargetLowering::LowerToBT() which was hashed over on the Developers list with Intel concurring. It checks whether the -Oz (optimize for size) flag is set or whether the containing function's PGO cold attribute is set. If either is true it emits BT for tests of bits 8-31 instead of TEST. Previously, TEST was always use...
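The rule described in this snippet can be sketched as a small decision function. This is only a model of the stated behaviour, not the patch itself: the function and parameter names are hypothetical, and the handling of bits 0-7 (TEST) and bits 32-63 (BT) is an assumption drawn from other snippets in this listing ("generating TEST for small values", "generates BTQ for testing bits 32-63").

```python
def choose_bit_test_opcode(bit: int, opt_for_min_size: bool, is_cold: bool) -> str:
    """Pick the instruction for a constant single-bit test (hypothetical model)."""
    if not 0 <= bit <= 63:
        raise ValueError("bit index must be in 0..63")
    if bit >= 32:
        # Assumption: high bits use BT regardless of flags.
        return "BT"
    if bit >= 8 and (opt_for_min_size or is_cold):
        # Stated rule: -Oz or a cold function selects BT for bits 8-31.
        return "BT"
    # Assumption: small masks and the default case keep TEST.
    return "TEST"
```

For example, `choose_bit_test_opcode(25, opt_for_min_size=True, is_cold=False)` selects BT, while the same bit without either flag keeps TEST.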
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
...te: > Can you transform your C file into an LLVM IR lit test? > > See example in test/CodeGen/X86/*.ll > > Thanks, > > Mehdi > > > On Jan 24, 2015, at 11:47 AM, Chris Sears <chris.sears at gmail.com> wrote: > > > > This is a patch to X86TargetLowering::LowerToBT() which was hashed over > on the Developers list with Intel concurring. > > > > It checks whether the -Oz (optimize for size) flag is set or whether the > containing function's PGO cold attribute is set. If either is true it > emits BT for tests of bits 8-31 instead of TES...
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
That’s not how partial-flags update stalls work. There is no independent tracking of individual bits in EFLAGS. This means that BT + CMOVNZ has a false dependency on whatever instruction wrote to EFLAGS before BT and requires an extra µop vis-a-vis TEST + CMOVNZ or SHR + AND. Please do not use BT. It is a performance hazard. If you don’t believe me for some reason, here’s the relevant quote
2015 Jan 23
3
[LLVMdev] X86TargetLowering::LowerToBT
...on our rules for BT usage later this afternoon. Kevin B. Smith From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Sears Sent: Friday, January 23, 2015 9:37 AM To: Stephen Canon Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] X86TargetLowering::LowerToBT Constant mask case. Sanjay, could you run this through the Intel compiler with the appropriate flags? They have an -O2 but I couldn't find an equivalent -Oz. For LLVM, it generates BTQ for testing bits 32-63.
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
...m> wrote: > > The problem is that REX TEST reg,#(1<<37) is 10 bytes vs 5 bytes for REX BT reg,37. > That's a large space penalty to pay for a possible partial update stall. > > So the idea of generating BT for -Os and TEST for -Ofast makes sense to me. > Regardless, LowerToBT is generating TEST for small values. > > I'm still tracking down the Clang issue and I'd like folks to look at that.
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes). Fiona > On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote: > > My bad on that. So that's what the comment meant. > That means BT is pretty much
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
> icc generates testq for 0-30 and btq for 31-63. > That seems like a small bug in the bit 31 case. You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Which BTQ? There are three flavors. BTQ reg/reg BTQ reg/mem BTQ reg/imm I can imagine that the reg/reg and especially the reg/mem versions would be slow. However the shrq/and versions *with the same operands* would be slow as well. There's even a compiler comment about the reg/mem version saying "this is for disassembly only". But I doubt BTQ reg/imm would be microcoded. -- Ite
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote: > The status quo is: > > a) 40b REX+BT instruction for the 64b case > b) 48b TEST for the 32b case > c) unless it's small TEST > > > You are currently paying a 16b penalty for TEST vs BT in the 32b case. > That may be worth testing the -Os flag. > You'll want -Oz here, Os
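The bit counts quoted in this snippet follow from the instruction encodings. The sketch below adds up the bytes under stated assumptions: register-direct operands, BT encoded as 0F BA /4 ib, TEST encoded as F7 /0 id, a REX prefix only for 64-bit operands, and no use of the shorter accumulator-only TEST form. The function name is hypothetical.

```python
def encoded_bits(mnemonic: str, operand_bits: int) -> int:
    """Size in bits of BT reg,imm8 or TEST reg,imm32 (register-direct, assumed forms)."""
    if operand_bits not in (32, 64):
        raise ValueError("operand_bits must be 32 or 64")
    rex = 1 if operand_bits == 64 else 0  # REX.W prefix byte for 64-bit operands
    if mnemonic == "BT":
        size = rex + 2 + 1 + 1  # opcode 0F BA, ModRM, imm8
    elif mnemonic == "TEST":
        size = rex + 1 + 1 + 4  # opcode F7, ModRM, imm32
    else:
        raise ValueError("unsupported mnemonic")
    return size * 8
```

Under these assumptions REX+BT comes to 40 bits and the 32-bit TEST to 48 bits, and the gap between the 32-bit TEST and the 32-bit BT is the 16 bits mentioned in the quoted message.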
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t
2015 Feb 03
2
[LLVMdev] RFC: Constant Hoisting
...t get recognised because of the hoisting. The result is some register allocation and unnecessary constant loading instructions. There are maybe three 'solutions' to this problem, maybe more. Starting with the second, in the middle of things, you could try pattern matching in EmitTest() or LowerToBT(). I've tried this and it doesn't work since it needs to reach outside of a Selection DAG. Doesn't work. Can't work. Thirdly, it's been suggested to use a peephole pass and to look at AArch64LoadStoreOptimizer.cpp. This also doesn't work for pretty much the same reason. Mo...