Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] X86TargetLowering::LowerToBT"
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Sure. Attached is the file but here are the functions. The first uses a
fixed bit offset. The second has a indexed bit offset. Compiling with llc
-O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi,
%rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is is
pretty much what Clang had generated as IR.
shrq $25, %rdi
andq $1, %rdi
LLVM should be able to
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
> On Jan 22, 2015, at 1:22 PM, Fiona Glaser <fglaser at apple.com> wrote:
>
> According to Agner’s docs, many CPUs have slower BT than TEST; Haswell has only 0.5 inverse throughput as opposed to 0.25, Atom has 1 instead of 0.5, and Silvermont can’t even dual-issue BT (it locks both ALUs). So while BT does seem have a shorter instruction encoding than TEST for TEST reg, imm32 where
2015 Jan 23
3
[LLVMdev] X86TarIgetLowering::LowerToBT
I’ll be happy to run it for you. Do you want Intel64, x86 or both? The Intel compiler doesn’t have a –Oz option. It has –Os and –O[123].
Also, FWIW, one of the Intel compiler experts on BT will comment on this thread, and on our rules for BT usage later this afternoon.
Kevin B. Smith
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Sears
Sent:
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
This is a patch to X86TargetLowering::LowerToBT() which was hashed over on
the Developers list with Intel concurring.
It checks whether the -Oz (optimize for size) flag is set or whether the
containing function's PGO cold attribute is set. If either are true it
emits BT for tests of bits 8-31 instead of TEST. Previously, TEST was
always used for bits 0-31 and BT was always used for bits
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Which BTQ? There are three flavors.
BTQ reg/reg
BTQ reg/mem
BTQ reg/imm
I can imagine that the reg/reg and especially the reg/mem versions would be
slow. However the shrq/and versions *with the same operands* would be slow
as well. There's even a compiler comment about the reg/mem version saying
"this is for disassembly only".
But I doubt BTQ reg/imm would be microcoded.
--
Ite
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote:
> The status quo is:
>
> a) 40b REX+BT instruction for the 64b case
> b) 48b TEST for the 32b case
> c) unless it's small TEST
>
>
> You are currently paying a 16b penalty for TEST vs BT in the 32b case.
> That may be worth testing the -Os flag.
>
You'll want -Oz here, Os
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
I suspect that this is because the mask in your example is the result of a variable shift, which (a) has it’s own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well.
If the mask were constant, I expect ICC would generate TEST instead (but I don’t
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
That’s not how partial-flags update stalls work. There is no independent tracking of individual bits in EFLAGS. This means that BT + CMOVNZ has a false dependency on whatever instruction wrote to EFLAGS before BT and requires an extra µop vis-a-vis TEST + CMOVNZ or SHR + AND.
Please do not use BT. It is a performance hazard. If you don’t believe me for some reason, here’s the relevant quote
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Is that even a valid instruction? I thought TEST only took 32-bit immediates.
Fiona
> On Jan 22, 2015, at 2:48 PM, Chris Sears <chris.sears at gmail.com> wrote:
>
> The problem is that REX TEST reg,#(1<<37) is 10 bytes vs 5 bytes for REX BT reg,37.
> That's a large space penalty to pay for a possible partial update stall.
>
> So the idea of generating BT for
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes).
Fiona
> On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote:
>
> My bad on that. So that's what the comment meant.
> That means BT is pretty much
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
tst64.ll is attached from clang -S -emit-llvm tst64.c
On Sat, Jan 24, 2015 at 11:56 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Can you transform you C file into a LLVM IR lit test?
>
> See example in test/CodeGen/X86/*.ll
>
> Thanks,
>
> Mehdi
>
> > On Jan 24, 2015, at 11:47 AM, Chris Sears <chris.sears at gmail.com> wrote:
> >
> >
2015 Jan 23
2
[LLVMdev] X86TarIgetLowering::LowerToBT
> icc generates testq for 0-30 and btq for 31-63.
> That seems like a small bug in the bit 31 case.
You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
2004 Feb 20
1
[LLVMdev] Changes in MachineInstruction/Peephole Optimizer?
Hi all,
The register allocator that I implemented is failing in the LLVM cvs
version, but not in LLVM 1.1. The generated code fails a check in the
x86 peephole optimizer:
llc: PeepholeOptimizer.cpp:128: bool
<unnamed>::PH::PeepholeOptimize(llvm::Machi
neBasicBlock&, llvm::ilist_iterator<llvm::MachineInstr>&): Assertion
`MI->getNum
Operands() == 2 && "These
2016 Mar 10
2
[CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions
Hi Quentin,
Yes, the code allows to process connected instructions. Although it should be taken into account that the instruction next to the current processed instruction must never be erased because this invalidates iterator.
I've been fixing a bug in AArch64InstrInfo::optimizeCompareInstr: instructions are converted into S form but it's not checked that they produce the same flags as
2016 Mar 09
2
[CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions
Hi,
I find it's quite strange how condition dependent instructions are processed in PeepholeOptimizer::runOnMachineFunction:
01577 if ((isUncoalescableCopy(*MI) &&
01578 optimizeUncoalescableCopy(MI, LocalMIs)) ||
01579 (MI->isCompare() && optimizeCmpInstr(MI, &MBB)) ||
01580 (MI->isSelect() && optimizeSelect(MI,
2015 Feb 03
2
[LLVMdev] RFC: Constant Hoisting
I've had a bug/pessimization which I've tracked down for 1 bit bitmasks:
if (((xx) & (1ULL << (40))))
return 1;
if (!((yy) & (1ULL << (40))))
...
The second time Constant Hoisting sees the value (1<<40) it wraps it up
with a bitcast.
That value then gets hoisted. However, the first (1<<40) is not bitcast and
gets recognized
as a BT. The second
2011 Feb 18
0
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
> Hello everyone,
>
> I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> Of course, some instructions have already had their twins, such as add/adds, and I leaved them untouched.
Adding separate "s" instructions is
2015 Feb 11
2
[LLVMdev] deleting or replacing a MachineInst
I'm writing a peephole pass and I'm done with the X86_64 instruction level
detail work. But I'm having difficulty with the basic block surgery
of replacing the old MachineInst.
The peephole pass gets called per MachineFunction and then iterates over
each MachineBasicBlock and in turn over each MachineInst. When it finds an
instruction which should be replaced, it builds a new
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone,
I've added the "S" suffixed versions of ARM and Thumb2 instructions to
tablegen. Those are, for example, "movs" or "muls".
Of course, some instructions have already had their twins, such as add/adds,
and I leaved them untouched.
Besides, I propose the codegen optimization based on them, which removes the
redundant comparison in patterns like
orr
2015 Feb 11
2
[LLVMdev] deleting or replacing a MachineInst
This seems a very natural approach but I probably am having a trouble with
the iterator invalidation. However, looking at other peephole optimizers
passes, I couldn't see how to do this:
#define BUILD_INS(opcode, new_reg, i) \
BuildMI(*MBB, MBBI, MBBI->getDebugLoc(), TII->get(X86::opcode)) \
.addReg(X86::new_reg, kill).addImm(i)
for