thr3ads.net - similar to: "[LLVMdev] X86TargetLowering::LowerToBT"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] X86TargetLowering::LowerToBT"

2015 Jan 19

[LLVMdev] X86TargetLowering::LowerToBT

Sure. Attached is the file but here are the functions. The first uses a fixed bit offset. The second has an indexed bit offset. Compiling with llc -O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi, %rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is pretty much what Clang had generated as IR: shrq $25, %rdi; andq $1, %rdi. LLVM should be able to
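
The attachment is not reproduced in this excerpt, so the following C sketch is only a guess at the shape of the two functions described above. The names IsBitSetA and IsBitSetB and the shift count 25 come from the snippet; the parameter types and the masking of the index are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed bit offset: the snippet reports shrq $25 / andq $1 for this shape. */
bool IsBitSetA(uint64_t word)
{
    return ((word >> 25) & 1u) != 0;
}

/* Indexed bit offset: the snippet reports btq %rsi, %rdi for this shape. */
bool IsBitSetB(uint64_t word, uint64_t index)
{
    /* Assumption: mask the index to 0..63 so the C shift stays well defined. */
    return ((word >> (index & 63)) & 1u) != 0;
}
```

Both functions answer the same question; the only difference is whether the bit position is known at compile time, which is what decides between an immediate and a register operand in the generated code.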

2015 Jan 22

[LLVMdev] X86TargetLowering::LowerToBT

> On Jan 22, 2015, at 1:22 PM, Fiona Glaser <fglaser at apple.com> wrote: > > According to Agner’s docs, many CPUs have slower BT than TEST; Haswell has only 0.5 inverse throughput as opposed to 0.25, Atom has 1 instead of 0.5, and Silvermont can’t even dual-issue BT (it locks both ALUs). So while BT does seem to have a shorter instruction encoding than TEST for TEST reg, imm32 where

2015 Jan 23

[LLVMdev] X86TargetLowering::LowerToBT

I’ll be happy to run it for you. Do you want Intel64, x86 or both? The Intel compiler doesn’t have a -Oz option. It has -Os and -O[123]. Also, FWIW, one of the Intel compiler experts on BT will comment on this thread, and on our rules for BT usage later this afternoon. Kevin B. Smith From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Sears Sent:

2015 Jan 24

[LLVMdev] X86TargetLowering::LowerToBT

This is a patch to X86TargetLowering::LowerToBT() which was hashed over on the Developers list with Intel concurring. It checks whether the -Oz (optimize for size) flag is set or whether the containing function's PGO cold attribute is set. If either is true it emits BT for tests of bits 8-31 instead of TEST. Previously, TEST was always used for bits 0-31 and BT was always used for bits
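
As a rough illustration of the selection rule this patch description outlines, here is a small C model. ChooseBitTestOp, BitTestOp and prefer_size are hypothetical names, not LLVM identifiers; prefer_size stands for "either -Oz or the cold attribute is set". The description is cut off before the upper bit range is stated, so the branch for bits 32 and above is an assumption based on the 32-bit immediate limit of TEST raised elsewhere in the thread.

```c
#include <stdbool.h>

typedef enum { USE_TEST, USE_BT } BitTestOp;

/* Pick an instruction for testing a single constant bit position. */
BitTestOp ChooseBitTestOp(unsigned bit, bool prefer_size)
{
    if (bit >= 32)
        return USE_BT;   /* assumption: the snippet is truncated at this point */
    if (bit < 8)
        return USE_TEST; /* low bits keep TEST in both the old and new scheme */
    /* Bits 8-31: BT only when optimizing for size or in cold code. */
    return prefer_size ? USE_BT : USE_TEST;
}
```

Under this model only the bits 8-31 case changes with the flag, which matches the "instead of TEST" wording in the description.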

2015 Jan 19

[LLVMdev] X86TargetLowering::LowerToBT

Which BTQ? There are three flavors. BTQ reg/reg BTQ reg/mem BTQ reg/imm I can imagine that the reg/reg and especially the reg/mem versions would be slow. However the shrq/and versions *with the same operands* would be slow as well. There's even a compiler comment about the reg/mem version saying "this is for disassembly only". But I doubt BTQ reg/imm would be microcoded. -- Ite

2015 Jan 22

[LLVMdev] X86TargetLowering::LowerToBT

On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote: > The status quo is: > > a) 40b REX+BT instruction for the 64b case > b) 48b TEST for the 32b case > c) unless it's small TEST > > > You are currently paying a 16b penalty for TEST vs BT in the 32b case. > That may be worth testing the -Os flag. > You'll want -Oz here, Os

2015 Jan 23

[LLVMdev] X86TargetLowering::LowerToBT

I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t

2015 Jan 22

[LLVMdev] X86TargetLowering::LowerToBT

That’s not how partial-flags update stalls work. There is no independent tracking of individual bits in EFLAGS. This means that BT + CMOVNZ has a false dependency on whatever instruction wrote to EFLAGS before BT and requires an extra µop vis-a-vis TEST + CMOVNZ or SHR + AND. Please do not use BT. It is a performance hazard. If you don’t believe me for some reason, here’s the relevant quote

2015 Jan 22

[LLVMdev] X86TargetLowering::LowerToBT

Is that even a valid instruction? I thought TEST only took 32-bit immediates. Fiona > On Jan 22, 2015, at 2:48 PM, Chris Sears <chris.sears at gmail.com> wrote: > > The problem is that REX TEST reg,#(1<<37) is 10 bytes vs 5 bytes for REX BT reg,37. > That's a large space penalty to pay for a possible partial update stall. > > So the idea of generating BT for

2015 Jan 22

[LLVMdev] X86TargetLowering::LowerToBT

Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes). Fiona > On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote: > > My bad on that. So that's what the comment meant. > That means BT is pretty much

2015 Jan 24

[LLVMdev] X86TargetLowering::LowerToBT

tst64.ll is attached from clang -S -emit-llvm tst64.c On Sat, Jan 24, 2015 at 11:56 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > Can you transform your C file into an LLVM IR lit test? > > See example in test/CodeGen/X86/*.ll > > Thanks, > > Mehdi > > > On Jan 24, 2015, at 11:47 AM, Chris Sears <chris.sears at gmail.com> wrote: > > > >

2015 Jan 23

[LLVMdev] X86TargetLowering::LowerToBT

> icc generates testq for 0-30 and btq for 31-63. > That seems like a small bug in the bit 31 case. You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
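
The sign-extension point can be checked with a few lines of C. SignExtendImm32 is a hypothetical helper that models how a 64-bit TEST widens its 32-bit immediate; it is a sketch of the arithmetic, not compiler code.

```c
#include <stdint.h>

/* Widen a 32-bit immediate to 64 bits by copying bit 31 into bits 32-63. */
uint64_t SignExtendImm32(uint32_t imm)
{
    if (imm & 0x80000000u)
        return 0xFFFFFFFF00000000ull | imm;
    return imm;
}
```

For the bit 31 mask 0x80000000 the helper returns 0xFFFFFFFF80000000, so a testq with that immediate would examine bits 31 through 63 rather than bit 31 alone. Masks for bits 0-30 pass through unchanged.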

2004 Feb 20

[LLVMdev] Changes in MachineInstruction/Peephole Optimizer?

Hi all, The register allocator that I implemented is failing in the LLVM cvs version, but not in LLVM 1.1. The generated code fails a check in the x86 peephole optimizer: llc: PeepholeOptimizer.cpp:128: bool <unnamed>::PH::PeepholeOptimize(llvm::MachineBasicBlock&, llvm::ilist_iterator<llvm::MachineInstr>&): Assertion `MI->getNumOperands() == 2 && "These

2016 Mar 10

[CodeGen] PeepholeOptimizer: optimizing condition dependent instructions

Hi Quentin, Yes, the code allows connected instructions to be processed. However, the instruction next to the one currently being processed must never be erased, because this invalidates the iterator. I've been fixing a bug in AArch64InstrInfo::optimizeCompareInstr: instructions are converted into S form but it's not checked that they produce the same flags as

2016 Mar 09

[CodeGen] PeepholeOptimizer: optimizing condition dependent instructions

Hi, I find it's quite strange how condition dependent instructions are processed in PeepholeOptimizer::runOnMachineFunction: if ((isUncoalescableCopy(*MI) && optimizeUncoalescableCopy(MI, LocalMIs)) || (MI->isCompare() && optimizeCmpInstr(MI, &MBB)) || (MI->isSelect() && optimizeSelect(MI,

2015 Feb 03

[LLVMdev] RFC: Constant Hoisting

I've had a bug/pessimization which I've tracked down for 1 bit bitmasks: if (((xx) & (1ULL << (40)))) return 1; if (!((yy) & (1ULL << (40)))) ... The second time Constant Hoisting sees the value (1<<40) it wraps it up with a bitcast. That value then gets hoisted. However, the first (1<<40) is not bitcast and gets recognized as a BT. The second

2011 Feb 18

[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote: > Hello everyone, > > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". > Of course, some instructions have already had their twins, such as add/adds, and I left them untouched. Adding separate "s" instructions is

2015 Feb 11

[LLVMdev] deleting or replacing a MachineInst

I'm writing a peephole pass and I'm done with the X86_64 instruction level detail work. But I'm having difficulty with the basic block surgery of replacing the old MachineInst. The peephole pass gets called per MachineFunction and then iterates over each MachineBasicBlock and in turn over each MachineInst. When it finds an instruction which should be replaced, it builds a new

2011 Feb 18

[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions

Hello everyone, I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". Of course, some instructions have already had their twins, such as add/adds, and I left them untouched. Besides, I propose the codegen optimization based on them, which removes the redundant comparison in patterns like orr

2015 Feb 11

[LLVMdev] deleting or replacing a MachineInst

This seems a very natural approach but I am probably having trouble with iterator invalidation. However, looking at other peephole optimizer passes, I couldn't see how to do this: #define BUILD_INS(opcode, new_reg, i) \ BuildMI(*MBB, MBBI, MBBI->getDebugLoc(), TII->get(X86::opcode)) \ .addReg(X86::new_reg, kill).addImm(i) for
