similar to: [LLVMdev] X86TarIgetLowering::LowerToBT


2015 Jan 23
2
[LLVMdev] X86TarIgetLowering::LowerToBT
> icc generates testq for 0-30 and btq for 31-63. > That seems like a small bug in the bit 31 case. You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
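The sign-extension point can be checked with a few lines of C. This is only an illustrative sketch: the helper names are hypothetical and it models the immediate arithmetic, not any compiler's code. A 64-bit TEST takes a 32-bit immediate that is sign-extended, so a mask with only bit 31 set turns into a mask covering bits 31 through 63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 64-bit mask that a 64-bit TEST with a 32-bit
   immediate effectively applies, since the immediate is sign-extended.
   The uint32_t -> int32_t conversion is implementation-defined for values
   above INT32_MAX; mainstream compilers wrap as two's complement. */
uint64_t testq_effective_mask(uint32_t imm32) {
    return (uint64_t)(int64_t)(int32_t)imm32;
}

/* Returns 1 if a single-bit mask for `bit` (0..31) is still a single-bit
   mask after sign extension, i.e. the 64-bit TEST checks only that bit. */
int single_bit_survives(unsigned bit) {
    uint32_t imm = (uint32_t)1 << bit;
    return testq_effective_mask(imm) == ((uint64_t)1 << bit);
}
```

Bits 0 through 30 survive unchanged, while bit 31 does not, which is why the 32-bit form of TEST is the one to use for bit 31.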
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Sure. Attached is the file, but here are the functions. The first uses a fixed bit offset. The second has an indexed bit offset. Compiling with llc -O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi, %rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is pretty much what Clang had generated as IR: shrq $25, %rdi; andq $1, %rdi. LLVM should be able to
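The attached functions are not reproduced in this excerpt. A hypothetical C reconstruction is sketched below: the names IsBitSetA and IsBitSetB and the constant 25 come from the message, but the bodies, types, and parameter order are assumptions (the order is inferred from btq %rsi, %rdi under the System V calling convention).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the fixed-offset test: bit 25 of val. */
int IsBitSetA(uint64_t val) {
    return (val & (1ULL << 25)) != 0;
}

/* Assumed shape of the indexed test: bit `index` of val (index < 64). */
int IsBitSetB(uint64_t val, uint64_t index) {
    return (val & (1ULL << index)) != 0;
}
```

Both compute the same predicate when index is 25; the complaint in the thread is only about which instructions the backend selects for each.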
2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
I'm tracking down an X86 code generation malfeasance regarding BT (bit test) and I have some questions. This IR matches, and then X86TargetLowering::LowerToBT is called: %and = and i64 %shl, %val ; (val & (1 << index)) != 0 ; bit test with a register index. This IR does not match, and so X86TargetLowering::LowerToBT is not called: %and = lshr i64 %val, 25
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Which BTQ? There are three flavors. BTQ reg/reg BTQ reg/mem BTQ reg/imm I can imagine that the reg/reg and especially the reg/mem versions would be slow. However the shrq/and versions *with the same operands* would be slow as well. There's even a compiler comment about the reg/mem version saying "this is for disassembly only". But I doubt BTQ reg/imm would be microcoded. -- Ite
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote: > The status quo is: > > a) 40b REX+BT instruction for the 64b case > b) 48b TEST for the 32b case > c) unless it's small TEST > > > You are currently paying a 16b penalty for TEST vs BT in the 32b case. > That may be worth testing the -Os flag. > You'll want -Oz here, Os
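The sizes quoted above can be made concrete by writing out the encodings. The byte sequences below are not from the thread; they are hand-assembled from the Intel opcode tables (BT r/m, imm8 is 0F BA /4 ib; TEST r/m32, imm32 is F7 /0 id) for bit 25 of %edi/%rdi, and are worth confirming with an assembler. The array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* btl $25, %edi */
static const unsigned char BT32_EDI_25[] = {0x0F, 0xBA, 0xE7, 0x19};
/* testl $0x2000000, %edi (immediate stored little-endian) */
static const unsigned char TEST32_EDI_25[] = {0xF7, 0xC7, 0x00, 0x00, 0x00, 0x02};
/* btq $25, %rdi: same as the 32-bit form plus a REX.W prefix */
static const unsigned char BT64_RDI_25[] = {0x48, 0x0F, 0xBA, 0xE7, 0x19};

/* Encoded size of each form, in bits, to match the units used in the thread. */
size_t bt32_bits(void)   { return 8 * sizeof BT32_EDI_25; }
size_t test32_bits(void) { return 8 * sizeof TEST32_EDI_25; }
size_t bt64_bits(void)   { return 8 * sizeof BT64_RDI_25; }
```

Under these encodings, REX+BT is 40 bits, the 32-bit TEST is 48 bits, and the 32-bit BT is 32 bits, which is where the 16-bit penalty for TEST in the 32-bit case comes from.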
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
This is a patch to X86TargetLowering::LowerToBT() which was hashed over on the Developers list with Intel concurring. It checks whether the -Oz (optimize for size) flag is set or whether the containing function's PGO cold attribute is set. If either is true, it emits BT for tests of bits 8-31 instead of TEST. Previously, TEST was always used for bits 0-31 and BT was always used for bits
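The selection rule described in the patch summary can be sketched as a small decision function. This is not the patch itself: the enum, function name, and parameters are hypothetical, and the handling of bits 32 and above is an assumption drawn from the rest of the thread (TEST has no 64-bit immediate form), since the summary is cut off at that point.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { USE_TEST, USE_BT } BitTestChoice;

/* Hypothetical model of the heuristic for testing a constant bit (0..63). */
BitTestChoice choose_bit_test(unsigned bit, bool min_size, bool cold) {
    if (bit >= 32)
        return USE_BT;   /* assumption: the mask does not fit TEST's imm32 */
    if ((min_size || cold) && bit >= 8)
        return USE_BT;   /* bits 8-31: BT is shorter when size matters */
    return USE_TEST;     /* default for bits 0-31, and always for bits 0-7 */
}
```

Bits 0-7 stay with TEST even at -Oz, presumably because a byte-sized TEST is already compact; that reading matches the "unless it's small TEST" remark elsewhere in the thread but is not stated in the summary.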
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
That’s not how partial-flags update stalls work. There is no independent tracking of individual bits in EFLAGS. This means that BT + CMOVNZ has a false dependency on whatever instruction wrote to EFLAGS before BT and requires an extra µop vis-a-vis TEST + CMOVNZ or SHR + AND. Please do not use BT. It is a performance hazard. If you don’t believe me for some reason, here’s the relevant quote
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
> On Jan 22, 2015, at 1:22 PM, Fiona Glaser <fglaser at apple.com> wrote: > > According to Agner’s docs, many CPUs have slower BT than TEST; Haswell has only 0.5 inverse throughput as opposed to 0.25, Atom has 1 instead of 0.5, and Silvermont can’t even dual-issue BT (it locks both ALUs). So while BT does seem to have a shorter instruction encoding than TEST for TEST reg, imm32 where
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Is that even a valid instruction? I thought TEST only took 32-bit immediates. Fiona > On Jan 22, 2015, at 2:48 PM, Chris Sears <chris.sears at gmail.com> wrote: > > The problem is that REX TEST reg,#(1<<37) is 10 bytes vs 5 bytes for REX BT reg,37. > That's a large space penalty to pay for a possible partial update stall. > > So the idea of generating BT for
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes). Fiona > On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote: > > My bad on that. So that's what the comment meant. > That means BT is pretty much
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
tst64.ll is attached, from clang -S -emit-llvm tst64.c. On Sat, Jan 24, 2015 at 11:56 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > Can you transform your C file into an LLVM IR lit test? > > See example in test/CodeGen/X86/*.ll > > Thanks, > > Mehdi > > > On Jan 24, 2015, at 11:47 AM, Chris Sears <chris.sears at gmail.com> wrote: > > > >
2013 Jul 14
2
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
Hi, The issue perhaps wasn't explained ideally (and possibly shouldn't have been CCed directly to you either, so apologies, but now that there *is* a discussion...) > Try some actual relevant test instead: > > bt %eax,mem > bt %rax,mem > > and notice how they are actually fundamentally different. Test-case: I'm coming at this from the compiler side, where the
2011 Nov 03
4
How to used MKL (not revolution-mkl) with Debian packages
Hi folks, if you want to use MKL (the fast BLAS I have tested on my Thinkpad T410) with R 2.14.0 built as Debian/Ubuntu packages available on the CRAN mirrors, the following trick may work without some known side effects (like OpenMP breakage). You may try to build your own libblas.so.3gf.0 with the following command: $ gfortran -L/opt/intel/lib/intel64 -liomp5
2015 Sep 04
2
Build R with MKL and ICC
On Wed, 2015-09-02 at 20:49 +0200, arnaud gaboury wrote: > On Wed, Sep 2, 2015 at 7:35 PM, arnaud gaboury <arnaud.gaboury at gmail.com> wrote: > > After a few days of reading and headaches, I finally tried > > building R from source with Intel MKL and ICC. Documentation and posts > > on this topic are rather incomplete, sometimes fanciful, and do not give > >
2017 Apr 20
2
Intel MKL compiling issue
Dear R-developers, I would appreciate any insights over compiling R 3.4 with Intel MKL -- I have been successful until R 3.3.3 but now it stops complaining about pcre though it worked without Intel MKL as follows, ./configure LDFLAGS=-L/genetics/data/software/lib CFLAGS=-fPIC -I/genetics/data/software/include --enable-R-shlib I have used, export MKL_NUM_THREADS=15 export
2007 Feb 08
1
Queue extension issues
I'm stuck on queues! The way I read what documentation I have found, if I set up a queue like this: [general] persistentmembers = yes [testq] musiconhold=default strategy = ringall timeout = 10 retry = 5 context = testing member => SIP/100 and then add into extensions something like this: [incomingiax] exten => 1234,1,Dial(SIP/100,10) exten => 1234,2,Queue(testq|tTH|||300)
2015 Sep 02
4
Build R with MKL and ICC
After a few days of reading and headaches, I finally tried building R from source with Intel MKL and ICC. Documentation and posts on this topic are rather incomplete, sometimes fanciful, and do not give much explanation of the configure options. As I am not sure whether mine is correct, I would appreciate some advice and hints. OS: Fedora 22 parallel_studio_xe_2016 Hardware: 8 Thread(s) per
2013 Jul 14
0
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
> And that is why I think you should just consider "bt $x,y" to be > trivially the same thing and not at all ambiguous. Because there is > ABSOLUTELY ZERO ambiguity when people write > > bt $63, mem > > Zero. Nada. None. The semantics are *exactly* the same for btl and btq > in this case, so why would you want the user to specify one or the > other? I
2015 Feb 03
2
[LLVMdev] RFC: Constant Hoisting
I've had a bug/pessimization which I've tracked down for 1 bit bitmasks: if (((xx) & (1ULL << (40)))) return 1; if (!((yy) & (1ULL << (40)))) ... The second time Constant Hoisting sees the value (1<<40) it wraps it up with a bitcast. That value then gets hoisted. However, the first (1<<40) is not bitcast and gets recognized as a BT. The second