Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] X86TarIgetLowering::LowerToBT"
2015 Jan 23
2
[LLVMdev] X86TarIgetLowering::LowerToBT
> icc generates testq for 0-30 and btq for 31-63.
> That seems like a small bug in the bit 31 case.
You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
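The sign-extension point can be checked with a small C model. This is only a sketch: testq_effective_mask and single_bit_imm32 are hypothetical helpers, not part of LLVM, that mimic how a 64-bit TEST widens its 32-bit immediate before the AND.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM API): models how TEST r/m64, imm32
   sign-extends its 32-bit immediate to 64 bits before the AND. */
static uint64_t testq_effective_mask(uint32_t imm32)
{
    uint64_t mask = imm32;
    if (imm32 & 0x80000000u)            /* sign bit of the immediate is set */
        mask |= 0xFFFFFFFF00000000ull;  /* upper 32 bits become all ones */
    return mask;
}

/* Immediate that a single-bit test of `bit` would need (hypothetical helper). */
static uint32_t single_bit_imm32(unsigned bit)
{
    assert(bit < 32);
    return (uint32_t)1 << bit;
}
```

For bit 30 the effective mask stays 0x40000000. For bit 31 it becomes 0xFFFFFFFF80000000, so testq would also test bits 32-63, while the 32-bit form with the same immediate tests bit 31 alone.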
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Sure. Attached is the file but here are the functions. The first uses a
fixed bit offset. The second has an indexed bit offset. Compiling with llc
-O3, LLVM version 3.7.0svn, it compiles the IR from IsBitSetB() using btq %rsi,
%rdi. Good. But then it compiles IsBitSetA() with shrq/andq, which is
pretty much what Clang had generated as IR.
shrq $25, %rdi
andq $1, %rdi
LLVM should be able to
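For reference, the two functions probably look something like the following in C. The attachment is not reproduced here, so the bodies are an assumption reconstructed from the IR and assembly quoted in this thread; only the names IsBitSetA and IsBitSetB come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed bit offset (assumed body): the form reported to compile
   to shrq $25 / andq $1. */
static int IsBitSetA(uint64_t val)
{
    return (int)((val >> 25) & 1);
}

/* Indexed bit offset (assumed body): the form reported to be
   lowered to btq %rsi, %rdi. */
static int IsBitSetB(uint64_t val, unsigned index)
{
    assert(index < 64); /* shifting a 64-bit value by 64 or more is undefined in C */
    return (val & ((uint64_t)1 << index)) != 0;
}
```

Both functions return the same answer for index 25, which is why one might expect the fixed-offset case to be lowered the same way as the indexed one.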
2015 Jan 19
6
[LLVMdev] X86TargetLowering::LowerToBT
I'm tracking down an X86 code generation malfeasance regarding BT (bit
test) and I have some questions.
This IR *matches* and then X86TargetLowering::LowerToBT *is called*:
%and = and i64 %shl, %val ; (val & (1 << index)) != 0 ; bit test with a *register* index
This IR *does not match* and so X86TargetLowering::LowerToBT *is not
called*:
%and = lshr i64 %val, 25
2015 Jan 19
2
[LLVMdev] X86TargetLowering::LowerToBT
Which BTQ? There are three flavors.
BTQ reg/reg
BTQ reg/mem
BTQ reg/imm
I can imagine that the reg/reg and especially the reg/mem versions would be
slow. However the shrq/and versions *with the same operands* would be slow
as well. There's even a compiler comment about the reg/mem version saying
"this is for disassembly only".
But I doubt BTQ reg/imm would be microcoded.
--
Ite
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote:
> The status quo is:
>
> a) 40b REX+BT instruction for the 64b case
> b) 48b TEST for the 32b case
> c) unless it's small TEST
>
>
> You are currently paying a 16b penalty for TEST vs BT in the 32b case.
> That may be worth testing the -Os flag.
>
You'll want -Oz here, Os
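The sizes quoted above follow from the instruction encodings. The sketch below tallies them in bits for the register-operand forms. The enum and function names are made up for illustration, and the byte counts assume one of the legacy registers, no extra prefixes, and the general F7/F6 forms rather than the accumulator short forms.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum bit_test_form {
    BT_R64_IMM8,    /* REX.W 0F BA /4 ib */
    BT_R32_IMM8,    /* 0F BA /4 ib */
    TEST_R32_IMM32, /* F7 /0 id */
    TEST_R8_IMM8    /* F6 /0 ib */
};

/* Encoded length in bits of each form under the assumptions above. */
static unsigned encoding_bits(enum bit_test_form form)
{
    unsigned bytes = 0;
    switch (form) {
    case BT_R64_IMM8:    bytes = 1 + 2 + 1 + 1; break; /* REX, opcode, ModRM, imm8 */
    case BT_R32_IMM8:    bytes = 2 + 1 + 1;     break; /* opcode, ModRM, imm8 */
    case TEST_R32_IMM32: bytes = 1 + 1 + 4;     break; /* opcode, ModRM, imm32 */
    case TEST_R8_IMM8:   bytes = 1 + 1 + 1;     break; /* opcode, ModRM, imm8 */
    }
    assert(bytes != 0); /* every enumerator must be handled */
    return bytes * 8;
}
```

Under these assumptions the 64-bit BT comes to 40 bits, the 32-bit TEST to 48 bits, and the 32-bit BT to 32 bits, which matches the 16-bit penalty mentioned in the quoted message.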
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well.
If the mask were constant, I expect ICC would generate TEST instead (but I don’t
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
This is a patch to X86TargetLowering::LowerToBT() which was hashed over on
the Developers list with Intel concurring.
It checks whether the -Oz (optimize for size) flag is set or whether the
containing function's PGO cold attribute is set. If either is true it
emits BT for tests of bits 8-31 instead of TEST. Previously, TEST was
always used for bits 0-31 and BT was always used for bits
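The selection rule described in this message can be modelled roughly as follows. This is a sketch of the rule as stated, not the patch itself: the enum, the function name, and the two boolean parameters are hypothetical, and the handling of bits 32-63 is an assumption based on the remark elsewhere in the thread that TEST only takes 32-bit immediates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a model of the rule, not the LLVM patch. */
enum bit_test_insn { USE_TEST, USE_BT };

static enum bit_test_insn pick_bit_test(unsigned bit, bool min_size, bool cold)
{
    assert(bit < 64);
    if (bit >= 32)
        return USE_BT;   /* assumption: no TEST form with a 64-bit immediate */
    if (bit >= 8 && (min_size || cold))
        return USE_BT;   /* bits 8-31: prefer the shorter encoding */
    return USE_TEST;     /* default, and always for bits 0-7 */
}
```

With neither flag set the function reproduces the previous behaviour for bits 0-31; setting either flag switches only bits 8-31 to BT.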
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
That’s not how partial-flags update stalls work. There is no independent tracking of individual bits in EFLAGS. This means that BT + CMOVNZ has a false dependency on whatever instruction wrote to EFLAGS before BT and requires an extra µop vis-a-vis TEST + CMOVNZ or SHR + AND.
Please do not use BT. It is a performance hazard. If you don’t believe me for some reason, here’s the relevant quote
2015 Jan 22
2
[LLVMdev] X86TargetLowering::LowerToBT
> On Jan 22, 2015, at 1:22 PM, Fiona Glaser <fglaser at apple.com> wrote:
>
> According to Agner’s docs, many CPUs have slower BT than TEST; Haswell has only 0.5 inverse throughput as opposed to 0.25, Atom has 1 instead of 0.5, and Silvermont can’t even dual-issue BT (it locks both ALUs). So while BT does seem to have a shorter instruction encoding than TEST for TEST reg, imm32 where
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Is that even a valid instruction? I thought TEST only took 32-bit immediates.
Fiona
> On Jan 22, 2015, at 2:48 PM, Chris Sears <chris.sears at gmail.com> wrote:
>
> The problem is that REX TEST reg,#(1<<37) is 10 bytes vs 5 bytes for REX BT reg,37.
> That's a large space penalty to pay for a possible partial update stall.
>
> So the idea of generating BT for
2015 Jan 22
3
[LLVMdev] X86TargetLowering::LowerToBT
Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes).
Fiona
> On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote:
>
> My bad on that. So that's what the comment meant.
> That means BT is pretty much
2015 Jan 24
2
[LLVMdev] X86TargetLowering::LowerToBT
tst64.ll is attached from clang -S -emit-llvm tst64.c
On Sat, Jan 24, 2015 at 11:56 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Can you transform your C file into an LLVM IR lit test?
>
> See example in test/CodeGen/X86/*.ll
>
> Thanks,
>
> Mehdi
>
> > On Jan 24, 2015, at 11:47 AM, Chris Sears <chris.sears at gmail.com> wrote:
> >
> >
2013 Jul 14
2
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
Hi,
The issue perhaps wasn't explained ideally (and possibly shouldn't
have been CCed directly to you either, so apologies, but now that
there *is* a discussion...)
> Try some actual relevant test instead:
>
> bt %eax,mem
> bt %rax,mem
>
> and notice how they are actually fundamentally different. Test-case:
I'm coming at this from the compiler side, where the
2011 Nov 03
4
How to used MKL (not revolution-mkl) with Debian packages
Hi folks,
if you want to use MKL (the fast BLAS I have tested on my Thinkpad T410)
with the R 2.14.0 built as Debian/Ubuntu packages available on CRAN mirrors,
the following trick may work without some known side effects (such as
OpenMP breakage). You can try building your own libblas.so.3gf.0
with the following command:
$ gfortran -L/opt/intel/lib/intel64 -liomp5
2015 Sep 04
2
Build R with MKL and ICC
On Wed, 2015-09-02 at 20:49 +0200, arnaud gaboury wrote:
> On Wed, Sep 2, 2015 at 7:35 PM, arnaud gaboury <arnaud.gaboury at gmail.com> wrote:
> > After a few days of reading and headache, I finally gave a try at
> > building R from source with Intel MKL and ICC. Documentation and posts
> > on this topic are rather incomplete, sometimes fanciful, and do not give
> >
2017 Apr 20
2
Intel MKL compiling issue
Dear R-developers,
I would appreciate any insights on compiling R 3.4 with Intel MKL -- I was successful up to R 3.3.3, but now it stops, complaining about pcre, though it worked without Intel MKL as follows,
./configure LDFLAGS=-L/genetics/data/software/lib CFLAGS=-fPIC -I/genetics/data/software/include --enable-R-shlib
I have used,
export MKL_NUM_THREADS=15
export
2007 Feb 08
1
Queue extension issues
I'm stuck on queues!
The way I read what documentation I have found, if I set up a queue like
this:
[general]
persistentmembers = yes
[testq]
musiconhold=default
strategy = ringall
timeout = 10
retry = 5
context = testing
member => SIP/100
and then add into extensions something like this:
[incomingiax]
exten => 1234,1,Dial(SIP/100,10)
exten => 1234,2,Queue(testq|tTH|||300)
2015 Sep 02
4
Build R with MKL and ICC
After a few days of reading and headache, I finally gave a try at
building R from source with Intel MKL and ICC. Documentation and posts
on this topic are rather incomplete, sometimes fanciful, and do not give
much explanation of the configure options.
As I am not sure whether mine is correct, I would appreciate some advice and hints.
OS: Fedora 22
parallel_studio_xe_2016
Hardware : 8 Thread(s) per
2013 Jul 14
0
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
> And that is why I think you should just consider "bt $x,y" to be
> trivially the same thing and not at all ambiguous. Because there is
> ABSOLUTELY ZERO ambiguity when people write
>
> bt $63, mem
>
> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
> in this case, so why would you want the user to specify one or the
> other?
I
2015 Feb 03
2
[LLVMdev] RFC: Constant Hoisting
I've had a bug/pessimization which I've tracked down for 1 bit bitmasks:
if (((xx) & (1ULL << (40))))
return 1;
if (!((yy) & (1ULL << (40))))
...
The second time Constant Hoisting sees the value (1<<40) it wraps it up
with a bitcast.
That value then gets hoisted. However, the first (1<<40) is not bitcast and
gets recognized
as a BT. The second