Displaying 20 results from an estimated 1200 matches similar to: "[LLVMdev] Misuse of xmm register on X86-64"
2010 May 19
1
[LLVMdev] Scheduled Instructions go missing
All,
I'm working on a new scheduler. I have a basic block for
which my scheduler generates bad code. The C code looks
like
int j, *p;
if ((j = *p++) != 0) {...}
My scheduler emits (x86, AT&T)
mov p, %rax
mov (%rax), %rax
mov %rax, j
addq $0x04, p
je ...
Notice there is no test instruction. The default list
scheduler generates
mov p, %rax
mov (%rax), %rax
mov %rax, j
addq $0x04,
2010 Feb 04
1
[LLVMdev] Instruction Itineraries
All,
I am working on a scheduler for X86 and would like to
include instruction latencies. It appears that this
information is gathered from instruction itineraries, but
that there isn't an itinerary for X86. I also can't seem
to find documentation on how to add this for X86. Any
pointers would be helpful.
Aran
2010 Mar 26
2
[LLVMdev] X86 Target instruction definitions
All,
Where are the SSE instructions defined? Specifically, I
cannot find the def for ADDSDrr.
Aran
2010 Feb 13
1
[LLVMdev] llvm-gcc 4.2
All,
I'm trying to build llvm-gcc 4.2 from svn (as of about a
week ago). I'm getting:
../../llvm-gcc-4.2/libcpp/expr.c: In function 'num_negate':
../../llvm-gcc-4.2/libcpp/expr.c:1114: internal compiler
error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
I would like to do some debugging, but I don't see
where
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi all,
I noticed that the x86 backend tends to emit unnecessary vector insert
instructions immediately after SSE scalar FP instructions like
addss/mulss.
For example:
/////////////////////////////////
__m128 foo(__m128 A, __m128 B) {
return _mm_add_ss(A, B);
}
/////////////////////////////////
produces the sequence:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
which could be easily optimized into
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor:
00460013 movss xmm0,dword ptr [esp+8]
00460019 movaps xmm1,xmm0
0046001C addss xmm1,xmm1
00460020 pxor xmm2,xmm2
00460024 addss xmm2,xmm1
00460028 addss xmm2,xmm0
0046002C movss dword ptr [esp],xmm2
00460031 fld dword ptr [esp]
Especially the pxor & addss pair instead of a movss (which is
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi Andrea,
Thanks for working on this. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during ISel? The idea is that ADDSS does an ADD plus a BLEND operation, and you can easily
2008 Jan 24
2
[LLVMdev] llvm-gcc + abi stuff
<moving this to llvmdev instead of commits>
On Jan 22, 2008, at 11:23 PM, Duncan Sands wrote:
>> Okay, well we already get many other x86-64 issues wrong already, but
>> Evan is chipping away at it. How do you pass an array by value in C?
>> Example please,
>
> I find the x86-64 ABI hard to interpret, but it seems to say that
> aggregates are classified
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
Hi,
I have a question about LoopVectorize. I wrote a simple test case, a dot product loop, and found that packed instructions are generated when the input arrays are integer, but not when they are float or double.
If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays, packed instructions are generated. Although it should not be required, I tried
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
All,
Attached is a patch that does two things. First, it makes the use
of the current SSE code a run-time option through the use
of speex_decoder_ctl() and speex_encoder_ctl().
It does this in two ways. First, there is a modification to the configure.in
script which introduces a platform-based check. It will compile in the
SSE assembly if you are on an i?86-based platform by making a
2001 Feb 06
2
SCO 5.0.5 (i686-pc-sco3.2v5.0.5), scp and the -n option
Ok, using openssh-SNAP-20010126.tar.gz, two versions of the server both compiled with the
configure commands as below, one with USE_PIPES defined
and one without. This is on SCO OpenServer 5.0.5 (using SCO dev
environment, SCO make, etc.) The client is always linux, openssh
2.3.0p1.
export CCFLAGS='-L/usr/local/lib -I/usr/local/include'
./configure --sysconfdir=/etc/ssh
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
Hi Tyler,
Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations.
Thanks,
Nadav
On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote:
> Hi,
>
> I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are
2010 Nov 20
2
[LLVMdev] Poor floating point optimizations?
I wanted to use LLVM for my math parser, but it seems that floating-point
optimizations are poor.
For example consider such C code:
float foo(float x) { return x+x+x; }
and here is the code generated with "optimized" live demo:
define float @foo(float %x) nounwind readnone {
entry:
  %0 = fmul float %x, 2.000000e+00   ; <float> [#uses=1]
  %1 = fadd float %0, %x
2017 Sep 29
2
Trouble when suppressing a portion of fast-math-transformations
Hi all,
In a mailing-list post last November:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
I raised some concerns that having the IR-level fast-math-flag 'fast' act as an
"umbrella" to implicitly turn on all the lower-level fast-math-flags, causes
some fundamental problems. Those fundamental problems are related to
situations where a user wants to
2013 Apr 04
1
[LLVMdev] Packed instructions generated by LoopVectorize?
Thanks, that did it!
Are there any plans to enable the loop vectorizer by default?
From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Wednesday, April 03, 2013 13:33 PM
To: Nowicki, Tyler
Cc: LLVM Developers Mailing List
Subject: Re: Packed instructions generated by LoopVectorize?
Hi Tyler,
Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
For the following C code
__fp16 x, y, z, w;
void foo() {
x = y + z;
x = x + w;
}
clang produces IR that extends each operand to float and then truncates to
half before assigning to x. Like this
define dso_local void @foo() #0 !dbg !18 {
%1 = load half, half* @y, align 2, !dbg !21
%2 = fpext half %1 to float, !dbg !21
%3 = load half, half* @z, align 2, !dbg !22
%4 = fpext half %3 to float, !dbg
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
Thanks Eli.
I forgot to bring up the strict FP questions I was working on when I
found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp
nodes emitted by promoting load/store/bitcast need to be strict versions of
fp_to_f16/f16_to_fp? And if so, where do we get the chain, especially for
the bitcast case, which isn't a chained node?
~Craig
On Tue, Dec 10, 2019 at 3:18 PM
2013 Sep 20
0
[LLVMdev] Passing a 256 bit integer vector with XMM registers
I am implementing a new calling convention for X86 which requires passing a 256-bit integer vector in two XMM registers rather than one YMM register. For example:
define <8 x i32> @add(<8 x i32> %a, <8 x i32> %b) {
%add = add <8 x i32> %a, %b
ret <8 x i32> %add
}
With -march=x86-64 and -mcpu=corei7-avx, llc with the default calling convention generates the
2013 Apr 15
1
[LLVMdev] State of Loop Unrolling and Vectorization in LLVM
Hi, I have a test case (and a micro-benchmark made out of the test case) to check whether loop unrolling and loop vectorization are done efficiently by LLVM. Here is the test case (credits: Tyler Nowicki):
{code}
extern float * array;
extern int array_size;
float g()
{
int i;
float total = 0;
for(i = 0; i < array_size; i++)
{
total += array[i];
}
return total;
}
{code}
When
2010 Nov 20
3
[LLVMdev] Poor floating point optimizations?
On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote:
> And also the resulting assembly code is very poor:
>
> 00460013 movss xmm0,dword ptr [esp+8]
> 00460019 movaps xmm1,xmm0
> 0046001C addss xmm1,xmm1
> 00460020 pxor xmm2,xmm2
> 00460024 addss xmm2,xmm1
> 00460028 addss xmm2,xmm0
> 0046002C movss dword ptr