thr3ads.net - llvm dev - [LLVMdev] Building for a specific target, corei7 [Oct 2013]

Varun Agrawal

2013-Oct-12 03:48 UTC

[LLVMdev] Building for a specific target, corei7

Hi Andrew,

I think I diluted my question. My question was not related to MCJIT.

I ran the following 4 scenarios:
(1) gcc -mcpu=corei7 tetris.c -o tetris
(2) gcc -mcpu=athlon64 tetris.c -o tetris
(3) clang -march=corei7 tetris.c -o tetris
(4) clang -march=athlon64 tetris.c -o tetris

In (1) and (2), I see differences in the order of instructions in the output
binaries, which I expected because every CPU has a different microarchitecture,
and the compiler is hopefully making use of that information. (I still need to
verify the performance improvement, but that is not related to my question.)
But in (3) and (4), I don't see any difference in the output binaries other
than instruction set extensions. This suggests that there is some
target-specific optimization, but that it is not based on the
microarchitecture of the CPU.

I just want to ask if this is the expected behavior. And if so, is this
optimization going to be added to LLVM anytime in the future?
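
One way to see the "instruction set extensions" part of the difference directly is to ask the compiler which feature macros it predefines for a given -march value. The following is a minimal C sketch, not part of the original experiment: the function names enabled_isa_extensions and append_name are made up for illustration, while __SSE2__ through __AVX2__ are macros that GCC and Clang predefine when the corresponding extension is enabled. Note also that the GCC manual documents -mcpu on x86 as a deprecated synonym for -mtune, so scenarios (1) and (2) probably change only tuning, whereas -march in (3) and (4) also changes which instructions may be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append `name` to the space-separated list in `out` (capacity `cap`).
 * Output is truncated if the buffer is too small. */
void append_name(char *out, size_t cap, const char *name) {
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s%s", used ? " " : "", name);
}

/* Fill `out` with the ISA extensions the compiler enabled for this
 * translation unit (as reported by its predefined macros) and return
 * how many were found. The result depends on the -march/-m flags used
 * at compile time, not on the machine the program runs on. */
int enabled_isa_extensions(char *out, size_t cap) {
    int count = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
#ifdef __SSE2__
    append_name(out, cap, "sse2");   count++;
#endif
#ifdef __SSE3__
    append_name(out, cap, "sse3");   count++;
#endif
#ifdef __SSSE3__
    append_name(out, cap, "ssse3");  count++;
#endif
#ifdef __SSE4_1__
    append_name(out, cap, "sse4.1"); count++;
#endif
#ifdef __SSE4_2__
    append_name(out, cap, "sse4.2"); count++;
#endif
#ifdef __POPCNT__
    append_name(out, cap, "popcnt"); count++;
#endif
#ifdef __AVX__
    append_name(out, cap, "avx");    count++;
#endif
#ifdef __AVX2__
    append_name(out, cap, "avx2");   count++;
#endif
    return count;
}
```

Building this once with -march=corei7 and once with -march=athlon64 and printing the resulting string should show different lists (SSE4.2, for instance, would be expected only in the corei7 build), regardless of whether the instruction order changes.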

Thanks,
Varun Agrawal

From: Kaylor, Andrew [mailto:andrew.kaylor at intel.com]
Sent: Friday, October 11, 2013 4:17 PM
To: Varun Agrawal; llvmdev at cs.uiuc.edu
Subject: RE: Building for a specific target, corei7

Hi Varun,

Have you tried your experiment with icc by any chance?

The MCJIT component does not assume that you will be executing the generated
code on the host system, because it can be used to generate code for external
targets.  However, you can specify the CPU by calling setMCPU() on the
EngineBuilder object before creating your execution engine.  (You can use
sys::getHostCPUName() to find out what CPU you are running on; that will also
detect AVX support, which you don't get with the general "corei7" CPU
flag.)  I would expect that if you do that, it would generate code similar to
what clang produces.
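
For readers who want to see the distinction between a CPU name and a CPU feature outside of LLVM, here is a small C sketch using the GCC/Clang x86 builtins __builtin_cpu_is and __builtin_cpu_supports. It is only an analogue of the host detection that sys::getHostCPUName() performs inside LLVM, not the LLVM API itself. The names struct host_cpu_info and query_host_cpu are hypothetical, and on non-x86 targets the sketch simply reports zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the host CPU, for illustration only. */
struct host_cpu_info {
    int is_corei7;  /* 1 if the host matches the "corei7" CPU name */
    int has_sse4_2; /* 1 if the host supports SSE4.2 */
    int has_avx;    /* 1 if the host supports AVX */
};

/* Query the CPU the program is currently running on. The name check and
 * the feature checks are independent: a CPU can match the name without
 * supporting AVX. */
void query_host_cpu(struct host_cpu_info *info) {
    assert(info != NULL);
    info->is_corei7 = 0;
    info->has_sse4_2 = 0;
    info->has_avx = 0;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init(); /* populate the CPU model data before querying */
    info->is_corei7 = __builtin_cpu_is("corei7") != 0;
    info->has_sse4_2 = __builtin_cpu_supports("sse4.2") != 0;
    info->has_avx = __builtin_cpu_supports("avx") != 0;
#endif
}
```

As far as I understand the builtins, a newer Core i7-class machine would be expected to set both is_corei7 and has_avx, while an older one would match the name only, which is the gap the parenthetical above describes.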

-Andy


From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
On Behalf Of Varun Agrawal
Sent: Thursday, October 10, 2013 10:52 PM
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] Building for a specific target, corei7

Hi,

I am using the LLVM JIT infrastructure (MCJIT). I wanted to see if there are any
performance gains, since the compiler can detect the target CPU at runtime. But I
didn't see any improvement (I compile with -no-mmx and -no-sse).

I then tried an experiment where I compiled the program with clang-3.3, with
and without specifying the target CPU as "corei7". I was shocked to see that
the only differences between the two binaries were related to instruction set
extensions.

Further, I tried the same experiment with gcc and saw that the instructions were
shuffled around in the binary. I expected this, because every CPU differs in
some way or other (a different buffer size for out-of-order execution,
different cache sizes, etc.).

For clang, I was passing the "-march=corei7" flag.
For gcc, I was passing the "-mcpu=corei7" flag.

Am I passing the correct flags?
Any help, comments, or suggestions would be appreciated.

Thanks,
--
Varun Agrawal
PhD Student
Computer Science, Stony Brook University
http://compas.cs.stonybrook.edu/~vagrawal/

Kaylor, Andrew

2013-Oct-14 16:39 UTC

Hi Varun,

I see the point of your question, but I'm not the best person to answer from
that perspective.

Nadav Rotem is the owner of the x86 backend, and he can probably give you a more
complete answer than I could.

Thanks,
Andy


Nadav Rotem

2013-Oct-14 16:50 UTC

Hi Andrew and Varun, 

The most interesting additions to the x86 instruction set since the move to
64 bits have been the new vector instructions. If your code is not
vectorizable, then you should see similar code.  With the new MI Scheduler (to
be enabled by default soon) you may see greater differences in the binary,
because it has a better machine model.  At the moment we don’t add code that
checks the CPUID at runtime, but this is an interesting feature to discuss.
If you run into interesting findings during your analysis, please share them
with us on the mailing list. We are constantly looking for opportunities to
improve the compiler.
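
To make the "check the CPUID at runtime" idea concrete, below is a minimal C sketch of manual runtime dispatch. It assumes GCC or Clang on x86 for __builtin_cpu_init and __builtin_cpu_supports. The functions sum_baseline, sum_unrolled, select_sum and sum_dispatch are hypothetical, and sum_unrolled is only a stand-in for a routine that would really be compiled for a newer CPU. This is something the programmer writes by hand, not something the compiler inserts.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*sum_fn)(const int *data, size_t n);

/* Baseline implementation, safe on any CPU. */
long sum_baseline(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Stand-in for a CPU-specific variant (here just a 4-way unrolled loop).
 * A real variant might be built with a target attribute or intrinsics. */
long sum_unrolled(const int *data, size_t n) {
    long a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    for (; i < n; i++)
        a += data[i];
    return a + b + c + d;
}

/* Choose an implementation based on what the running CPU reports. */
sum_fn select_sum(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_unrolled;
#endif
    return sum_baseline;
}

/* Public entry point: selects once on first call, then reuses the choice.
 * Not thread-safe as written; a real version would guard the first call. */
long sum_dispatch(const int *data, size_t n) {
    static sum_fn impl = NULL;
    assert(data != NULL || n == 0);
    if (impl == NULL)
        impl = select_sum();
    return impl(data, n);
}
```

A call such as sum_dispatch(values, count) picks the implementation on first use and reuses it afterwards. Both variants return the same result, so the choice affects only speed.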

Thanks,
Nadav   


