thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] How to debug if LTO generate wrong code? [May 2016]


Mehdi Amini via llvm-dev

2016-May-17 20:02 UTC

[llvm-dev] [cfe-dev] How to debug if LTO generate wrong code?

> On May 17, 2016, at 11:21 AM, Umesh Kalappa <umesh.kalappa0 at gmail.com> wrote:
> 
> Steven,
> 
> As Mehdi stated, the optimisation level is specific to the linker, and it
> enables the inter-procedural optimisation passes; please refer to the function
To be very clear: the -O option may trigger *linker* optimizations as well,
independently of LTO.

-- 
Mehdi


> 
> PassManagerBuilder::addLTOOptimizationPasses()  at
> http://llvm.org/docs/doxygen/html/PassManagerBuilder_8cpp_source.html
> 
> As for internal options to disable them, I don't think you can do so.
> 
> Thank you
> ~Umesh
> 
> On Tue, May 17, 2016 at 9:21 PM, Mehdi Amini via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>> 
>> On May 17, 2016, at 1:33 AM, Shi, Steven via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> Hello,
>> Let me ask a simple LTO question again. For the LLVM LTO example at
>> http://llvm.org/docs/LinkTimeOptimization.html, I use the build commands
>> below to generate binaries at three optimization levels: -O0, -O1 and -O2.
>> Listing the foo1~4 symbols with nm, I can see that the different
>> optimization levels really do take effect.
>> 1.       How can I know which optimizations clang LTO applies at -O0, -O1
>> and -O2?
>> 
>> 
>> LTO is linker specific, clang is only forwarding the option to the linker
>> here.
>> 
>> 2.       Is it the compiler-side optimization (e.g. clang/llvm) or the
>> linker-side optimization (e.g. ld) that makes these differences?
>> 
>> 
>> In your case, you invoke clang with "-emit-llvm", without any optimization
>> level, so you get O0.
>> As for what the linker is doing at these optimization levels, again this is
>> linker specific.
>> 
>> 3.       How can I explicitly enable or disable these specific optimizations
>> besides using -O0, -O1, -O2?
>> 
>> 
>> If you're talking about LTO, this is linker specific again (ld is not
>> the same program on every platform). For instance there is no such thing as
>> O0/O1/O2 on OS X.
>> 
>> 
>> 
>> 
>> $clang -emit-llvm -c main.c -o main.bc
>> $clang -emit-llvm -c a.c -o a.bc
>> $llvm-ar cr main.lib main.bc
>> $llvm-ar cr a.lib a.bc
>> $clang -O0 -flto main.lib a.lib -o main0
>> $clang -O1 -flto main.lib a.lib -o main1
>> $clang -O2 -flto main.lib a.lib -o main2
>> 
>> $nm main0
>> …
>> 00000000004005a0 t foo1
>> 0000000000400580 t foo2
>> 00000000004005e0 t foo3
>> 0000000000400530 t foo4
>> 0000000000400500 t frame_dummy
>> …
>> $ nm main1
>> …
>> 0000000000400550 t foo1
>> 0000000000400580 t foo3
>> 0000000000400530 t foo4
>> 0000000000400500 t frame_dummy
>> …
>> $ nm main2
>> …
>> 00000000004004d0 t frame_dummy
>> …
>> 
>> From the verbose output below, it seems only the linker (e.g. ld) is involved
>> in doing the optimization?
>> 
>> 
>> Yes.
>> Usually the LTO pipeline is a bit different from what you're doing. I'm used
>> to seeing:
>> 
>> $clang -flto -O3 -c main.c -o main.o
>> $clang -flto -O3 -c a.c -o a.o
>> $clang -flto -O3 main.o a.o -o main0
>> 
>> 
>> --
>> Mehdi
>> 
>> 
>> 
>> 
>> $ clang -O2 -flto main.lib a.lib -o main2 -v
>> clang version 3.8.0 (tags/RELEASE_380/final)
>> Target: x86_64-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /usr/local/bin
>> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
>> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3
>> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.3.1
>> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.0.0
>> Found candidate GCC installation:
>> /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0
>> Selected GCC installation:
>> /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0
>> Candidate multilib: .;@m64
>> Candidate multilib: 32;@m32
>> Selected multilib: .;@m64
>> "/usr/bin/ld" -z relro --hash-style=gnu --build-id
>> --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o main2
>> /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o
>> /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/crtbegin.o
>> -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0
>> -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../../../lib64
>> -L/usr/local/bin/../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64
>> -L/usr/lib/x86_64-linux-gnu
>> -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../..
>> -L/usr/local/bin/../lib -L/lib -L/usr/lib -plugin
>> /usr/local/bin/../lib/LLVMgold.so -plugin-opt=mcpu=x86-64 -plugin-opt=O2
>> main.lib a.lib -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc
>> --as-needed -lgcc_s --no-as-needed
>> /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/crtend.o
>> /usr/lib/x86_64-linux-gnu/crtn.o
>> 
>> 
>> Steven Shi
>> Intel\SSG\STO\UEFI Firmware
>> 
>> Tel: +86 021-61166522
>> iNet: 821-6522
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> 
>> 
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
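
The effect of the three optimization levels on the nm listings quoted above can be summarised mechanically. The following is a minimal Python sketch, not part of the original thread: the helper names `symbols` and `surviving` are made up for illustration, and the input is only the lines visible in the quoted listings (the elided "…" parts are not reconstructed).

```python
# Symbol lines copied from the nm output quoted above (elided lines omitted).
NM_OUTPUT = {
    "main0": """\
00000000004005a0 t foo1
0000000000400580 t foo2
00000000004005e0 t foo3
0000000000400530 t foo4
0000000000400500 t frame_dummy
""",
    "main1": """\
0000000000400550 t foo1
0000000000400580 t foo3
0000000000400530 t foo4
0000000000400500 t frame_dummy
""",
    "main2": """\
00000000004004d0 t frame_dummy
""",
}


def symbols(nm_text):
    """Return the set of symbol names from `nm` lines of the form: address type name."""
    names = set()
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # ignore blank lines and entries without an address
            names.add(fields[2])
    return names


def surviving(nm_text, prefix="foo"):
    """Sorted list of symbols starting with `prefix` that are still present."""
    return sorted(name for name in symbols(nm_text) if name.startswith(prefix))


if __name__ == "__main__":
    for binary, text in NM_OUTPUT.items():
        print(binary, surviving(text))
```

On these listings, foo2 disappears at -O1 and none of the foo1~4 symbols remain at -O2, which is consistent with the link-time optimizer removing or inlining them.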

Shi, Steven via llvm-dev

2016-May-29 14:36 UTC


Hi Mehdi,

After deeper debugging, I found that my firmware's LTO wrong-code issue is caused
by the X64 code model: -mcmodel=large is always overridden with small
(-mcmodel=small) in an LTO build. I don't know how to correctly specify the large
code model for my X64 firmware LTO build. I would appreciate it if you could let
me know how.



You know, parts of my UEFI firmware (BIOS) have to be loaded and run at high
addresses (above 2 GB) from the very beginning, so I need code that makes
absolutely no assumptions about the addresses of the code and data sections. But
the current LLVM LTO seems to stick to the small code model and generates a lot of
code with 32-bit RIP-relative addressing, which causes CPU exceptions when it runs
at addresses above 2 GB.
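
To make the 2 GB limit concrete: an operand reached through a 32-bit field is either within about 2 GiB of the instruction (relative form) or at an address that survives sign-extension from 32 bits (absolute form). The following is a minimal Python sketch of that arithmetic. The helper names and the high address used below are illustrative assumptions, not values taken from the firmware.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fits_rel32(next_ip, target):
    """True if `target` is reachable from `next_ip` with a signed 32-bit offset."""
    return INT32_MIN <= target - next_ip <= INT32_MAX


def fits_abs32(address):
    """True if a 64-bit address equals its low 32 bits sign-extended to 64 bits."""
    low = address & 0xFFFFFFFF
    sign_extended = low | (0xFFFFFFFF00000000 if low & 0x80000000 else 0)
    return sign_extended == address & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    low_addr = 0x60104C        # an address below 2 GB
    high_addr = 0x10060104C    # hypothetical address above 4 GB
    print(hex(low_addr), fits_abs32(low_addr))
    print(hex(high_addr), fits_abs32(high_addr))
    print(fits_rel32(0x40050E, high_addr))
```

Code and data placed so that these checks fail cannot be addressed with 32-bit fields, which is why such images need 64-bit absolute addresses.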



Below, I simply reuse Eli's codemodel1.c example (link:
http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models) to
show the LLVM LTO code model issue.

$ clang -g -O0 codemodel1.c -mcmodel=large -o codemodel1_large.bin
$ clang -g -O0 codemodel1.c -mcmodel=small -o codemodel1_small.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=small -o codemodel1_small_lto.bin



You will see that codemodel1_large_lto.bin and codemodel1_small_lto.bin are
exactly the same!
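
One way to check the "exactly the same" observation is a byte-for-byte comparison of the two outputs. The following is a minimal Python sketch using the standard `filecmp` module. `same_contents` is a hypothetical helper name, and the file names are simply the ones produced by the commands above, so they must already exist in the current directory.

```python
import filecmp
from pathlib import Path


def same_contents(path_a, path_b):
    """True if the two files are byte-for-byte identical."""
    return filecmp.cmp(path_a, path_b, shallow=False)


if __name__ == "__main__":
    # Output names from the clang commands above; build them first.
    large = "codemodel1_large_lto.bin"
    small = "codemodel1_small_lto.bin"
    if Path(large).exists() and Path(small).exists():
        print("identical" if same_contents(large, small) else "different")
    else:
        print("build the two binaries first")
```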

And if you disassemble codemodel1_large_lto.bin, you will see that it uses the
small code model (32-bit RIP-relative), not the large one, for addressing, as
shown below.



$ objdump -dS codemodel1_large_lto.bin



int main(int argc, const char* argv[])
{
  4004f0:       55                      push   %rbp
  4004f1:       48 89 e5                mov    %rsp,%rbp
  4004f4:       48 83 ec 20             sub    $0x20,%rsp
  4004f8:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4004ff:       89 7d f8                mov    %edi,-0x8(%rbp)
  400502:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
    int t = global_func(argc);
  400506:       8b 7d f8                mov    -0x8(%rbp),%edi
  400509:       e8 d2 ff ff ff          callq  4004e0 <global_func>
  40050e:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  400511:       8b 04 25 4c 10 60 00    mov    0x60104c,%eax
  400518:       03 45 ec                add    -0x14(%rbp),%eax
  40051b:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr[7];
  40051e:       8b 04 25 dc 11 60 00    mov    0x6011dc,%eax
  400525:       03 45 ec                add    -0x14(%rbp),%eax
  400528:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  40052b:       8b 04 25 6c 13 60 00    mov    0x60136c,%eax
  400532:       03 45 ec                add    -0x14(%rbp),%eax
  400535:       89 45 ec                mov    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  400538:       8b 04 25 ac 20 63 00    mov    0x6320ac,%eax
  40053f:       03 45 ec                add    -0x14(%rbp),%eax
  400542:       89 45 ec                mov    %eax,-0x14(%rbp)
    return t;
  400545:       8b 45 ec                mov    -0x14(%rbp),%eax
  400548:       48 83 c4 20             add    $0x20,%rsp
  40054c:       5d                      pop    %rbp
  40054d:       c3                      retq
  40054e:       66 90                   xchg   %ax,%ax
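
The addressing forms in the listing above can be checked by decoding the bytes by hand. The following is a minimal Python sketch, not a disassembler: `call_target` and `describe_load` are hypothetical helper names, and only the two encodings needed here are handled, assuming 64-bit mode. On the bytes shown, the `e8` call resolves to 0x4004e0 (matching the `<global_func>` annotation), and the `8b 04 25` loads decode as a 32-bit displacement with no base register, that is, a sign-extended absolute address rather than a RIP-relative one. Either way the operand field is only 32 bits wide, which is the limitation described above.

```python
import struct


def call_target(address, encoding):
    """Target of a 5-byte `e8 rel32` call located at `address`."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("not an e8 rel32 call")
    (rel32,) = struct.unpack("<i", encoding[1:5])  # signed, little-endian
    # The displacement is relative to the address of the next instruction.
    return address + len(encoding) + rel32


def describe_load(encoding):
    """Classify the memory operand of an `8b /r` load in 64-bit mode.

    Only two forms are recognised: disp32 with no base register (ModRM
    followed by a SIB byte) and RIP-relative disp32. Anything else is "other".
    """
    if encoding[0] != 0x8B:
        raise ValueError("not an 8b load")
    modrm = encoding[1]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b00 and rm == 0b101:
        (disp,) = struct.unpack("<i", encoding[2:6])
        return ("rip-relative", disp)
    if mod == 0b00 and rm == 0b100:
        sib = encoding[2]
        index, base = (sib >> 3) & 0b111, sib & 0b111
        if index == 0b100 and base == 0b101:  # no index, no base: disp32 only
            (disp,) = struct.unpack("<i", encoding[3:7])
            return ("absolute", disp)
    return ("other", None)


if __name__ == "__main__":
    print(hex(call_target(0x400509, bytes.fromhex("e8 d2 ff ff ff"))))
    kind, disp = describe_load(bytes.fromhex("8b 04 25 4c 10 60 00"))
    print(kind, hex(disp))
```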





So, does LTO support the large code model? How do I correctly specify the code
model option for LTO?





Steven Shi

Intel\SSG\STO\UEFI Firmware



Tel: +86 021-61166522

iNet: 821-6522




Mehdi Amini via llvm-dev

2016-May-29 20:27 UTC


Hi,

> On May 29, 2016, at 7:36 AM, Shi, Steven <steven.shi at intel.com> wrote:
> 
> So, does LTO support the large code model? How do I correctly specify the
> code model option for LTO?
Same answer as before: LTO is set up by the linker, so the option for that, if it
exists, will be linker specific.

As far as I can tell, neither the libLTO-based linkers (ld64 on OS X, for example)
nor the gold plugin support such an option, and the code model is always
"default".

I don't know about lld; CC'ing Rafael about that.
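
What clang actually forwards to the gold plugin can be read off the `-v` output quoted earlier in this thread. The following is a minimal Python sketch that pulls the `-plugin-opt=` values out of that ld command line. `plugin_options` is a hypothetical helper name, and the command string is abbreviated from the quoted output (library search paths and trailing libraries dropped). Note that the quoted invocation did not pass -mcmodel, so this only shows which options were forwarded in that run; it does not establish how any particular clang version handles a code-model flag.

```python
import shlex

# Abbreviated from the ld invocation in the quoted `clang -O2 -flto ... -v` output.
LD_COMMAND = (
    '"/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr '
    "-m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o main2 "
    "-plugin /usr/local/bin/../lib/LLVMgold.so "
    "-plugin-opt=mcpu=x86-64 -plugin-opt=O2 main.lib a.lib"
)


def plugin_options(command_line):
    """Return the values passed to the linker plugin via -plugin-opt=VALUE."""
    prefix = "-plugin-opt="
    return [
        arg[len(prefix):]
        for arg in shlex.split(command_line)
        if arg.startswith(prefix)
    ]


if __name__ == "__main__":
    for option in plugin_options(LD_COMMAND):
        print(option)
```

In that run only the CPU and the optimization level reach the plugin, which matches the description above of the code model staying at its default.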

-- 
Mehdi



