Hi all,

I'm trying to reduce the startup time for my JIT, but I'm running into the problem that the majority of the time is spent loading the bitcode for my standard library, and I suspect it's due to debug info. My stdlib is currently about 2kloc spread across a number of C++ files; I compile them with clang -g -emit-llvm, link them together with llvm-link, run opt -O3 on the result, and arrive at a 1MB bitcode file. I then embed this as a binary blob in my executable and call ParseBitcodeFile on it at startup.

Unfortunately, this parsing takes about 60ms right now, which is the main component of my ~100ms time to run on an empty source file (another ~20ms is loading the pre-jit'd image through an ObjectCache). I thought I'd save some time by using getLazyBitcodeModule, since the IR isn't actually needed right away, but this only reduced the parsing time (ie the time of the actual getLazyBitcodeModule() call) to 45ms, which I found surprising. I also tested computing the bytewise xor of the bitcode file to make sure it was fully read into memory; that took about 5ms, so the majority of the time does seem to be spent parsing.

Then I switched back to ParseBitcodeFile, but added the "-strip-debug" flag to my opt invocation, which reduced the bitcode file down to about 100KB and the parsing time to 20ms. What surprised me the most was that if I then switched to getLazyBitcodeModule, the parsing time was cut down to 3ms, which is what I was originally expecting.

So when lazy loading, stripping out the debug info cuts the initialization time from 45ms to 3ms, which is why I suspect that getLazyBitcodeModule is still eagerly parsing all of the debug info. To work around it, I could generate two separate builds, one with debug info and one without, but I'd like to avoid doing that.
I did some simple profiling of what getLazyBitcodeModule was doing, and it wasn't terribly informative (it spends most of its time in parsing-related functions); does anyone have any ideas whether this is something that could be fixed, or should I just move on?

Thanks,
Kevin
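[For reference, the build pipeline described above can be sketched roughly as follows; the file names are hypothetical, and the last line shows the -strip-debug variant that produced the ~100KB file:]

```shell
# Compile each stdlib source file to LLVM bitcode, with debug info.
clang -g -emit-llvm -c runtime1.cpp -o runtime1.bc
clang -g -emit-llvm -c runtime2.cpp -o runtime2.bc

# Link the per-file bitcode into a single module.
llvm-link runtime1.bc runtime2.bc -o stdlib.bc

# Optimize; with debug info this yields the ~1MB file.
opt -O3 stdlib.bc -o stdlib.opt.bc

# Same, but stripping debug info first: ~100KB and much faster to parse.
opt -strip-debug -O3 stdlib.bc -o stdlib.stripped.bc
```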
Any chance you can share either your bitcode file, or some other bitcode file that is about the same size and generally representative of the performance problems you're having?

On Thu, Jan 9, 2014 at 4:37 PM, Kevin Modzelewski <kmod at dropbox.com> wrote:
> Hi all, I'm trying to reduce the startup time for my JIT, but I'm running
> into the problem that the majority of the time is spent loading the bitcode
> for my standard library, and I suspect it's due to debug info. [...]
This summer I was working on LTO, and Rafael mentioned to me that debug info is not lazily loaded, which was the cause of the insane resource usage I was seeing when doing LTO with debug info. This is likely the reason that lazy loading was so ineffective for your debug build.

Rafael, am I remembering this right / can you give more information? I expect this will have to get fixed before pitching LLD as a turnkey LTO solution (not sure where it sits in the priority list).

-- Sean Silva

On Thu, Jan 9, 2014 at 5:37 PM, Kevin Modzelewski <kmod at dropbox.com> wrote:
> Hi all, I'm trying to reduce the startup time for my JIT, but I'm running
> into the problem that the majority of the time is spent loading the bitcode
> for my standard library, and I suspect it's due to debug info. [...]
That was likely type information, and that should mostly be fixed up now. It's still not lazily loaded, but it is going to be ridiculously smaller.

-eric

On Fri Jan 10 2014 at 12:11:52 AM, Sean Silva <chisophugis at gmail.com> wrote:
> This Summer I was working on LTO and Rafael mentioned to me that debug
> info is not lazy loaded, which was the cause for the insane resource usage
> I was seeing when doing LTO with debug info. [...]
On 10 January 2014 03:09, Sean Silva <chisophugis at gmail.com> wrote:
> This Summer I was working on LTO and Rafael mentioned to me that debug info
> is not lazy loaded, which was the cause for the insane resource usage I was
> seeing when doing LTO with debug info. [...]

In the case of LTO, there were two main issues:

* Duplicate type debug information.
* All metadata (including debug info) is loaded eagerly.

As Eric mentioned, we can now merge type debug info from multiple translation units, which results in a smaller total size. Kevin, what LLVM version are you using? Do you get a smaller combined bitcode with trunk?

The issue of loading all of the debug info ahead of time is still there. We will need to fix that at some point, or reduce its size further so that the impact is small enough.

Cheers,
Rafael
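[For anyone following along, the lazy-loading path being discussed looks roughly like this with the 3.4-era C++ API; the helper names and error handling here are illustrative, not taken from Kevin's code. The point of the thread is that getLazyBitcodeModule defers function bodies, but still reads all module-level metadata, debug info included, up front:]

```cpp
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/MemoryBuffer.h"
#include <string>

// Hypothetical helper: open an embedded stdlib bitcode blob lazily.
llvm::Module *loadStdlib(llvm::StringRef Blob, llvm::LLVMContext &Ctx,
                         std::string &Err) {
  llvm::MemoryBuffer *Buf =
      llvm::MemoryBuffer::getMemBuffer(Blob, "stdlib", /*NullTerm=*/false);
  // Function bodies stay unparsed here, but module-level metadata
  // (debug info included) is still materialized eagerly.
  return llvm::getLazyBitcodeModule(Buf, Ctx, &Err);
}

// Later, when a particular function is actually needed:
bool materializeFunction(llvm::Module *M, llvm::StringRef Name,
                         std::string &Err) {
  if (llvm::Function *F = M->getFunction(Name))
    return !F->Materialize(&Err);  // Materialize() returns true on error
  return false;
}
```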