thr3ads.net - llvm dev - [llvm-dev] llvm-symbolizer memory usage [Jan 2020]

Francis Ricci via llvm-dev

2020-Jan-14 15:01 UTC

[llvm-dev] llvm-symbolizer memory usage

I work on a Linux program with restricted RSS limits (a couple hundred MB),
and one of the things this program does is symbolication. Ideally, we'd
like to use llvm-symbolizer for this symbolication (because we get things
like function inlining that we can't get from cheaper symbolizers), but for
large binaries, the memory usage gets pretty huge.

Based on some memory profiling, it looks like the majority of this memory
cost comes from mmap-ing the binary to be symbolized (via
`llvm::object::createBinary`). This alone comes with hundreds of MB of cost
in many cases.

I have 2 questions here:
1) Does it seem feasible to make llvm-symbolizer work *without* loading the
full binary into memory (perhaps just reading sections from disk as needed,
at the cost of some extra CPU)?
2) If we figured this out, and put it behind something like a
"--low-memory" flag, would it be something the upstream community would
accept?
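
To make question 1 concrete, here is a minimal sketch of reading one byte
range from disk on demand instead of mapping the whole file. It is Python
rather than LLVM code, and `read_range`, `make_demo_file`, and the fake
16-byte "section" are hypothetical names invented for illustration; nothing
here reflects how llvm-symbolizer is actually structured.

```python
import os
import tempfile


def read_range(path, offset, size):
    """Read up to `size` bytes at `offset` without mapping the whole file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        pos = offset
        remaining = size
        while remaining > 0:
            # pread reads at an explicit offset and may return fewer bytes
            # than requested, so loop until done or end of file.
            chunk = os.pread(fd, remaining, pos)
            if not chunk:
                break
            chunks.append(chunk)
            pos += len(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def make_demo_file():
    """Hypothetical stand-in for a large binary: 1 MiB of padding followed
    by a 16-byte 'section'."""
    f = tempfile.NamedTemporaryFile(delete=False)
    with f:
        f.write(b"\0" * (1 << 20))
        f.write(b"DEBUG-LINE-BYTES")
    return f.name


if __name__ == "__main__":
    demo = make_demo_file()
    print(read_range(demo, 1 << 20, 16))
    os.remove(demo)
```

Only the requested bytes are copied into the process, so the memory cost
follows the size of the sections actually read rather than the size of the
binary. The price is a system call per read, which is the extra CPU the
question anticipates.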

Francis

David Blaikie via llvm-dev

2020-Jan-14 22:32 UTC

(Adding Hyoun who's been looking at memory use of llvm-symbolizer recently
too)

On Tue, Jan 14, 2020 at 11:07 AM Francis Ricci via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I work on a Linux program with restricted RSS limits (a couple hundred
> MB), and one of the things this program does is symbolication. Ideally,
> we'd like to use llvm-symbolizer for this symbolication (because we get
> things like function inlining that we can't get from cheaper symbolizers),
> but for large binaries, the memory usage gets pretty huge.
>
> Based on some memory profiling, it looks like the majority of this memory
> cost comes from mmap-ing the binary to be symbolized (via
> `llvm::object::createBinary`). This alone comes with hundreds of MB of cost
> in many cases.
>
> I have 2 questions here:
> 1) Does it seem feasible to make llvm-symbolizer work *without* loading
> the full binary into memory (perhaps just reading sections from disk as
> needed, at the cost of some extra CPU)?
>
Does memory mapping the file actually use real memory? Or is it just
reading from the file, effectively? I don't think the mapped file was part
of the memory usage Hyoun and I encountered when doing memory accounting.
What we were talking about was an LRU cache of DwarfCompileUnits, or
something like that - to strip out the DIEArrays and other associated data
structures after they were used.

Are you running llvm-symbolizer on many input addresses in a single run?
Only a single address? Optimized or unoptimized build of llvm-symbolizer?

> 2) If we figured this out, and put it behind something like a
> "--low-memory" flag, would it be something the upstream community would
> accept?
>
Maybe, though I'm hoping we can avoid having to have too much of a perf
tradeoff for low memory usage, so we can keep it all together without a
flag.

Hyoun Kyu Cho via llvm-dev

2020-Jan-14 23:50 UTC

On Tue, Jan 14, 2020 at 2:32 PM David Blaikie <dblaikie at gmail.com> wrote:
> (Adding Hyoun who's been looking at memory use of llvm-symbolizer recently
> too)
>
> On Tue, Jan 14, 2020 at 11:07 AM Francis Ricci via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I work on a Linux program with restricted RSS limits (a couple hundred
>> MB), and one of the things this program does is symbolication. Ideally,
>> we'd like to use llvm-symbolizer for this symbolication (because we get
>> things like function inlining that we can't get from cheaper symbolizers),
>> but for large binaries, the memory usage gets pretty huge.
>>
>> Based on some memory profiling, it looks like the majority of this memory
>> cost comes from mmap-ing the binary to be symbolized (via
>> `llvm::object::createBinary`). This alone comes with hundreds of MB of cost
>> in many cases.
>>
>> I have 2 questions here:
>> 1) Does it seem feasible to make llvm-symbolizer work *without* loading
>> the full binary into memory (perhaps just reading sections from disk as
>> needed, at the cost of some extra CPU)?
>>
>
> Does memory mapping the file actually use real memory? Or is it just
> reading from the file, effectively? I don't think the mapped file was part
> of the memory usage Hyoun and I encountered when doing memory accounting.
> What we were talking about was an LRU cache of DwarfCompileUnits, or
> something like that - to strip out the DIEArrays and other associated data
> structures after they were used.
>
I might be wrong because I'm not familiar with LLVM. When I tried to reduce
the RSS of our symbolizer usage, I also saw that both the input file mapping
and the internal data structures (DIEArray, line table, etc.) took
significant memory.

As Dave mentioned, I've tried LRU caching for the internal data structures,
and that was able to reduce the memory usage quite a bit for our use case of
symbolizing many addresses in a single run. We're working on a way to
upstream the caching.
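
For readers unfamiliar with the idea, a bounded LRU cache can be sketched as
below. This is a generic Python illustration, not the caching being
upstreamed; `UnitCache`, `parse`, and keying by unit offset are all
assumptions made up for the example.

```python
from collections import OrderedDict


class UnitCache:
    """Keep at most `capacity` parsed units; evict the least recently used."""

    def __init__(self, capacity, parse):
        self.capacity = capacity
        self.parse = parse           # hypothetical: offset -> parsed unit data
        self.units = OrderedDict()   # ordered from oldest to newest use
        self.parse_count = 0         # how many times we had to (re)parse

    def get(self, offset):
        if offset in self.units:
            self.units.move_to_end(offset)   # mark as most recently used
            return self.units[offset]
        self.parse_count += 1
        unit = self.parse(offset)
        self.units[offset] = unit
        if len(self.units) > self.capacity:
            self.units.popitem(last=False)   # drop the least recently used
        return unit


if __name__ == "__main__":
    cache = UnitCache(2, lambda off: {"offset": off, "dies": ["..."]})
    for off in (0, 100, 0, 200, 100):
        cache.get(off)
    print(cache.parse_count, list(cache.units))
```

Memory is bounded by `capacity` times the size of one parsed unit, and the
cost is re-parsing a unit when an evicted one is needed again. That trade
works best when consecutive addresses tend to fall in the same few units.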

The input file part seems more complicated. For us, the file is
memory-mapped and the kernel only brings in needed pages. It was a problem
for us because we need to symbolize many addresses and the kernel couldn't
handle the access pattern very well, leaving the entire file in memory. I
could reduce RSS by inserting madvise(MADV_DONTNEED) here and there, but I
don't think it's likely to be upstreamed.
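
As a rough illustration of the madvise(MADV_DONTNEED) approach, here is a
sketch in Python. It assumes Linux and Python 3.8 or later, where
`mmap.madvise` and `mmap.MADV_DONTNEED` are available. The function name
`scan_with_release` and the 1 MiB window are invented for the example, and
it does not show where the calls went in the actual symbolizer.

```python
import mmap
import os
import tempfile


def scan_with_release(path, window=1 << 20):
    """Sum every byte of a mapped file, releasing each window after use."""
    total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            for start in range(0, size, window):
                length = min(window, size - start)
                # Touching the bytes faults the pages in from the file.
                total += sum(m[start:start + length])
                # Tell the kernel this process no longer needs these pages.
                # For a file-backed mapping the data is not lost: a later
                # access simply faults the pages in again.
                if hasattr(m, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
                    m.madvise(mmap.MADV_DONTNEED, start, length)
    return total


if __name__ == "__main__":
    demo = tempfile.NamedTemporaryFile(delete=False)
    demo.write(b"\x01" * (3 * (1 << 20) + 5))
    demo.close()
    print(scan_with_release(demo.name))
    os.remove(demo.name)
```

The window start is always a multiple of 1 MiB, so it stays page-aligned as
madvise requires. On platforms without these attributes the sketch still
runs but skips the hint.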

While I was following the code path for memory mapping the input file, I
vaguely recall seeing other code paths that could just allocate memory for
the entire file and copy it in when a memory-mapped file is not available.
Is this the case for you?

Thanks,
HK


>
> Are you running llvm-symbolizer on many input addresses in a single run?
> Only a single address? Optimized or unoptimized build of llvm-symbolizer?
>
>
>> 2) If we figured this out, and put it behind something like a
>> "--low-memory" flag, would it be something the upstream community would
>> accept?
>>
>
> Maybe, though I'm hoping we can avoid having to have too much of a perf
> tradeoff for low memory usage, so we can keep it all together without a
> flag.
