David Blaikie via llvm-dev
2016-Feb-29 23:47 UTC
[llvm-dev] Possible Memory Savings for tools emitting large amounts of existing data through MC
On Mon, Feb 29, 2016 at 3:36 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> Hi David,
>
> The way I imagined that we might want to extend the MCStreamer API (this
> was motivated by DIEData) is by allowing clients to move bytes and fixups
> into the MC layer.
>
> This is the sort of API that I was imagining:
>
> void MoveAndEmitFragment(SmallVectorImpl<char> &&Data,
>                          SmallVectorImpl<MCFixup> &&Fixups);
>
> Note that this mirrors the fields
> MCEncodedFragmentWithContents::Contents and
> MCEncodedFragmentWithFixups::Fixups
> and the arguments could be directly moved into the fields of a newly
> created MCDataFragment.
>
> Would that work for your use case?

Not quite, unfortunately - the issue is that we're doing a task that is essentially "linking + a bit" - so imagine linking a bunch of files together, the final, say, debug_info.dwo section is made up of the concatenation of all the debug_info.dwo sections of the inputs. So it's fragmented and it's already available, memory mapped, never in a SmallVector, etc.

> Peter
>
> On Mon, Feb 29, 2016 at 03:18:22PM -0800, David Blaikie via llvm-dev wrote:
> > Just in case it interests anyone else, I'm playing around with trying to
> > broaden the MCStreamer API to allow for emission of bytes without copying
> > the contents into a local buffer first (either because you already have a
> > buffer, or the bytes are already present in another file, etc) in
> > http://reviews.llvm.org/D17694 . In theory there's some overlap with lld
> > here (no doubt it already does this sort of thing, but not in a way, I
> > assume, we could reuse from other tools at the moment) and my motivation,
> > llvm-dwp, looks very much like "linking with a few extra steps".
> >
> > But to check that these changes might be more generally applicable, I
> > thought I'd solicit data from anyone building tools that might be memory
> > constrained as well.
> >
> > First that comes to mind (Eric suggested/mentioned) is llvm-dsymutil.
> >
> > Adrian/Fred - do you guys ever have trouble with memory usage of
> > llvm-dsymutil? Do you have an example you could provide that has high
> > memory usage, so I could see if any simple changes based on my prototype
> > MC changes would help.
> >
> > A quick glance at dsymutil's code indicates it might benefit slightly, at
> > least - in the string table emission, for example (it looks very similar
> > to string table emission in dwp - just being able to reference the strings
> > in the StringMap rather than copying them into MCStreamer could help (also
> > I found using a DenseMap<StringRef to the memory mapped input helped as
> > well - but that's a change you can make locally without any MCStreamer
> > improvements) - other parts might be trickier, and consist of parts of
> > referencable data (like the line table header) and parts that are not
> > referencable (like their contents) - my prototype could be extended to
> > handle that)
>
> --
> Peter
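For readers following the exchange above, here is a minimal sketch of the move-based idea, written against the standard library rather than LLVM. `SketchStreamer` and `OwningFragment` are hypothetical names, `std::vector<char>` stands in for `SmallVector`, and fixups are left out. It is an illustration under those assumptions, not the MCStreamer API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data fragment that owns its bytes.
struct OwningFragment {
  std::vector<char> Contents;
};

// Hypothetical stand-in for a streamer that collects fragments.
class SketchStreamer {
public:
  // Takes ownership of the caller's buffer; the bytes themselves are not copied.
  void moveAndEmitFragment(std::vector<char> &&Data) {
    Fragments.push_back(OwningFragment{std::move(Data)});
  }

  // Total number of bytes held across all fragments.
  std::size_t totalSize() const {
    std::size_t Size = 0;
    for (const OwningFragment &F : Fragments)
      Size += F.Contents.size();
    return Size;
  }

  const std::vector<OwningFragment> &fragments() const { return Fragments; }

private:
  std::vector<OwningFragment> Fragments;
};
```

Moving only helps when the caller already holds the bytes in a container it can give away. Bytes that live in a memory-mapped input file would first have to be copied into such a container, which is the objection raised in the reply.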
Peter Collingbourne via llvm-dev
2016-Mar-01 01:12 UTC
[llvm-dev] Possible Memory Savings for tools emitting large amounts of existing data through MC
On Mon, Feb 29, 2016 at 03:47:49PM -0800, David Blaikie wrote:

> On Mon, Feb 29, 2016 at 3:36 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
> > Hi David,
> >
> > The way I imagined that we might want to extend the MCStreamer API (this
> > was motivated by DIEData) is by allowing clients to move bytes and fixups
> > into the MC layer.
> >
> > This is the sort of API that I was imagining:
> >
> > void MoveAndEmitFragment(SmallVectorImpl<char> &&Data,
> >                          SmallVectorImpl<MCFixup> &&Fixups);
> >
> > Note that this mirrors the fields
> > MCEncodedFragmentWithContents::Contents and
> > MCEncodedFragmentWithFixups::Fixups
> > and the arguments could be directly moved into the fields of a newly
> > created MCDataFragment.
> >
> > Would that work for your use case?
>
> Not quite, unfortunately - the issue is that we're doing a task that is
> essentially "linking + a bit" - so imagine linking a bunch of files
> together, the final, say, debug_info.dwo section is made up of the
> concatenation of all the debug_info.dwo sections of the inputs. So it's
> fragmented and it's already available, memory mapped, never in a
> SmallVector, etc.

I see. I guess there's a couple of ways you can go with llvm-dwp:

1) Extend MC with optional ownership as you are doing in your patch.
2) Modify llvm-dwp to write object files directly.

2 is what lld does (with the help of libObject) and might not be such a bad choice, but it would be adding a lot of machinery for a very specific task that MC already needs to know how to do in a roughly target-independent way, so maybe it would be overkill.

I reckon that in most cases MC clients aren't going to be copying large amounts of unowned data, they're most likely going to be creating that data. So perhaps the implementation should reflect that somehow.

Specifically what I had in mind was that you could add some other derived class of MCFragment that would store a StringRef (or maybe a vector of StringRefs if that proves useful), and that would be unrelated to MCDataFragment.

WDYT?

Thanks,
--
Peter
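The fragment-of-references idea in the message above could look roughly like the following sketch. It uses `std::string_view` in place of `StringRef` and does not derive from any LLVM class; `ReferencedFragment`, `addPiece`, and `writeTo` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical fragment that references bytes owned elsewhere, for example
// sections of memory-mapped input files. The referenced storage must outlive
// the fragment.
class ReferencedFragment {
public:
  // Records a view of the piece; no bytes are copied here.
  void addPiece(std::string_view Piece) { Pieces.push_back(Piece); }

  // Size of the concatenated output, computed from the views alone.
  std::size_t size() const {
    std::size_t Size = 0;
    for (std::string_view P : Pieces)
      Size += P.size();
    return Size;
  }

  // Appends the pieces to Out in order. This is the only point at which the
  // referenced bytes are copied.
  void writeTo(std::string &Out) const {
    for (std::string_view P : Pieces)
      Out.append(P.data(), P.size());
  }

private:
  std::vector<std::string_view> Pieces;
};
```

With this shape the per-piece cost before output is a pointer and a length, so an output section built from many input sections costs memory proportional to the number of inputs rather than to their total size.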
David Blaikie via llvm-dev
2016-Mar-01 04:57 UTC
[llvm-dev] Possible Memory Savings for tools emitting large amounts of existing data through MC
On Mon, Feb 29, 2016 at 5:12 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Mon, Feb 29, 2016 at 03:47:49PM -0800, David Blaikie wrote:
> > On Mon, Feb 29, 2016 at 3:36 PM, Peter Collingbourne <peter at pcc.me.uk>
> > wrote:
> >
> > > Hi David,
> > >
> > > The way I imagined that we might want to extend the MCStreamer API
> > > (this was motivated by DIEData) is by allowing clients to move bytes
> > > and fixups into the MC layer.
> > >
> > > This is the sort of API that I was imagining:
> > >
> > > void MoveAndEmitFragment(SmallVectorImpl<char> &&Data,
> > >                          SmallVectorImpl<MCFixup> &&Fixups);
> > >
> > > Note that this mirrors the fields
> > > MCEncodedFragmentWithContents::Contents and
> > > MCEncodedFragmentWithFixups::Fixups
> > > and the arguments could be directly moved into the fields of a newly
> > > created MCDataFragment.
> > >
> > > Would that work for your use case?
> >
> > Not quite, unfortunately - the issue is that we're doing a task that is
> > essentially "linking + a bit" - so imagine linking a bunch of files
> > together, the final, say, debug_info.dwo section is made up of the
> > concatenation of all the debug_info.dwo sections of the inputs. So it's
> > fragmented and it's already available, memory mapped, never in a
> > SmallVector, etc.
>
> I see. I guess there's a couple of ways you can go with llvm-dwp:
> 1) Extend MC with optional ownership as you are doing in your patch.
> 2) Modify llvm-dwp to write object files directly.
>
> 2 is what lld does (with the help of libObject) and might not be such a bad
> choice, but it would be adding a lot of machinery for a very specific task
> that MC already needs to know how to do in a roughly target-independent
> way, so maybe it would be overkill.

Yeah, if lld's code for doing this were more reusable that might be an option, but I assume it isn't. (alternatively, could move dwp tool to be a subproject of lld itself, a "dwp" driver that would just enable the special treatment of cu/tu_index sections)

At least for my needs, the modifications to MC seem sufficiently unobtrusive & potentially generally useful (eventually LLVM might care about the memory impact of MC - perhaps for especially weird inputs (large amounts of static data, for example)).

> I reckon that in most cases MC clients aren't going to be copying large
> amounts of unowned data, they're most likely going to be creating that
> data. So perhaps the implementation should reflect that somehow.

Yeah, even for that, though - they may not want to keep it all in memory. For example one of the next largest memory costs in dwp is the str_offsets section, where we emit a bunch of ints created by processing the input. I know how big the output will be, but I don't want/need to allocate a vector of all of them, if I could stream them out instead.

So in theory we could generalize more aggressively, rather than narrow down the usage - if I could pass a thing that could be queried for size and could write bytes to the underlying entity I could save that memory too. So could LLVM - for example, type units wouldn't need to all be stored in bytes before writing any part of them out, we could stream it out to disk. (there's some buffering in Clang that adds another layer to get through - so it's actually buffered twice, the MC changes only remove one layer, we'd have to change clang (& change MC to not require pwrite to patch the header) to avoid the buffering entirely)

> Specifically what I had in mind was that you could add some other derived
> class of MCFragment that would store a StringRef (or maybe a vector of
> StringRefs if that proves useful), and that would be unrelated to
> MCDataFragment.

(needs multiple StringRefs - the output section consists of the concatenation of all the input sections - so in the simple case it's a StringRef from each input)

But yeah, could possibly narrow down the usage. I haven't looked closely at how the MCFragments are created/used/manipulated (some of this conversation may be better placed in the Differential review thread, perhaps - I do really appreciate your perspective)

> WDYT?
>
> Thanks,
> --
> Peter
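The "thing that could be queried for size and could write bytes" described above might be sketched as follows. Everything here is an assumption made for illustration: `StreamedContents` and `RebasedOffsets` are hypothetical names, the output sink is a `std::string` rather than a file, and the per-entry transformation is a simple fixed rebase of little-endian 32-bit offsets, which is much simpler than what a real str_offsets rewrite would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical interface: the size is known up front, the bytes are produced
// only when the writer asks for them.
class StreamedContents {
public:
  virtual ~StreamedContents() = default;
  virtual std::size_t size() const = 0;
  virtual void writeTo(std::string &Out) const = 0;
};

// Reads little-endian 32-bit offsets from Raw (for example a memory-mapped
// input section) and emits each one with Base added, without building an
// intermediate vector of the results.
class RebasedOffsets : public StreamedContents {
public:
  RebasedOffsets(std::string_view Raw, std::uint32_t Base)
      : Raw(Raw), Base(Base) {
    assert(Raw.size() % 4 == 0 && "input must hold whole 32-bit entries");
  }

  std::size_t size() const override { return Raw.size(); }

  void writeTo(std::string &Out) const override {
    for (std::size_t I = 0; I < Raw.size(); I += 4) {
      // Decode one little-endian entry.
      std::uint32_t V = 0;
      for (int B = 0; B < 4; ++B)
        V |= std::uint32_t(static_cast<unsigned char>(Raw[I + B])) << (8 * B);
      V += Base;
      // Encode the adjusted entry straight into the sink.
      for (int B = 0; B < 4; ++B)
        Out.push_back(static_cast<char>((V >> (8 * B)) & 0xFF));
    }
  }

private:
  std::string_view Raw;
  std::uint32_t Base;
};
```

Because `size()` is answerable without generating any bytes, a writer could lay out section headers first and call `writeTo` later, which is the property the message is asking for.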