Serge Rogatch via llvm-dev
2016-Aug-08 13:27 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
I think that 32-bit systems (especially ARM) may be short on memory so doubling the size of the table containing (potentially) all the functions may give a tangible overhead. I would even align the entries to 4 bytes (so 12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24-bytes per entry) on 64-bit platforms, to improve CPU cache hits. What do you think?

Cheers,
Serge

On 8 August 2016 at 06:34, Dean Michael Berris <dean.berris at gmail.com> wrote:

> > On 6 Aug 2016, at 04:06, Serge Rogatch <serge.rogatch at gmail.com> wrote:
> >
> > Hi Dean,
> >
> > I have a question for 32-bit platforms. I see in the code that you used the following in compiler-rt/trunk/lib/xray/xray_interface_internal.h :
> >
> > struct XRaySledEntry {
> >   uint64_t Address;
> >   uint64_t Function;
> >   unsigned char Kind;
> >   unsigned char AlwaysInstrument;
> >   unsigned char Padding[14]; // Need 32 bytes
> > };
> >
> > And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp :
> >
> > void X86AsmPrinter::EmitXRayTable() {
> >   if (Sleds.empty())
> >     return;
> >   if (Subtarget->isTargetELF()) {
> >     auto *Section = OutContext.getELFSection(
> >         "xray_instr_map", ELF::SHT_PROGBITS,
> >         ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0,
> >         CurrentFnSym->getName());
> >     auto PrevSection = OutStreamer->getCurrentSectionOnly();
> >     OutStreamer->SwitchSection(Section);
> >     for (const auto &Sled : Sleds) {
> >       OutStreamer->EmitSymbolValue(Sled.Sled, 8);
> >       OutStreamer->EmitSymbolValue(CurrentFnSym, 8);
> >       auto Kind = static_cast<uint8_t>(Sled.Kind);
> >       OutStreamer->EmitBytes(
> >           StringRef(reinterpret_cast<const char *>(&Kind), 1));
> >       OutStreamer->EmitBytes(
> >           StringRef(reinterpret_cast<const char *>(&Sled.AlwaysInstrument), 1));
> >       OutStreamer->EmitZeros(14);
> >     }
> >     OutStreamer->SwitchSection(PrevSection);
> >   }
> >   Sleds.clear();
> > }
> >
> > So useful part of your entry is 18 bytes, and you add 14 bytes of padding to achieve 32-byte alignment. But for 32-bit CPUs I understood that XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too, right? So the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of padding. Can I use 16-byte entries or is there some external (OS? ELF? Linker?) requirement that one entry must be 32-bytes, or aligned at 32 bytes, etc.?
>
> Good question Serge -- technically there isn't any specific external requirement here, but that supporting 32-bit x86 isn't a priority for me right now. I suspect it's possible to support 32-bit x86 with a similar approach (modifying both the LLVM back-end to emit the right assembly for 32-bit x86 and maybe for 32-bit non-x86, as well as compiler-rt to work on 32-bit x86) but that I haven't had the time to explore this yet.
>
> I'm positive that it's doable though and that we know the right places where the changes have to happen. There's some work being done on the tooling side of things and I suspect once we have a standardised log format, things like endianness and sizes of certain values start becoming an issue. For example, if I build a log analysis tool to work on 64-bit systems, whether it should be able to handle log files generated in 32-bit systems (and be able to read differently-sized instrumentation map sections).
>
> For this reason, I think it's better to stay consistent and forward-compatible (i.e. not have special cases for 32-bit platforms). I do think it's important to support 32-bit systems too, and I'd be more than happy to review patches that would make it possible (until say I get the time to support 32-bit platforms later on).
>
> Does that make sense?
>
> Cheers
>
> -- Dean
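The entry sizes mentioned in this message (32, 16, 24 and 12 bytes) can be checked with a short sketch. Only `SledEntry64` mirrors the `XRaySledEntry` quoted above; `SledEntry32`, `CompactEntry64`, `CompactEntry32` and `tableBytes` are hypothetical names made up for illustration and are not part of compiler-rt. The `alignas` specifiers are an assumption used to make the rounding explicit and independent of the host ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the XRaySledEntry quoted above: 18 useful bytes + 14 bytes padding.
struct SledEntry64 {
  uint64_t Address;
  uint64_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
  unsigned char Padding[14];
};

// Hypothetical 32-bit entry: 10 useful bytes + 6 bytes padding.
struct SledEntry32 {
  uint32_t Address;
  uint32_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
  unsigned char Padding[6];
};

// Hypothetical compact entries: no explicit padding. The compiler rounds
// the size up to the next multiple of the requested alignment.
struct alignas(8) CompactEntry64 {
  uint64_t Address;
  uint64_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
};

struct alignas(4) CompactEntry32 {
  uint32_t Address;
  uint32_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
};

static_assert(sizeof(SledEntry64) == 32, "layout quoted in the thread");
static_assert(sizeof(SledEntry32) == 16, "10 useful bytes + 6 padding");
static_assert(sizeof(CompactEntry64) == 24, "18 bytes rounded up to 8");
static_assert(sizeof(CompactEntry32) == 12, "10 bytes rounded up to 4");

// Bytes occupied by a table holding NumSleds entries of type Entry.
template <typename Entry>
constexpr std::size_t tableBytes(std::size_t NumSleds) {
  return NumSleds * sizeof(Entry);
}
```

For a binary with 1000 sleds, `tableBytes<SledEntry64>(1000)` is 32000 bytes while `tableBytes<CompactEntry32>(1000)` is 12000 bytes, which is the kind of saving being proposed for memory-constrained 32-bit targets.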
Dean Michael Berris via llvm-dev
2016-Aug-08 14:41 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
> On 8 Aug 2016, at 23:27, Serge Rogatch <serge.rogatch at gmail.com> wrote:
>
> I think that 32-bit systems (especially ARM) may be short on memory so doubling the size of the table containing (potentially) all the functions may give a tangible overhead. I would even align the entries to 4 bytes (so 12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24 bytes per entry) on 64-bit platforms, to improve CPU cache hits. What do you think?

It should work, but I'm a little wary about painting ourselves into a corner -- for example, I'm already designing some extensions to this table to represent other kinds of information (that might fit into what currently is padding, or use some more bits of the bytes in the entries). I might need to have some sort of versioning introduced into this table so that tools reading the same table (not necessarily the runtime) can determine what kinds of information will be available in the entries. Although you're right, maybe 14 bytes of padding is a little excessive, but I'm being very conservative here. :D

Basically the trade-off is between binary/resident size and tooling support.

For 32-bit systems I think it's possible to have smaller entries at the cost of making the tooling and runtime a bit more complex -- i.e. there's going to be a special implementation for x86 32-bit and 64-bit, ARM 32-bit and 64-bit, etc. Then we think about the tools external to the runtime that will access the same table. We can probably write tools that extract the table from binaries in COFF, ELF, and MachO, then turn those into a canonicalised instrumentation map. We don't even deal with endianness yet (reading an instrumentation map for a 64-bit binary from a 32-bit system -- what order do the bytes come in, and how should the 32-bit system interpret those values?). These issues start expanding the tooling support matrix.

Maybe that's inevitable, depending on which platforms the members of the LLVM community would like to see XRay be available on. :D

As far as CPU cache hits/misses are concerned, I personally don't think it's that crucial to get the table packed so we utilise the cache more -- the patching code runs through this table sequentially, and the cost is actually in the system calls making code pages writeable (and marking the page dirty and causing all sorts of more important issues). I think cache hits/misses are the least of the problems here. ;)

I am open to suggestions here too, so I'd be happy to shave a few more bytes off if that means the impact of having XRay tables in the binary is minimised.

I suppose I should detail a bit more what other things will be coming up, which should help with the overall design direction here. I'll update within the week about some other things we're looking to bring upstream, with more details as soon as I'm done fleshing those out. :)

Stay tuned!

Cheers

-- Dean
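The tooling concern raised above (a tool reading a map produced for a different word size or byte order) can be sketched as a decoder that turns raw entries into one canonical form. This is only an illustration under stated assumptions, not XRay's actual tooling: `ByteOrder`, `CanonicalSled`, `readUInt` and `decodeSled` are hypothetical names, and the 16-byte 32-bit layout is the one proposed earlier in the thread, not an existing format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ByteOrder { Little, Big };

// Canonical, host-independent form of one instrumentation map entry.
struct CanonicalSled {
  uint64_t Address;
  uint64_t Function;
  uint8_t Kind;
  bool AlwaysInstrument;
};

// Reads a Width-byte unsigned integer stored in the given byte order,
// without depending on the byte order of the machine running the tool.
inline uint64_t readUInt(const unsigned char *P, std::size_t Width,
                         ByteOrder Order) {
  uint64_t V = 0;
  for (std::size_t I = 0; I < Width; ++I) {
    // Always consume the most significant byte first.
    std::size_t Idx = (Order == ByteOrder::Little) ? (Width - 1 - I) : I;
    V = (V << 8) | P[Idx];
  }
  return V;
}

// Decodes one raw entry. PointerBytes is 8 for the 32-byte layout quoted
// in the thread, or 4 for the hypothetical 16-byte 32-bit layout.
inline CanonicalSled decodeSled(const unsigned char *Raw,
                                std::size_t PointerBytes, ByteOrder Order) {
  CanonicalSled S;
  S.Address = readUInt(Raw, PointerBytes, Order);
  S.Function = readUInt(Raw + PointerBytes, PointerBytes, Order);
  S.Kind = Raw[2 * PointerBytes];
  S.AlwaysInstrument = Raw[2 * PointerBytes + 1] != 0;
  return S;
}
```

A tool built this way would need to learn `PointerBytes` and the byte order from somewhere, for example the object file header or a version field in the table, which is where the support matrix described above starts to grow.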
Serge Rogatch via llvm-dev
2016-Aug-22 11:34 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
Hi Dean,

Do you have any estimates on when https://reviews.llvm.org/D21982 will reach mainline? (As I understand it, it's not there yet, looking at http://llvm.org/svn/llvm-project/compiler-rt/trunk ). I would like to test the ARM port of XRay, so having logging ready would be handy.

Thanks,
Serge