Serge Rogatch via llvm-dev
2016-Aug-05 18:06 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
Hi Dean, I have a question for 32-bit platforms. I see in the code that you used the following in compiler-rt/trunk/lib/xray/xray_interface_internal.h : struct XRaySledEntry { uint64_t Address; uint64_t Function; unsigned char Kind; unsigned char AlwaysInstrument; unsigned char Padding[14]; // Need 32 bytes }; And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp : void X86AsmPrinter::EmitXRayTable() { if (Sleds.empty()) return; if (Subtarget->isTargetELF()) { auto *Section = OutContext.getELFSection( "xray_instr_map", ELF::SHT_PROGBITS, ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0, CurrentFnSym->getName()); auto PrevSection = OutStreamer->getCurrentSectionOnly(); OutStreamer->SwitchSection(Section); for (const auto &Sled : Sleds) { OutStreamer->EmitSymbolValue(Sled.Sled, 8); OutStreamer->EmitSymbolValue(CurrentFnSym, 8); auto Kind = static_cast<uint8_t>(Sled.Kind); OutStreamer->EmitBytes( StringRef(reinterpret_cast<const char *>(&Kind), 1)); OutStreamer->EmitBytes( StringRef(reinterpret_cast<const char *>(&Sled.AlwaysInstrument), 1)); OutStreamer->EmitZeros(14); } OutStreamer->SwitchSection(PrevSection); } Sleds.clear(); } So useful part of your entry is 18 bytes, and you add 14 bytes of padding to achieve 32-byte alignment. But for 32-bit CPUs I understood that XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too, right? So the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of padding. Can I use 16-byte entries or is there some external (OS? ELF? Linker?) requirement that one entry must be 32-bytes, or aligned at 32 bytes, etc.? Thanks, Serge On 4 August 2016 at 19:29, Serge Rogatch <serge.rogatch at gmail.com> wrote:> Yes, double-checking on the handler side whether its resources are still > operational, e.g. by calling weak_ptr::lock() may be a better idea than > elimination of spurious handler calls in XRay at a cost of heavy > synchronization objects. Ok, let's live with spurious handler calls. > > Cheers, > Serge > > On 4 August 2016 at 10:57, Dean Michael Berris <dean.berris at gmail.com> > wrote: > >> >> > On 4 Aug 2016, at 06:27, Serge Rogatch <serge.rogatch at gmail.com> wrote: >> > >> > Hi Dean, >> > >> > I have a question about the following piece of code in >> compiler-rt/trunk/lib/xray/xray_trampoline_x86.S : >> > movq _ZN6__xray19XRayPatchedFunctionE(%rip), %rax >> > testq %rax, %rax >> > je .Ltmp0 >> > >> > // assume that %r10d has the function id. >> > movl %r10d, %edi >> > xor %esi,%esi >> > callq *%rax >> > What happens if someone unsets the handler function (i.e. calls >> __xray_remove_handler() ) or changes the handler (i.e. calls >> __xray_set_handler() with a different pointer to function) between "movq >> _ZN6__xray19XRayPatchedFunctionE(%rip), %rax" and "callq *%rax" ? I >> understood that the old handler will still be called, but isn't it a >> problem? What if the code removing the handler starts destructing some >> objects after that, so that a handler call would result in a crash or other >> bad things? >> >> That would be unfortunate. :) >> >> Yes, you're right, it will be called despite having the handler be un-set >> from the global atomic. Though I'd suggest that log handler implementations >> really shouldn't be crashing, and should assume that it will be called at >> any given time (from multiple threads, so it should also be thread-safe). >> Consider also that we're taking a function pointer by design -- assuming >> that it will be a pointer to a function that will only deal with globals >> and the explicitly passed in arguments. >> >> An implementer of the log handler should really be responsible for >> "internal signalling" -- i.e. implementing it so that any required >> synchronisation happens internally. >> >> Does this help? >> >> PS. I think more definitive documentation for this is due, let me think >> about putting something some documentation on how to use XRay as it is >> currently implemented once we have the naive in-memory logging >> implementation land (https://reviews.llvm.org/D21982). >> >> Cheers >> >> -- Dean >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160805/bc3f9f0b/attachment.html>
Dean Michael Berris via llvm-dev
2016-Aug-08 03:34 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
> On 6 Aug 2016, at 04:06, Serge Rogatch <serge.rogatch at gmail.com> wrote: > > Hi Dean, > > I have a question for 32-bit platforms. I see in the code that you used the following in compiler-rt/trunk/lib/xray/xray_interface_internal.h : > struct XRaySledEntry { > uint64_t Address; > uint64_t Function; > unsigned char Kind; > unsigned char AlwaysInstrument; > unsigned char Padding[14]; // Need 32 bytes > }; > > And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp : > > void X86AsmPrinter::EmitXRayTable() { > if (Sleds.empty()) > return; > if (Subtarget->isTargetELF()) { > auto *Section = OutContext.getELFSection( > "xray_instr_map", ELF::SHT_PROGBITS, > ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0, > CurrentFnSym->getName()); > auto PrevSection = OutStreamer->getCurrentSectionOnly(); > OutStreamer->SwitchSection(Section); > for (const auto &Sled : Sleds) { > OutStreamer->EmitSymbolValue(Sled.Sled, 8); > OutStreamer->EmitSymbolValue(CurrentFnSym, 8); > auto Kind = static_cast<uint8_t>(Sled.Kind); > OutStreamer->EmitBytes( > StringRef(reinterpret_cast<const char *>(&Kind), 1)); > OutStreamer->EmitBytes( > StringRef(reinterpret_cast<const char *>(&Sled.AlwaysInstrument), 1)); > OutStreamer->EmitZeros(14); > } > OutStreamer->SwitchSection(PrevSection); > } > Sleds.clear(); > } > > So useful part of your entry is 18 bytes, and you add 14 bytes of padding to achieve 32-byte alignment. But for 32-bit CPUs I understood that XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too, right? So the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of padding. Can I use 16-byte entries or is there some external (OS? ELF? Linker?) requirement that one entry must be 32-bytes, or aligned at 32 bytes, etc.? >Good question Serge -- technically there isn't any specific external requirement here, but that supporting 32-bit x86 isn't a priority for me right now. I suspect it's possible to support 32-bit x86 with a similar approach (modifying both the LLVM back-end to emit the right assembly for 32-bit x86 and maybe for 32-bit non-x86, as well as compiler-rt to work on 32-bit x86) but that I haven't had the time to explore this yet. I'm positive that it's doable though and that we know the right places where the changes have to happen. There's some work being done on the tooling side of things and I suspect once we have a standardised log format, things like endianness and sizes of certain values start becoming an issue. For example, if I build a log analysis tool to work on 64-bit systems, whether it should be able to handle log files generated in 32-bit systems (and be able to read differently-sized instrumentation map sections). For this reason, I think it's better to stay consistent and forward-compatible (i.e. not have special cases for 32-bit platforms). I do think it's important to support 32-bit systems too, and I'd be more than happy to review patches that would make it possible (until say I get the time to support 32-bit platforms later on). Does that make sense? Cheers -- Dean
Serge Rogatch via llvm-dev
2016-Aug-08 13:27 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
I think that 32-bit systems (especially ARM) may be short on memory so doubling the size of the table containing (potentially) all the functions may give a tangible overhead. I would even align the entries to 4 bytes (so 12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24-bytes per entry) on 64-bit platforms, to improve CPU cache hits. What do you think? Cheers, Serge On 8 August 2016 at 06:34, Dean Michael Berris <dean.berris at gmail.com> wrote:> > > On 6 Aug 2016, at 04:06, Serge Rogatch <serge.rogatch at gmail.com> wrote: > > > > Hi Dean, > > > > I have a question for 32-bit platforms. I see in the code that you used > the following in compiler-rt/trunk/lib/xray/xray_interface_internal.h : > > struct XRaySledEntry { > > uint64_t Address; > > uint64_t Function; > > unsigned char Kind; > > unsigned char AlwaysInstrument; > > unsigned char Padding[14]; // Need 32 bytes > > }; > > > > And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp : > > > > void X86AsmPrinter::EmitXRayTable() { > > if (Sleds.empty()) > > return; > > if (Subtarget->isTargetELF()) { > > auto *Section = OutContext.getELFSection( > > "xray_instr_map", ELF::SHT_PROGBITS, > > ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0, > > CurrentFnSym->getName()); > > auto PrevSection = OutStreamer->getCurrentSectionOnly(); > > OutStreamer->SwitchSection(Section); > > for (const auto &Sled : Sleds) { > > OutStreamer->EmitSymbolValue(Sled.Sled, 8); > > OutStreamer->EmitSymbolValue(CurrentFnSym, 8); > > auto Kind = static_cast<uint8_t>(Sled.Kind); > > OutStreamer->EmitBytes( > > StringRef(reinterpret_cast<const char *>(&Kind), 1)); > > OutStreamer->EmitBytes( > > StringRef(reinterpret_cast<const char > *>(&Sled.AlwaysInstrument), 1)); > > OutStreamer->EmitZeros(14); > > } > > OutStreamer->SwitchSection(PrevSection); > > } > > Sleds.clear(); > > } > > > > So useful part of your entry is 18 bytes, and you add 14 bytes of > padding to achieve 32-byte alignment. But for 32-bit CPUs I understood that > XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too, > right? So the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of > padding. Can I use 16-byte entries or is there some external (OS? ELF? > Linker?) requirement that one entry must be 32-bytes, or aligned at 32 > bytes, etc.? > > > > Good question Serge -- technically there isn't any specific external > requirement here, but that supporting 32-bit x86 isn't a priority for me > right now. I suspect it's possible to support 32-bit x86 with a similar > approach (modifying both the LLVM back-end to emit the right assembly for > 32-bit x86 and maybe for 32-bit non-x86, as well as compiler-rt to work on > 32-bit x86) but that I haven't had the time to explore this yet. > > I'm positive that it's doable though and that we know the right places > where the changes have to happen. There's some work being done on the > tooling side of things and I suspect once we have a standardised log > format, things like endianness and sizes of certain values start becoming > an issue. For example, if I build a log analysis tool to work on 64-bit > systems, whether it should be able to handle log files generated in 32-bit > systems (and be able to read differently-sized instrumentation map > sections). > > For this reason, I think it's better to stay consistent and > forward-compatible (i.e. not have special cases for 32-bit platforms). I do > think it's important to support 32-bit systems too, and I'd be more than > happy to review patches that would make it possible (until say I get the > time to support 32-bit platforms later on). > > Does that make sense? > > Cheers > > -- Dean > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160808/9be68189/attachment.html>