thr3ads.net - llvm dev - [llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions. [Aug 2016]

If this information is useful, please help other people find it:
Share via:

Serge Rogatch via llvm-dev

2016-Aug-05 18:06 UTC

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Hi Dean,

I have a question for 32-bit platforms. I see in the code that you used the
following in compiler-rt/trunk/lib/xray/xray_interface_internal.h :
struct XRaySledEntry {
  uint64_t Address;
  uint64_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
  unsigned char Padding[14]; // Need 32 bytes
};

And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp :

void X86AsmPrinter::EmitXRayTable() {
  if (Sleds.empty())
    return;
  if (Subtarget->isTargetELF()) {
    auto *Section = OutContext.getELFSection(
        "xray_instr_map", ELF::SHT_PROGBITS,
        ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0,
        CurrentFnSym->getName());
    auto PrevSection = OutStreamer->getCurrentSectionOnly();
    OutStreamer->SwitchSection(Section);
    for (const auto &Sled : Sleds) {
      OutStreamer->EmitSymbolValue(Sled.Sled, 8);
      OutStreamer->EmitSymbolValue(CurrentFnSym, 8);
      auto Kind = static_cast<uint8_t>(Sled.Kind);
      OutStreamer->EmitBytes(
          StringRef(reinterpret_cast<const char *>(&Kind), 1));
      OutStreamer->EmitBytes(
          StringRef(reinterpret_cast<const char
*>(&Sled.AlwaysInstrument),
1));
      OutStreamer->EmitZeros(14);
    }
    OutStreamer->SwitchSection(PrevSection);
  }
  Sleds.clear();
}

So useful part of your entry is 18 bytes, and you add 14 bytes of padding
to achieve 32-byte alignment. But for 32-bit CPUs I understood that
XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too,
right? So the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of
padding. Can I use 16-byte entries or is there some external (OS? ELF?
Linker?) requirement that one entry must be 32-bytes, or aligned at 32
bytes, etc.?

Thanks,
Serge

On 4 August 2016 at 19:29, Serge Rogatch <serge.rogatch at gmail.com>
wrote:
> Yes, double-checking on the handler side whether its resources are still
> operational, e.g. by calling weak_ptr::lock() may be a better idea than
> elimination of spurious handler calls in XRay at a cost of heavy
> synchronization objects. Ok, let's live with spurious handler calls.
>
> Cheers,
> Serge
>
> On 4 August 2016 at 10:57, Dean Michael Berris <dean.berris at
gmail.com>
> wrote:
>
>>
>> > On 4 Aug 2016, at 06:27, Serge Rogatch <serge.rogatch at
gmail.com> wrote:
>> >
>> > Hi Dean,
>> >
>> > I have a question about the following piece of code in
>> compiler-rt/trunk/lib/xray/xray_trampoline_x86.S :
>> >   movq  _ZN6__xray19XRayPatchedFunctionE(%rip), %rax
>> >   testq  %rax, %rax
>> >   je  .Ltmp0
>> >
>> >   // assume that %r10d has the function id.
>> >   movl  %r10d, %edi
>> >   xor  %esi,%esi
>> >   callq  *%rax
>> > What happens if someone unsets the handler function (i.e. calls
>> __xray_remove_handler() ) or changes the handler (i.e. calls
>> __xray_set_handler() with a different pointer to function) between
"movq
>> _ZN6__xray19XRayPatchedFunctionE(%rip), %rax" and "callq 
*%rax" ? I
>> understood that the old handler will still be called, but isn't it
a
>> problem? What if the code removing the handler starts destructing some
>> objects after that, so that a handler call would result in a crash or
other
>> bad things?
>>
>> That would be unfortunate. :)
>>
>> Yes, you're right, it will be called despite having the handler be
un-set
>> from the global atomic. Though I'd suggest that log handler
implementations
>> really shouldn't be crashing, and should assume that it will be
called at
>> any given time (from multiple threads, so it should also be
thread-safe).
>> Consider also that we're taking a function pointer by design --
assuming
>> that it will be a pointer to a function that will only deal with
globals
>> and the explicitly passed in arguments.
>>
>> An implementer of the log handler should really be responsible for
>> "internal signalling" -- i.e. implementing it so that any
required
>> synchronisation happens internally.
>>
>> Does this help?
>>
>> PS. I think more definitive documentation for this is due, let me think
>> about putting something some documentation on how to use XRay as it is
>> currently implemented once we have the naive in-memory logging
>> implementation land (https://reviews.llvm.org/D21982).
>>
>> Cheers
>>
>> -- Dean
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160805/bc3f9f0b/attachment.html>

Dean Michael Berris via llvm-dev

2016-Aug-08 03:34 UTC

head link

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

> On 6 Aug 2016, at 04:06, Serge Rogatch <serge.rogatch at gmail.com>
wrote:
> 
> Hi Dean,
> 
> I have a question for 32-bit platforms. I see in the code that you used the
following in compiler-rt/trunk/lib/xray/xray_interface_internal.h :
> struct XRaySledEntry {
>   uint64_t Address;
>   uint64_t Function;
>   unsigned char Kind;
>   unsigned char AlwaysInstrument;
>   unsigned char Padding[14]; // Need 32 bytes
> };
> 
> And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp :
> 
> void X86AsmPrinter::EmitXRayTable() {
>   if (Sleds.empty())
>     return;
>   if (Subtarget->isTargetELF()) {
>     auto *Section = OutContext.getELFSection(
>         "xray_instr_map", ELF::SHT_PROGBITS,
>         ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0,
>         CurrentFnSym->getName());
>     auto PrevSection = OutStreamer->getCurrentSectionOnly();
>     OutStreamer->SwitchSection(Section);
>     for (const auto &Sled : Sleds) {
>       OutStreamer->EmitSymbolValue(Sled.Sled, 8);
>       OutStreamer->EmitSymbolValue(CurrentFnSym, 8);
>       auto Kind = static_cast<uint8_t>(Sled.Kind);
>       OutStreamer->EmitBytes(
>           StringRef(reinterpret_cast<const char *>(&Kind), 1));
>       OutStreamer->EmitBytes(
>           StringRef(reinterpret_cast<const char
*>(&Sled.AlwaysInstrument), 1));
>       OutStreamer->EmitZeros(14);
>     }
>     OutStreamer->SwitchSection(PrevSection);
>   }
>   Sleds.clear();
> }
> 
> So useful part of your entry is 18 bytes, and you add 14 bytes of padding
to achieve 32-byte alignment. But for 32-bit CPUs I understood that
XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too, right? So
the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of padding. Can I
use 16-byte entries or is there some external (OS? ELF? Linker?) requirement
that one entry must be 32-bytes, or aligned at 32 bytes, etc.?
> 
Good question Serge -- technically there isn't any specific external
requirement here, but that supporting 32-bit x86 isn't a priority for me
right now. I suspect it's possible to support 32-bit x86 with a similar
approach (modifying both the LLVM back-end to emit the right assembly for 32-bit
x86 and maybe for 32-bit non-x86, as well as compiler-rt to work on 32-bit x86)
but that I haven't had the time to explore this yet.

I'm positive that it's doable though and that we know the right places
where the changes have to happen. There's some work being done on the
tooling side of things and I suspect once we have a standardised log format,
things like endianness and sizes of certain values start becoming an issue. For
example, if I build a log analysis tool to work on 64-bit systems, whether it
should be able to handle log files generated in 32-bit systems (and be able to
read differently-sized instrumentation map sections).

For this reason, I think it's better to stay consistent and
forward-compatible (i.e. not have special cases for 32-bit platforms). I do
think it's important to support 32-bit systems too, and I'd be more than
happy to review patches that would make it possible (until say I get the time to
support 32-bit platforms later on).

Does that make sense?

Cheers

-- Dean

Serge Rogatch via llvm-dev

2016-Aug-08 13:27 UTC

head link

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

I think that 32-bit systems (especially ARM) may be short on memory so
doubling the size of the table containing (potentially) all the functions
may give a tangible overhead. I would even align the entries to 4 bytes (so
12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24-bytes per
entry) on 64-bit platforms, to improve CPU cache hits. What do you think?

Cheers,
Serge

On 8 August 2016 at 06:34, Dean Michael Berris <dean.berris at gmail.com>
wrote:
>
> > On 6 Aug 2016, at 04:06, Serge Rogatch <serge.rogatch at
gmail.com> wrote:
> >
> > Hi Dean,
> >
> > I have a question for 32-bit platforms. I see in the code that you
used
> the following in compiler-rt/trunk/lib/xray/xray_interface_internal.h :
> > struct XRaySledEntry {
> >   uint64_t Address;
> >   uint64_t Function;
> >   unsigned char Kind;
> >   unsigned char AlwaysInstrument;
> >   unsigned char Padding[14]; // Need 32 bytes
> > };
> >
> > And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp :
> >
> > void X86AsmPrinter::EmitXRayTable() {
> >   if (Sleds.empty())
> >     return;
> >   if (Subtarget->isTargetELF()) {
> >     auto *Section = OutContext.getELFSection(
> >         "xray_instr_map", ELF::SHT_PROGBITS,
> >         ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0,
> >         CurrentFnSym->getName());
> >     auto PrevSection = OutStreamer->getCurrentSectionOnly();
> >     OutStreamer->SwitchSection(Section);
> >     for (const auto &Sled : Sleds) {
> >       OutStreamer->EmitSymbolValue(Sled.Sled, 8);
> >       OutStreamer->EmitSymbolValue(CurrentFnSym, 8);
> >       auto Kind = static_cast<uint8_t>(Sled.Kind);
> >       OutStreamer->EmitBytes(
> >           StringRef(reinterpret_cast<const char *>(&Kind),
1));
> >       OutStreamer->EmitBytes(
> >           StringRef(reinterpret_cast<const char
> *>(&Sled.AlwaysInstrument), 1));
> >       OutStreamer->EmitZeros(14);
> >     }
> >     OutStreamer->SwitchSection(PrevSection);
> >   }
> >   Sleds.clear();
> > }
> >
> > So useful part of your entry is 18 bytes, and you add 14 bytes of
> padding to achieve 32-byte alignment. But for 32-bit CPUs I understood that
> XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too,
> right? So the entry can fit 16 bytes, with 10 useful bytes and 6 bytes of
> padding. Can I use 16-byte entries or is there some external (OS? ELF?
> Linker?) requirement that one entry must be 32-bytes, or aligned at 32
> bytes, etc.?
> >
>
> Good question Serge -- technically there isn't any specific external
> requirement here, but that supporting 32-bit x86 isn't a priority for
me
> right now. I suspect it's possible to support 32-bit x86 with a similar
> approach (modifying both the LLVM back-end to emit the right assembly for
> 32-bit x86 and maybe for 32-bit non-x86, as well as compiler-rt to work on
> 32-bit x86) but that I haven't had the time to explore this yet.
>
> I'm positive that it's doable though and that we know the right
places
> where the changes have to happen. There's some work being done on the
> tooling side of things and I suspect once we have a standardised log
> format, things like endianness and sizes of certain values start becoming
> an issue. For example, if I build a log analysis tool to work on 64-bit
> systems, whether it should be able to handle log files generated in 32-bit
> systems (and be able to read differently-sized instrumentation map
> sections).
>
> For this reason, I think it's better to stay consistent and
> forward-compatible (i.e. not have special cases for 32-bit platforms). I do
> think it's important to support 32-bit systems too, and I'd be more
than
> happy to review patches that would make it possible (until say I get the
> time to support 32-bit platforms later on).
>
> Does that make sense?
>
> Cheers
>
> -- Dean
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160808/9be68189/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Aug 2016 - XRay: Demo on x86_64/Linux almost done; some questions.

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Possibly Parallel Threads