thr3ads.net - llvm dev - [llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions. [Aug 2016]

Serge Rogatch via llvm-dev

2016-Aug-08 13:27 UTC

[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

I think that 32-bit systems (especially ARM) may be short on memory, so
doubling the size of the table containing (potentially) all the functions
may add tangible overhead. I would even align the entries to 4 bytes (so
12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24 bytes per
entry) on 64-bit platforms, to improve CPU cache hits. What do you think?
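
The entry sizes discussed here can be checked with a small C++ sketch. PaddedSledEntry mirrors the XRaySledEntry struct quoted further down in this thread; CompactSledEntry and tableBytes are hypothetical names used only for illustration, and the alignment choice (entry aligned to the address width) is an assumption taken from the proposal above, not something that exists in compiler-rt.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the 32-byte XRaySledEntry layout quoted in this thread.
struct PaddedSledEntry {
  std::uint64_t Address;
  std::uint64_t Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
  unsigned char Padding[14]; // 8 + 8 + 1 + 1 + 14 = 32 bytes
};

// Hypothetical compact layout: addresses are as wide as the target's
// pointers, and the entry is aligned to that width (4 or 8 bytes).
template <typename AddrT>
struct alignas(sizeof(AddrT)) CompactSledEntry {
  AddrT Address;
  AddrT Function;
  unsigned char Kind;
  unsigned char AlwaysInstrument;
  // The compiler adds tail padding up to the next multiple of the alignment.
};

static_assert(sizeof(PaddedSledEntry) == 32, "current layout is 32 bytes");
static_assert(sizeof(CompactSledEntry<std::uint32_t>) == 12,
              "10 useful bytes rounded up to a multiple of 4");
static_assert(sizeof(CompactSledEntry<std::uint64_t>) == 24,
              "18 useful bytes rounded up to a multiple of 8");

// Total size in bytes of a table holding NumSleds entries of type Entry.
template <typename Entry>
constexpr std::size_t tableBytes(std::size_t NumSleds) {
  return NumSleds * sizeof(Entry);
}
```

For a binary with 1000 sleds, this gives 32000 bytes for the padded layout against 12000 bytes for the compact 32-bit layout.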

Cheers,
Serge

On 8 August 2016 at 06:34, Dean Michael Berris <dean.berris at gmail.com> wrote:
>
> > On 6 Aug 2016, at 04:06, Serge Rogatch <serge.rogatch at gmail.com> wrote:
> >
> > Hi Dean,
> >
> > I have a question for 32-bit platforms. I see in the code that you used
> > the following in compiler-rt/trunk/lib/xray/xray_interface_internal.h :
> > struct XRaySledEntry {
> >   uint64_t Address;
> >   uint64_t Function;
> >   unsigned char Kind;
> >   unsigned char AlwaysInstrument;
> >   unsigned char Padding[14]; // Need 32 bytes
> > };
> >
> > And the peer code in llvm/trunk/lib/Target/X86/X86MCInstLower.cpp :
> >
> > void X86AsmPrinter::EmitXRayTable() {
> >   if (Sleds.empty())
> >     return;
> >   if (Subtarget->isTargetELF()) {
> >     auto *Section = OutContext.getELFSection(
> >         "xray_instr_map", ELF::SHT_PROGBITS,
> >         ELF::SHF_ALLOC | ELF::SHF_GROUP | ELF::SHF_MERGE, 0,
> >         CurrentFnSym->getName());
> >     auto PrevSection = OutStreamer->getCurrentSectionOnly();
> >     OutStreamer->SwitchSection(Section);
> >     for (const auto &Sled : Sleds) {
> >       OutStreamer->EmitSymbolValue(Sled.Sled, 8);
> >       OutStreamer->EmitSymbolValue(CurrentFnSym, 8);
> >       auto Kind = static_cast<uint8_t>(Sled.Kind);
> >       OutStreamer->EmitBytes(
> >           StringRef(reinterpret_cast<const char *>(&Kind), 1));
> >       OutStreamer->EmitBytes(
> >           StringRef(reinterpret_cast<const char *>(&Sled.AlwaysInstrument), 1));
> >       OutStreamer->EmitZeros(14);
> >     }
> >     OutStreamer->SwitchSection(PrevSection);
> >   }
> >   Sleds.clear();
> > }
> >
> > So the useful part of your entry is 18 bytes, and you add 14 bytes of
> > padding to reach a 32-byte entry. But for 32-bit CPUs I understood that
> > XRaySledEntry::Address and XRaySledEntry::Function can be 32-bit too,
> > right? So the entry can fit in 16 bytes, with 10 useful bytes and 6 bytes
> > of padding. Can I use 16-byte entries or is there some external (OS? ELF?
> > Linker?) requirement that one entry must be 32 bytes, or aligned at 32
> > bytes, etc.?
> >
>
> Good question Serge -- technically there isn't any specific external
> requirement here, but supporting 32-bit x86 isn't a priority for me
> right now. I suspect it's possible to support 32-bit x86 with a similar
> approach (modifying both the LLVM back-end to emit the right assembly for
> 32-bit x86 and maybe for 32-bit non-x86, as well as compiler-rt to work on
> 32-bit x86) but I haven't had the time to explore this yet.
>
> I'm positive that it's doable though and that we know the right places
> where the changes have to happen. There's some work being done on the
> tooling side of things and I suspect once we have a standardised log
> format, things like endianness and sizes of certain values start becoming
> an issue. For example, if I build a log analysis tool to work on 64-bit
> systems, should it be able to handle log files generated on 32-bit
> systems (and be able to read differently-sized instrumentation map
> sections)?
>
> For this reason, I think it's better to stay consistent and
> forward-compatible (i.e. not have special cases for 32-bit platforms). I do
> think it's important to support 32-bit systems too, and I'd be more than
> happy to review patches that would make it possible (until say I get the
> time to support 32-bit platforms later on).
>
> Does that make sense?
>
> Cheers
>
> -- Dean
>

Dean Michael Berris via llvm-dev

2016-Aug-08 14:41 UTC

> On 8 Aug 2016, at 23:27, Serge Rogatch <serge.rogatch at gmail.com> wrote:
>
> I think that 32-bit systems (especially ARM) may be short on memory so
> doubling the size of the table containing (potentially) all the functions
> may give a tangible overhead. I would even align the entries to 4 bytes (so
> 12 bytes per entry) on 32-bit platforms and to 8 bytes (so 24-bytes per
> entry) on 64-bit platforms, to improve CPU cache hits. What do you think?

It should work, but I'm a little wary about painting ourselves into a corner
-- for example, I'm already designing some extensions to this table to
represent other kinds of information (that might fit into what currently is
padding, or use some more bits of the bytes in the entries). I might need to
introduce some sort of versioning into this table so that tools reading the
same table (not necessarily the runtime) can determine what kinds of
information will be available in the entries. You're right, though: maybe
14 bytes of padding is a little excessive, but I'm being very conservative
here. :D

Basically the trade-off is between binary/resident size and tooling support.

For 32-bit systems I think it's possible to have smaller entries at the cost
of making the tooling and runtime a bit more complex -- i.e. there's going to
be a special implementation for x86 32-bit and 64-bit, ARM 32-bit and 64-bit,
etc. Then we have to think about the tools external to the runtime that will
access the same table. We can probably write tools that extract the table
from binaries in COFF, ELF, and MachO, then turn those into a canonicalised
instrumentation map. We don't even deal with endianness yet (reading an
instrumentation map for a 64-bit binary from a 32-bit system -- what order do
the bytes come in and how should the 32-bit system interpret those values?).
These issues start expanding the tooling support matrix.
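
To make the tooling concern concrete, here is a minimal C++ sketch of a reader that turns a raw table into a canonical form regardless of the producer's pointer width and byte order. Everything in it (CanonicalSled, ByteOrder, readUInt, decodeTable, and the assumption that each entry is Address, Function, Kind, AlwaysInstrument followed by padding) is hypothetical and only illustrates the idea; it is not an existing XRay tool or format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical canonical entry: always 64-bit addresses, host byte order.
struct CanonicalSled {
  std::uint64_t Address;
  std::uint64_t Function;
  std::uint8_t Kind;
  bool AlwaysInstrument;
};

enum class ByteOrder { Little, Big };

// Read an unsigned integer of Width bytes stored in the given byte order.
inline std::uint64_t readUInt(const std::uint8_t *P, std::size_t Width,
                              ByteOrder Order) {
  std::uint64_t V = 0;
  for (std::size_t I = 0; I < Width; ++I) {
    // Always consume bytes from most significant to least significant.
    std::size_t Idx = (Order == ByteOrder::Little) ? (Width - 1 - I) : I;
    V = (V << 8) | P[Idx];
  }
  return V;
}

// Decode a raw table. Assumed entry layout: Address and Function (PtrWidth
// bytes each), Kind and AlwaysInstrument (1 byte each), padding to EntrySize.
inline std::vector<CanonicalSled> decodeTable(const std::vector<std::uint8_t> &Raw,
                                              std::size_t PtrWidth,
                                              std::size_t EntrySize,
                                              ByteOrder Order) {
  std::vector<CanonicalSled> Out;
  if (EntrySize < 2 * PtrWidth + 2)
    return Out; // Entry too small to hold the assumed fields.
  for (std::size_t Off = 0; Off + EntrySize <= Raw.size(); Off += EntrySize) {
    const std::uint8_t *P = Raw.data() + Off;
    Out.push_back({readUInt(P, PtrWidth, Order),
                   readUInt(P + PtrWidth, PtrWidth, Order),
                   P[2 * PtrWidth],
                   P[2 * PtrWidth + 1] != 0});
  }
  return Out;
}
```

The point of the sketch is that the reader needs three facts it cannot infer from the bytes alone (pointer width, entry size, byte order), which is where a version or header field in the table would come in.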

Maybe that's inevitable, depending on which platforms the members of the
LLVM community would like to see XRay be available. :D

As far as CPU cache hits/misses are concerned, I personally don't think
it's that crucial to get the table packed so we utilise the cache more --
the patching code runs through this table sequentially, and the cost is
actually in the system calls making code pages writeable (and marking the
pages dirty and causing all sorts of more important issues). I think cache
hits/misses are the least of the problems here. ;)
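
As an illustration of where that cost comes from, here is a minimal POSIX sketch of changing the protection of the pages that cover an address range. It assumes a Linux-like system with mprotect; the helper name setProtection is hypothetical, and the thread does not say which calls the XRay runtime actually uses.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

// Hypothetical helper: apply protection flags Prot to every page that
// overlaps [Addr, Addr + Len). mprotect requires a page-aligned start
// address, so the range is widened to page boundaries first.
inline bool setProtection(void *Addr, std::size_t Len, int Prot) {
  const auto PageSize = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
  const auto Begin = reinterpret_cast<std::uintptr_t>(Addr);
  const std::uintptr_t Start = Begin & ~(PageSize - 1);
  const std::uintptr_t End = (Begin + Len + PageSize - 1) & ~(PageSize - 1);
  return mprotect(reinterpret_cast<void *>(Start), End - Start, Prot) == 0;
}
```

Patching a sled in a code page would need something like PROT_READ | PROT_WRITE | PROT_EXEC before the write and PROT_READ | PROT_EXEC afterwards, so each patched page costs at least two system calls, whatever the table entry size is.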

I am open to suggestions here too, so I'd be happy to shave a few more bytes
off if that means the impact of having XRay tables in the binary is minimised.

I suppose I should detail a bit more what other things will be coming up, which
should help with the overall design direction here. I'll update within the
week about some other things we're looking to bring upstream with more
details as soon as I'm done fleshing those out. :)

Stay tuned!

Cheers

-- Dean

Serge Rogatch via llvm-dev

2016-Aug-22 11:34 UTC

Hi Dean,

Do you have any estimate of when https://reviews.llvm.org/D21982 will
reach mainline? (As I understand it, it's not there yet, looking at
http://llvm.org/svn/llvm-project/compiler-rt/trunk ). I would like to test
the ARM port of XRay, so having logging ready would be handy.

Thanks,
Serge

