thr3ads.net - llvm dev - [llvm-dev] Question about opt-report strings [Jan 2020]

Kaylor, Andrew via llvm-dev

2020-Jan-06 20:51 UTC

[llvm-dev] Question about opt-report strings

Hi all,

I tried to poke my head into opt-report a while ago and didn't get very far.
Now I'm looking at it again. I'm not sure I understand everything
that's in place so my question here may be misguided.

I'm trying to understand the way strings are handled. When a remark is
emitted, it seems that the string is constructed on the fly based on streaming
inputs. For example,

  ORE->emit([&]() {
    return OptimizationRemark(DEBUG_TYPE, "LoadElim", LI)
           << "load of type " << NV("Type", LI->getType()) << " eliminated"
           << setExtraArgs() << " in favor of "
           << NV("InfavorOfValue", AvailableValue);
  });

There is some C++ magic going on behind the scenes here, and it makes for a nice
interface, but I'm not clear about what ends up being stored where. I think
within DiagnosticInfoOptimizationBase all the string parts of this get stored in
a vector of name-value pairs with the unnamed strings just having an empty name.
At some point, I guess this gets assembled into a single string? I've also
found references to string tables for the bitstream serializer and a YAML format
that uses a string table, but I'm not clear how and when these are
constructed.

What I'm wondering is whether it would make sense to introduce a sort of
message catalog, similar to the way diagnostics are handled in clang (which I
must admit I also have only a partial understanding of). It seems like the
RemarkName for optimization remarks somewhat serves as a unique identifier (?)
but I would think an integer value of some sort would be better, so maybe
I'm misunderstanding what RemarkName is being used for. I'm imagining
something that would end up looking like this:

  ORE->emit([&]() {
    return OptimizationRemark(DEBUG_TYPE, diag::remark_gvn_load_elim, LI)
           << NV("Type", LI->getType())
           << setExtraArgs() << NV("InfavorOfValue", AvailableValue);
  });

with a tablegen file somewhere containing this:

def remark_gvn_load_elim: OptRemark<
  "LoadElim",                    // RemarkName (if this is needed for YAML output or whatever)
  "load of type %0 eliminated",  // Base format string for the remark (%Type instead of %0 maybe?)
  "in favor of %1">;             // Extra args format string for verbose output
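
For illustration only, here is a minimal sketch of how positional placeholders such as %0 and %1 in a catalog format string might be expanded from the streamed arguments. The helper name expandFormat is hypothetical (it is not an existing LLVM API), and the arguments are simplified to plain strings rather than NV values.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replace each "%N" in Fmt with Args[N].
// A placeholder whose index is out of range is copied through unchanged,
// and a '%' that is not followed by a digit is treated as a literal.
std::string expandFormat(const std::string &Fmt,
                         const std::vector<std::string> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    bool IsPlaceholder =
        Fmt[I] == '%' && I + 1 < Fmt.size() &&
        std::isdigit(static_cast<unsigned char>(Fmt[I + 1]));
    if (!IsPlaceholder) {
      Out += Fmt[I];
      continue;
    }
    // Parse the (possibly multi-digit) argument index after '%'.
    std::size_t Index = 0;
    std::size_t J = I + 1;
    while (J < Fmt.size() &&
           std::isdigit(static_cast<unsigned char>(Fmt[J]))) {
      Index = Index * 10 + static_cast<std::size_t>(Fmt[J] - '0');
      ++J;
    }
    if (Index < Args.size())
      Out += Args[Index];
    else
      Out.append(Fmt, I, J - I); // Unknown placeholder: keep the text as is.
    I = J - 1; // Resume after the last digit consumed.
  }
  return Out;
}
```

With the arguments {"i32", "i"}, the base format string above would expand to "load of type i32 eliminated" and the extra-args string to "in favor of i". Named placeholders such as %Type would need a lookup by key instead of by position.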


Has this been discussed before?

Thanks,
Andy

Francis Visoiu Mistrih via llvm-dev

2020-Jan-06 22:14 UTC

Hi Andy,
> On Jan 6, 2020, at 12:51 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
>
> Hi all,
>
> I tried to poke my head into opt-report a while ago and didn’t get very far.
> Now I’m looking at it again. I’m not sure I understand everything that’s in
> place so my question here may be misguided.
>
> I’m trying to understand the way strings are handled. When a remark is
> emitted, it seems that the string is constructed on the fly based on streaming
> inputs. For example,
>
>   ORE->emit([&]() {
>     return OptimizationRemark(DEBUG_TYPE, "LoadElim", LI)
>            << "load of type " << NV("Type", LI->getType()) << " eliminated"
>            << setExtraArgs() << " in favor of "
>            << NV("InfavorOfValue", AvailableValue);
>   });
>  
> There is some C++ magic going on behind the scenes here, and it makes for a
> nice interface, but I’m not clear about what ends up being stored where. I
> think within DiagnosticInfoOptimizationBase all the string parts of this get
> stored in a vector of name-value pairs with the unnamed strings just having an
> empty name.

That’s correct. There is a struct DiagnosticInfoOptimizationBase::Argument that
has a key-value pair and a debug location (used for things like remarks in the
inliner to point to the callee’s source location). Unnamed strings have the key
“String”:

  --- !Passed
  Pass:            gvn
  Name:            LoadElim
  Function:        arg
  Args:
  - String:          'load of type '
  - Type:            i32
  - String:          ' eliminated'
  - String:          ' in favor of '
  - InfavorOfValue:  i
  ...

> At some point, I guess this gets assembled into a single string?

It does in DiagnosticInfoOptimizationBase::getMsg if needed (probably when using
-Rpass?). When it’s serialized to a file, it’s serialized as multiple key-value
“arguments” that can be concatenated later by the client, or consumed based on
the meaning of the key.
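
As a rough sketch of that concatenation step (not the actual LLVM code), a client reading the serialized arguments could rebuild the message as shown here. The Argument struct is a simplified stand-in for DiagnosticInfoOptimizationBase::Argument without the debug location, and the assembleMessage name, the FirstExtraArg index, and the Verbose flag are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for DiagnosticInfoOptimizationBase::Argument:
// only the key-value pair, no debug location.
struct Argument {
  std::string Key;
  std::string Val;
};

// Concatenate the argument values in order. FirstExtraArg is the index of
// the first argument streamed after setExtraArgs(); those arguments are
// only included when Verbose is true (an assumption of this sketch).
std::string assembleMessage(const std::vector<Argument> &Args,
                            std::size_t FirstExtraArg, bool Verbose) {
  assert(FirstExtraArg <= Args.size() && "extra-arg index out of range");
  std::size_t End = Verbose ? Args.size() : FirstExtraArg;
  std::string Msg;
  for (std::size_t I = 0; I < End; ++I)
    Msg += Args[I].Val;
  return Msg;
}
```

Applied to the five arguments in the YAML above with FirstExtraArg set to 3, this yields "load of type i32 eliminated" without the extra arguments and "load of type i32 eliminated in favor of i" with them.
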
> I’ve also found references to string tables for the bitstream serializer
> and a YAML format that uses a string table, but I’m not clear how and when
> these are constructed.

The serialization part is handled by all the stuff in lib/Remarks.
lib/IR/RemarkStreamer.cpp basically converts LLVM diagnostics
(DiagnosticInfoOptimizationBase) to remarks::Remark objects that are used for
both serializing and deserializing the remarks in all the various formats. The
main reason is to allow any remark producer to be independent from LLVM
diagnostics which are tied to LLVM (M)IR.

When used, the string table is kept in memory until the AsmPrinter, which emits
it in a section in the object file, along with some other metadata. The YAML
format with a string table is usable but was mainly put there to start working
on the whole remark layer before the bitstream-based format was ready. More
details on the various formats here: https://llvm.org/docs/Remarks.html.
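
To make the string table idea concrete, here is a generic interning sketch, assuming the table simply maps each distinct string to a small integer ID so that repeated strings are stored once. SketchStringTable and its members are hypothetical and do not mirror the classes in lib/Remarks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: each distinct string gets one ID.
class SketchStringTable {
  std::unordered_map<std::string, unsigned> Ids; // string -> ID
  std::vector<std::string> Strings;              // ID -> string

public:
  // Return the existing ID for S, or assign the next free one.
  unsigned add(const std::string &S) {
    auto It = Ids.find(S);
    if (It != Ids.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(Strings.size());
    Strings.push_back(S);
    Ids.emplace(S, Id);
    return Id;
  }

  // Look up the string for a previously returned ID.
  const std::string &get(unsigned Id) const {
    assert(Id < Strings.size() && "unknown string ID");
    return Strings[Id];
  }

  std::size_t size() const { return Strings.size(); }
};
```

A serializer built this way would write IDs in place of strings such as "gvn" or "LoadElim" and emit the table itself once, which is where the space saving comes from when the same pass and remark names repeat across many remarks.
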
>  
> What I’m wondering is whether it would make sense to introduce a sort of
> message catalog, similar to the way diagnostics are handled in clang (which I
> must admit I also have only a partial understanding of). It seems like the
> RemarkName for optimization remarks somewhat serves as a unique identifier (?)
> but I would think an integer value of some sort would be better, so maybe I’m
> misunderstanding what RemarkName is being used for. I’m imagining something
> that would end up looking like this:

I believe the RemarkName + the PassName should be unique, but there is nothing
documenting this as such, nor any checks enforcing it.
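
A check for that could be as simple as the following sketch, which records every (PassName, RemarkName) pair and reports a clash on the second registration. The RemarkRegistry class is hypothetical; nothing like it exists in LLVM today as far as this thread establishes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical registry that enforces uniqueness of PassName + RemarkName.
class RemarkRegistry {
  std::set<std::pair<std::string, std::string>> Seen;

public:
  // Returns true if the pair was new, false if it was already registered.
  bool registerRemark(const std::string &PassName,
                      const std::string &RemarkName) {
    assert(!PassName.empty() && !RemarkName.empty() && "names must be set");
    return Seen.emplace(PassName, RemarkName).second;
  }
};
```

Registering ("gvn", "LoadElim") twice would return false the second time, while ("licm", "LoadElim") would still be accepted because only the combination has to be unique.
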
>  
>   ORE->emit([&]() {
>     return OptimizationRemark(DEBUG_TYPE, diag::remark_gvn_load_elim, LI)
>            << NV("Type", LI->getType())
>            << setExtraArgs() << NV("InfavorOfValue", AvailableValue);
>   });
>
> with a tablegen file somewhere containing this:
>
> def remark_gvn_load_elim: OptRemark<
>   "LoadElim",                    // RemarkName (if this is needed for YAML output or whatever)
>   "load of type %0 eliminated",  // Base format string for the remark (%Type instead of %0 maybe?)
>   "in favor of %1">;             // Extra args format string for verbose output
>
> Has this been discussed before?

This would be great! I was planning on bringing up something like this but never
really got the time to get into it.

I would also add the pass somewhere in the remark definition (although it may be
annoying to keep it updated with every single DEBUG_TYPE).

This will be very useful for documenting all the remarks and to provide a nicer
way of filtering them.

I’d be happy to review this!

Thanks,

— 
Francis
>  
> Thanks,
> Andy

Kaylor, Andrew via llvm-dev

2020-Jan-06 23:11 UTC

Thanks, Francis. I'll try to put something together to get this started.

-Andy

