thr3ads.net - llvm dev - [llvm-dev] [lldb-dev] RFC: Cleaning up the Itanium demangler [Jun 2017]


Jim Ingham via llvm-dev

2017-Jun-22 18:07 UTC

[llvm-dev] [lldb-dev] RFC: Cleaning up the Itanium demangler

This is Greg's area, he'll be able to answer in detail how the name
chopper gets used.  IIRC it chops demangled names, so it is indirectly a client
of the demangler, but it doesn't use the demangler to do this directly. 
Name lookup is done by finding all the base name matches, then comparing the
context.  We don't do a very good job of doing fuzzy full name matches - for
instance when trying to break on one overload you have to get the arguments
exactly as the demangler would produce them.  We could do some more heuristics
here (remove all the spaces you can before comparison, etc.) though it would be
even easier if we had something that could tokenize names - both mangled &
natural.
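
As a rough illustration of the space-stripping heuristic mentioned above, here is a minimal Python sketch. It is not lldb code: `normalize_name` and `same_function` are hypothetical helpers, and the rule it applies (drop every space that does not separate two identifier characters) is an assumption about which spaces are safe to remove.

```python
import re

# A space is treated as significant only when it separates two identifier
# characters, as in "char const" or "unsigned int". All other spaces are
# considered cosmetic. (This rule is an assumption made for the sketch.)
_COSMETIC_SPACE = re.compile(r"(?<![A-Za-z0-9_]) | (?![A-Za-z0-9_])")


def normalize_name(name):
    """Return `name` with cosmetic whitespace removed (hypothetical helper)."""
    collapsed = re.sub(r"\s+", " ", name.strip())
    return _COSMETIC_SPACE.sub("", collapsed)


def same_function(user_input, demangled):
    """Compare a user-typed name with demangler output, ignoring spacing."""
    return normalize_name(user_input) == normalize_name(demangled)
```

This only hides spacing differences. Spellings that differ in token order, such as "const char*" versus "char const*", would still need something that tokenizes the name.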

The Swift demangler produces a node tree for the demangled elements of a name
which is very handy on the Swift side.  A long time ago Greg experimented with
such a thing for the C++ demangler, but it ended up being too slow.

On that note, the demangler is a performance bottleneck for lldb.  Going to the
fast demangler over the system one was a big performance win.  Maybe the system
demangler is fast enough nowadays, but if it isn't then we can't get rid
of the FastDemangler.

Jim
> On Jun 22, 2017, at 8:08 AM, Pavel Labath via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> 
> On 22 June 2017 at 15:21, Erik Pilkington <erik.pilkington at gmail.com> wrote:
>> 
>> 
>> 
>> On June 22, 2017 at 5:51:39 AM, Pavel Labath (labath at google.com) wrote:
>> 
>> I don't have any concrete feedback, but:
>> 
>> - +1 for removing the "FastDemangler"
>> 
>> - If you already construct an AST as a part of your demangling
>> process, would it be possible to export that AST for external
>> consumption somehow? Right now in lldb we sometimes need to parse the
>> demangled name (to get the "basename" of a function for example), and
>> the code for doing that is quite ugly. It would be much nicer if we
>> could just query the parsed representation of the name somehow, and
>> the AST would enable us to do that.
>> 
>> 
>> I was thinking about this use case a little, actually. I think it makes more
>> sense to provide a function, say getItaniumDemangledBasename(), which could
>> just parse and query the AST for the base name (the AST already has a way
>> of doing this). This would allow the demangler to bail out if it knows that
>> the rest of the input string isn’t relevant, i.e., we could bail out after
>> parsing the ‘foo’ in _Z3fooiiiiiii. That, and not having to print out the
>> AST should make parsing the base name significantly faster on top of this.
>> 
>> Do you have any other use case for the AST outside of base names? It still
>> would be possible to export it from ItaniumDemangle.
>> 
> 
> Well.. the current parser chops the name into "basename", "context",
> "arguments", and "qualifiers" parts. All of them seem to be used right
> now, but I don't know e.g. how unavoidable that is. I know about this
> because I was fixing some bugs there, but I am actually not that
> familiar with this part of LLDB. I am cc-ing lldb-dev if they have any
> thoughts on this. We also have the ability to set breakpoints by
> providing just a part of the context (e.g. "breakpoint set -n
> foo::bar" even though the full function name is baz::booze::foo::bar),
> but this seems to be implemented in some different way.
> 
> I don't think having the ability to short-circuit the demangling would
> bring us any speed benefit, at least not without a major refactor, as
> we demangle all the names anyway. Even the AST solution will probably
> require a fair deal of plumbing on our part to make it useful.
> 
> Also, any custom-tailored solution will probably make it hard to
> retrieve any additional info, should we later need it, so I'd be in
> favor of the AST solution. (I don't know how much it would complicate
> the implementation though).
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
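
The quoted exchange describes three operations on names: stopping early once the base name has been read from a mangled name, chopping a demangled name into context, basename, arguments and qualifiers, and matching a breakpoint name against a trailing part of the context. The Python sketch below illustrates each one under strong simplifying assumptions. It is not how ItaniumDemangle or lldb implement them, all three function names are hypothetical, `mangled_basename` only understands the plain `_Z<length><identifier>` form, and `chop` assumes a name with no return type and no operator names.

```python
def mangled_basename(mangled):
    """Early-exit base name for the simplest form: _Z <length> <identifier> ...

    Returns None for anything else (nested names, templates, operators).
    """
    if not mangled.startswith("_Z"):
        return None
    pos = 2
    while pos < len(mangled) and mangled[pos] in "0123456789":
        pos += 1
    if pos == 2:
        return None  # no length prefix, e.g. "_ZN..." for a nested name
    length = int(mangled[2:pos])
    name = mangled[pos:pos + length]
    if length == 0 or len(name) != length:
        return None  # truncated or malformed input
    return name  # the parameter encoding after the name is never examined


def chop(demangled):
    """Split a demangled name into (context, basename, arguments, qualifiers)."""
    head, arguments, qualifiers = demangled, "", ""
    close = demangled.rfind(")")
    depth = 0
    for i in range(close, -1, -1):  # walk back to the matching "("
        if demangled[i] == ")":
            depth += 1
        elif demangled[i] == "(":
            depth -= 1
            if depth == 0:
                head = demangled[:i]
                arguments = demangled[i:close + 1]
                qualifiers = demangled[close + 1:].strip()
                break
    # The base name follows the last "::" that is not inside <...>.
    split, depth, i = -1, 0, 0
    while i < len(head):
        if head[i] == "<":
            depth += 1
        elif head[i] == ">":
            depth -= 1
        elif depth == 0 and head.startswith("::", i):
            split = i
            i += 1  # skip the second ':' of the separator
        i += 1
    if split == -1:
        return "", head, arguments, qualifiers
    return head[:split], head[split + 2:], arguments, qualifiers


def matches_partial_name(full_name, partial):
    """True if `partial` is a trailing run of whole components of `full_name`."""
    return full_name == partial or full_name.endswith("::" + partial)
```

For `_Z3fooiiiiiii`, `mangled_basename` returns after reading `foo` and never looks at the seven `i` parameter codes, which is the saving described in the quoted proposal.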

Scott Smith via llvm-dev

2017-Jun-22 18:11 UTC

When I looked at demangler performance, I was able to make significant
improvements to the llvm demangler.  At that point removing lldb's fast
demangler didn't hurt performance very much, but the fast demangler was
still faster.  I forget (and apparently didn't write down) how much it
mattered, but after this change I think the difference was single digit %.

https://reviews.llvm.org/D32500



Jim Ingham via llvm-dev

2017-Jun-22 20:05 UTC

Another important criterion for the demangler in the debugger is that it 100%
cannot crash no matter what it gets fed.  lldb used to have its own copy of
the system demangler library because it had bugs, and we needed to be able to
fix them faster than the system version.  We feed it all the symbols we ingest
(we actually sniff them a little bit, but we really shouldn't have to do
that, the demangler should be fast enough at rejecting symbols) so if there's
one in some system library that triggers a demangler crash, you're pretty
much dead in the water on that system...

Jim
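
To make the "sniffing" step concrete, here is a minimal Python sketch of a cheap pre-check that runs before a symbol is handed to a demangler. It is an illustration only, not lldb's actual check: `looks_itanium_mangled` and `demangle_candidates` are hypothetical names, `demangle` stands for whatever demangler is in use, and accepting `__Z` (an extra leading underscore, as seen on Mach-O platforms) is an assumption.

```python
def looks_itanium_mangled(symbol):
    """Cheap pre-check: "_Z", optionally with one extra leading underscore."""
    return symbol.startswith("_Z") or symbol.startswith("__Z")


def demangle_candidates(symbols, demangle):
    """Call `demangle` only on plausible candidates; map the rest to None.

    `demangle` is any callable taking a mangled string (an assumption of
    this sketch, standing in for the real demangler).
    """
    result = {}
    for symbol in symbols:
        if looks_itanium_mangled(symbol):
            result[symbol] = demangle(symbol)
        else:
            result[symbol] = None
    return result
```

A filter like this only reduces how many symbols reach the demangler. It does nothing for robustness, since a symbol that starts with `_Z` can still be malformed, so the demangler itself has to reject bad input safely.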

