thr3ads.net - llvm dev - [llvm-dev] [lldb-dev] RFC: Cleaning up the Itanium demangler [Jun 2017]

If this information is useful, please help other people find it:
Share via:

Erik Pilkington via llvm-dev

2017-Jun-22 14:21 UTC

[llvm-dev] RFC: Cleaning up the Itanium demangler

On June 22, 2017 at 5:51:39 AM, Pavel Labath (labath at google.com) wrote:

I don't have any concrete feedback, but:

- +1 for removing the "FastDemagler"

- If you already construct an AST as a part of your demangling
process, would it be possible to export that AST for external
consumption somehow? Right now in lldb we sometimes need to parse the
demangled name (to get the "basename" of a function for example), and
the code for doing that is quite ugly. It would be much nicer if we
could just query the parsed representation of the name somehow, and
the AST would enable us to do that.


I was thinking about this use case a little, actually. I think it makes
more sense to provide a function, say getItaniumDemangledBasename(), which
could just parse and query the AST for the base name (the AST already has
an way of doing this). This would allow the demangler to bail out if it
knows that the rest of the input string isn’t relevant, i.e., we could bail
out after parsing the ‘foo’ in _Z3fooiiiiiii. That, and not having to print
out the AST should make parsing the base name significantly faster on top
of this.

Do you have any other use case for the AST outside of base names? It still
would be possible to export it from ItaniumDemangle.


On 22 June 2017 at 02:03, Erik Pilkington via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
>
> On 6/21/17 5:42 PM, Rui Ueyama wrote:
>
> I'm very interested in your work because I've just started writing
a
> demangler for the Microsoft mangling scheme. What I found in the current
> Itanium demangler is the same as you -- it looks like it allocates too
much> memory during parsing and concatenates std::strings too often. I could
see> there's a (probably big) room to improve. Demangler's performance
is
> sometimes important for LLD, which is my main project, as linkers often
have> to print out a lot of symbols if a verbose output is requested. For
example,> if you link Chrome with the -map option, the linker has to demangle 300
MiB> strings in total, which currently takes more than 20 seconds on my
machine> if single-threaded.
>
> The way I'm trying to implement a MS demangler is the same as you, too.
I'm> trying to create an AST to describe type and then convert it to string. I
> guess that we can use the same AST type between Itanium and MS so that we
> can use the same code for converting ASTs to strings.
>
> Using the same AST is an interesting idea. The AST that I wrote isn't
that> complicated, and is pretty closely tied to the libcxxabi demangler, so I
bet> it would be easier to have separate representations, especially if your
> intending on mimicking the output of MS's demangler. I'm also not
at all
> familiar with how MS mangles their C++, which might imply a slightly
> different representation.
>
> It's unfortunate that my work is overlapping with yours. Looks like you
are> ahead of me, so I'll take a look at your code to see if there's
something
I> can do for you.
>
> On Wed, Jun 21, 2017 at 4:42 PM, Erik Pilkington via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hello all,
>> The itanium demangler in libcxxabi (and also, llvm/lib/Demangle) is
really>> slow. This is largely because the textual representation of the symbol
that>> is being demangled is held in a std::string, and manipulations done
during>> parsing are done on that string. The demangler is always concatenating
>> strings and inserting into the middle of strings, which is terrible.
The
>> fact that the parsing logic and the string manipulation/formatting
logic
is>> interleaved also makes the demangler pretty ugly. Another problem was
that>> the demangler used a lot stack space, and has a bunch of stack
overflows
>> filed against it.
>>
>> I've been working on fixing this by parsing first into an AST
structure,
>> and then traversing that AST to produce a demangled string. This
provides a>> significant performance improvement and also make the demangler
somewhat
>> more clean. Attached you should find a patch to this effect. This patch
is>> still very much a work in progress, but currently passes the libcxxabi
test>> suite and demangles all the symbols in LLVM identically to the current
>> demangler. It also provides a significant performance improvement: it
>> demangles the symbols in LLVM about 3.7 times faster than the current
>> demangler. Also, separating the formatting code from the parser reduces
>> stack usage (the activation frame for parse_type reduced from 416 to
144
>> bytes on my machine). The stack usage is still pretty bad, but this
helps>> with some of it.
>>
>> Does anyone have any early feedback on the patch? Does this seem like a
>> good direction for the demangler?
>>
>> As far as future plans for this file, I have a few more refactorings
and
>> performance improvements that I'd like to get through. After that,
it
might>> be interesting to try to replace the FastDemangle.cpp demangler in LLDB
with>> this, to restore the one true demangler in the source tree. The
>> FastDemangler.cpp is only partially completed, and calls out to
>> ItaniumDemangle.cpp in llvm (which is a copy of cxa_demangle.cpp) if it
>> fails to parse the symbol.
>>
>> Any thoughts here would be appreciated!
>> Thanks,
>> Erik
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170622/86ca0985/attachment.html>

Pavel Labath via llvm-dev

2017-Jun-22 15:08 UTC

head link

[llvm-dev] RFC: Cleaning up the Itanium demangler

On 22 June 2017 at 15:21, Erik Pilkington <erik.pilkington at gmail.com>
wrote:>
>
>
> On June 22, 2017 at 5:51:39 AM, Pavel Labath (labath at google.com) wrote:
>
> I don't have any concrete feedback, but:
>
> - +1 for removing the "FastDemagler"
>
> - If you already construct an AST as a part of your demangling
> process, would it be possible to export that AST for external
> consumption somehow? Right now in lldb we sometimes need to parse the
> demangled name (to get the "basename" of a function for example),
and
> the code for doing that is quite ugly. It would be much nicer if we
> could just query the parsed representation of the name somehow, and
> the AST would enable us to do that.
>
>
> I was thinking about this use case a little, actually. I think it makes
more
> sense to provide a function, say getItaniumDemangledBasename(), which could
> just parse and query the AST for the base name (the AST already has an way
> of doing this). This would allow the demangler to bail out if it knows that
> the rest of the input string isn’t relevant, i.e., we could bail out after
> parsing the ‘foo’ in _Z3fooiiiiiii. That, and not having to print out the
> AST should make parsing the base name significantly faster on top of this.
>
> Do you have any other use case for the AST outside of base names? It still
> would be possible to export it from ItaniumDemangle.
>
Well.. the current parser chops the name into "basename",
"context",
"arguments", and "qualifiers" part. All of them seem to be
used right
now, but I don't know e.g. how unavoidable that is. I know about this
because I was fixing some bugs there, but I am actually not that
familiar with this part of LLDB. I am cc-ing lldb-dev if they have any
thoughts on this. We also have the ability to set breakpoints by
providing just a part of the context (e.g. "breakpoint set -n
foo::bar" even though the full function name is baz::booze::foo::bar),
but this seems to be implemented in some different way.

I don't think having the ability to short-circuit the demangling would
bring as any speed benefit, at least not without a major refactor, as
we demangle all the names anyway. Even the AST solution will probably
require a fair deal of plumbing on our part to make it useful.

Also, any custom-tailored solution will probably make it hard to
retrieve any additional info, should we later need it, so I'd be in
favor of the AST solution. (I don't know how much it would complicate
the implementation though).

Erik Pilkington via llvm-dev

2017-Jun-22 15:50 UTC

head link

[llvm-dev] RFC: Cleaning up the Itanium demangler

On Thu, Jun 22, 2017 at 8:08 AM, Pavel Labath <labath at google.com>
wrote:
> On 22 June 2017 at 15:21, Erik Pilkington <erik.pilkington at
gmail.com>
> wrote:
> >
> >
> >
> > On June 22, 2017 at 5:51:39 AM, Pavel Labath (labath at google.com)
wrote:
> >
> > I don't have any concrete feedback, but:
> >
> > - +1 for removing the "FastDemagler"
> >
> > - If you already construct an AST as a part of your demangling
> > process, would it be possible to export that AST for external
> > consumption somehow? Right now in lldb we sometimes need to parse the
> > demangled name (to get the "basename" of a function for
example), and
> > the code for doing that is quite ugly. It would be much nicer if we
> > could just query the parsed representation of the name somehow, and
> > the AST would enable us to do that.
> >
> >
> > I was thinking about this use case a little, actually. I think it
makes
> more
> > sense to provide a function, say getItaniumDemangledBasename(), which
> could
> > just parse and query the AST for the base name (the AST already has an
> way
> > of doing this). This would allow the demangler to bail out if it knows
> that
> > the rest of the input string isn’t relevant, i.e., we could bail out
> after
> > parsing the ‘foo’ in _Z3fooiiiiiii. That, and not having to print out
the
> > AST should make parsing the base name significantly faster on top of
> this.
> >
> > Do you have any other use case for the AST outside of base names? It
> still
> > would be possible to export it from ItaniumDemangle.
> >
>
> Well.. the current parser chops the name into "basename",
"context",
> "arguments", and "qualifiers" part. All of them seem to
be used right
> now, but I don't know e.g. how unavoidable that is. I know about this
> because I was fixing some bugs there, but I am actually not that
> familiar with this part of LLDB. I am cc-ing lldb-dev if they have any
> thoughts on this. We also have the ability to set breakpoints by
> providing just a part of the context (e.g. "breakpoint set -n
> foo::bar" even though the full function name is baz::booze::foo::bar),
> but this seems to be implemented in some different way.
>
> I don't think having the ability to short-circuit the demangling would
> bring as any speed benefit, at least not without a major refactor, as
> we demangle all the names anyway. Even the AST solution will probably
> require a fair deal of plumbing on our part to make it useful.
>
> Also, any custom-tailored solution will probably make it hard to
> retrieve any additional info, should we later need it, so I'd be in
> favor of the AST solution. (I don't know how much it would complicate
> the implementation though).
>
Ah, I see. In that case I agree that exposing the AST is the only way that
this could be done. I don't think it would be that hard to implement, it
would cause a bit of a divergence between cxa_demangle and ItaniumDemangle,
where the former would want to keep the AST private and the latter public,
but thats not the end of the world. I'd be curious to see if the LLDB folks
are interested in such an API.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170622/d29fc61a/attachment.html>

Jim Ingham via llvm-dev

2017-Jun-22 18:07 UTC

head link

[llvm-dev] [lldb-dev] RFC: Cleaning up the Itanium demangler

This is Greg's area, he'll be able to answer in detail how the name
chopper gets used.  IIRC it chops demangled names, so it is indirectly a client
of the demangler, but it doesn't use the demangler to do this directly. 
Name lookup is done by finding all the base name matches, then comparing the
context.  We don't do a very good job of doing fuzzy full name matches - for
instance when trying to break on one overload you have to get the arguments
exactly as the demangler would produce them.  We could do some more heuristics
here (remove all the spaces you can before comparison, etc.) though it would be
even easier if we had something that could tokenize names - both mangled &
natural.

The Swift demangler produces a node tree for the demangled elements of a name
which is very handy on the Swift side.  A long time ago Greg experimented with
such a thing for the C++ demangler, but it ended up being too slow.

On that note, the demangler is a performance bottleneck for lldb.  Going to the
fast demangler over the system one was a big performance win.  Maybe the system
demangler is fast enough nowadays, but if it isn't then we can't get rid
of the FastDemangler.

Jim
> On Jun 22, 2017, at 8:08 AM, Pavel Labath via lldb-dev <lldb-dev at
lists.llvm.org> wrote:
> 
> On 22 June 2017 at 15:21, Erik Pilkington <erik.pilkington at
gmail.com> wrote:
>> 
>> 
>> 
>> On June 22, 2017 at 5:51:39 AM, Pavel Labath (labath at google.com)
wrote:
>> 
>> I don't have any concrete feedback, but:
>> 
>> - +1 for removing the "FastDemagler"
>> 
>> - If you already construct an AST as a part of your demangling
>> process, would it be possible to export that AST for external
>> consumption somehow? Right now in lldb we sometimes need to parse the
>> demangled name (to get the "basename" of a function for
example), and
>> the code for doing that is quite ugly. It would be much nicer if we
>> could just query the parsed representation of the name somehow, and
>> the AST would enable us to do that.
>> 
>> 
>> I was thinking about this use case a little, actually. I think it makes
more
>> sense to provide a function, say getItaniumDemangledBasename(), which
could
>> just parse and query the AST for the base name (the AST already has an
way
>> of doing this). This would allow the demangler to bail out if it knows
that
>> the rest of the input string isn’t relevant, i.e., we could bail out
after
>> parsing the ‘foo’ in _Z3fooiiiiiii. That, and not having to print out
the
>> AST should make parsing the base name significantly faster on top of
this.
>> 
>> Do you have any other use case for the AST outside of base names? It
still
>> would be possible to export it from ItaniumDemangle.
>> 
> 
> Well.. the current parser chops the name into "basename",
"context",
> "arguments", and "qualifiers" part. All of them seem to
be used right
> now, but I don't know e.g. how unavoidable that is. I know about this
> because I was fixing some bugs there, but I am actually not that
> familiar with this part of LLDB. I am cc-ing lldb-dev if they have any
> thoughts on this. We also have the ability to set breakpoints by
> providing just a part of the context (e.g. "breakpoint set -n
> foo::bar" even though the full function name is baz::booze::foo::bar),
> but this seems to be implemented in some different way.
> 
> I don't think having the ability to short-circuit the demangling would
> bring as any speed benefit, at least not without a major refactor, as
> we demangle all the names anyway. Even the AST solution will probably
> require a fair deal of plumbing on our part to make it useful.
> 
> Also, any custom-tailored solution will probably make it hard to
> retrieve any additional info, should we later need it, so I'd be in
> favor of the AST solution. (I don't know how much it would complicate
> the implementation though).
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Jun 2017 - [lldb-dev] RFC: Cleaning up the Itanium demangler

[llvm-dev] RFC: Cleaning up the Itanium demangler

[llvm-dev] RFC: Cleaning up the Itanium demangler

[llvm-dev] RFC: Cleaning up the Itanium demangler

[llvm-dev] [lldb-dev] RFC: Cleaning up the Itanium demangler

Seemingly Similar Threads