Erik Pilkington via llvm-dev
2017-Jun-22 01:03 UTC
[llvm-dev] RFC: Cleaning up the Itanium demangler
On 6/21/17 5:42 PM, Rui Ueyama wrote:
> I'm very interested in your work because I've just started writing a
> demangler for the Microsoft mangling scheme. What I found in the
> current Itanium demangler is the same as you -- it looks like it
> allocates too much memory during parsing and concatenates std::strings
> too often. I can see there's (probably big) room to improve.
> Demangler performance is sometimes important for LLD, which is my
> main project, as linkers often have to print out a lot of symbols if
> verbose output is requested. For example, if you link Chrome with the
> -map option, the linker has to demangle 300 MiB of strings in total,
> which currently takes more than 20 seconds on my machine if
> single-threaded.
>
> The way I'm trying to implement an MS demangler is the same as you,
> too. I'm trying to create an AST to describe a type and then convert
> it to a string. I guess that we can use the same AST type between
> Itanium and MS so that we can use the same code for converting ASTs
> to strings.

Using the same AST is an interesting idea. The AST that I wrote isn't
that complicated, and is pretty closely tied to the libcxxabi demangler,
so I bet it would be easier to have separate representations, especially
if you're intending to mimic the output of MS's demangler. I'm also not
at all familiar with how MS mangles their C++, which might imply a
slightly different representation.

> It's unfortunate that my work is overlapping with yours. Looks like
> you are ahead of me, so I'll take a look at your code to see if
> there's something I can do for you.
>
> On Wed, Jun 21, 2017 at 4:42 PM, Erik Pilkington via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Hello all,
>> The Itanium demangler in libcxxabi (and also llvm/lib/Demangle) is
>> really slow. This is largely because the textual representation of
>> the symbol that is being demangled is held in a std::string, and
>> manipulations done during parsing are done on that string. The
>> demangler is always concatenating strings and inserting into the
>> middle of strings, which is terrible. The fact that the parsing
>> logic and the string manipulation/formatting logic are interleaved
>> also makes the demangler pretty ugly. Another problem is that the
>> demangler uses a lot of stack space, and has a bunch of stack
>> overflows filed against it.
>>
>> I've been working on fixing this by parsing first into an AST
>> structure, and then traversing that AST to produce a demangled
>> string. This provides a significant performance improvement and
>> also makes the demangler somewhat cleaner. Attached you should find
>> a patch to this effect. This patch is still very much a work in
>> progress, but currently passes the libcxxabi test suite and
>> demangles all the symbols in LLVM identically to the current
>> demangler. It demangles the symbols in LLVM about 3.7 times faster
>> than the current demangler. Also, separating the formatting code
>> from the parser reduces stack usage (the activation frame for
>> parse_type went from 416 to 144 bytes on my machine). The stack
>> usage is still pretty bad, but this helps with some of it.
>>
>> Does anyone have any early feedback on the patch? Does this seem
>> like a good direction for the demangler?
>>
>> As far as future plans for this file, I have a few more
>> refactorings and performance improvements that I'd like to get
>> through. After that, it might be interesting to try to replace the
>> FastDemangle.cpp demangler in LLDB with this, to restore the one
>> true demangler in the source tree. FastDemangle.cpp is only
>> partially completed, and calls out to ItaniumDemangle.cpp in llvm
>> (which is a copy of cxa_demangle.cpp) if it fails to parse the
>> symbol.
>>
>> Any thoughts here would be appreciated!
>> Thanks,
>> Erik
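The parse-first, print-second structure described in the quoted RFC can be illustrated with a small sketch. This is not the code from the attached patch: the node types (`Node`, `NameNode`, `PointerNode`, `FunctionNode`) and the functions `parseType`, `parseEncoding`, and `sketchDemangle` are hypothetical names chosen for illustration, and the parser only handles unnested function names whose parameters are `int`, `char`, or pointers to those. The point is that parsing only builds nodes, and text is appended to a single output string in one pass at the end, so nothing is ever inserted into the middle of a string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST; none of these names come from the patch.
struct Node {
  virtual ~Node() = default;
  // Append this node's demangled text to the end of Out.
  virtual void print(std::string &Out) const = 0;
};

struct NameNode final : Node {
  std::string Name;
  explicit NameNode(std::string N) : Name(std::move(N)) {}
  void print(std::string &Out) const override { Out += Name; }
};

struct PointerNode final : Node {
  std::unique_ptr<Node> Pointee;
  explicit PointerNode(std::unique_ptr<Node> P) : Pointee(std::move(P)) {
    assert(Pointee && "a pointer node needs a pointee");
  }
  void print(std::string &Out) const override {
    Pointee->print(Out);
    Out += '*';
  }
};

struct FunctionNode final : Node {
  std::unique_ptr<Node> Name;
  std::vector<std::unique_ptr<Node>> Params;
  FunctionNode(std::unique_ptr<Node> N, std::vector<std::unique_ptr<Node>> Ps)
      : Name(std::move(N)), Params(std::move(Ps)) {}
  void print(std::string &Out) const override {
    Name->print(Out);
    Out += '(';
    for (std::size_t I = 0; I != Params.size(); ++I) {
      if (I != 0)
        Out += ", ";
      Params[I]->print(Out);
    }
    Out += ')';
  }
};

// <type> ::= i | c | P <type>   (a tiny subset). Returns null on failure.
std::unique_ptr<Node> parseType(const std::string &S, std::size_t &Pos) {
  if (Pos >= S.size())
    return nullptr;
  switch (S[Pos++]) {
  case 'i':
    return std::make_unique<NameNode>("int");
  case 'c':
    return std::make_unique<NameNode>("char");
  case 'P': {
    std::unique_ptr<Node> Pointee = parseType(S, Pos);
    if (!Pointee)
      return nullptr;
    return std::make_unique<PointerNode>(std::move(Pointee));
  }
  default:
    return nullptr;
  }
}

// <encoding> ::= _Z <length> <identifier> <type>+   (unnested names only).
std::unique_ptr<Node> parseEncoding(const std::string &S) {
  if (S.compare(0, 2, "_Z") != 0)
    return nullptr;
  std::size_t Pos = 2, Len = 0;
  while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
    Len = Len * 10 + static_cast<std::size_t>(S[Pos++] - '0');
  if (Len == 0 || Len > S.size() - Pos)
    return nullptr;
  std::unique_ptr<Node> Name = std::make_unique<NameNode>(S.substr(Pos, Len));
  Pos += Len;
  std::vector<std::unique_ptr<Node>> Params;
  while (Pos < S.size()) {
    std::unique_ptr<Node> Param = parseType(S, Pos);
    if (!Param)
      return nullptr;
    Params.push_back(std::move(Param));
  }
  if (Params.empty())
    return nullptr;
  return std::make_unique<FunctionNode>(std::move(Name), std::move(Params));
}

// Parse into an AST, then print it once. Returns "" if parsing fails.
std::string sketchDemangle(const std::string &Mangled) {
  std::unique_ptr<Node> Root = parseEncoding(Mangled);
  std::string Out;
  if (Root)
    Root->print(Out);
  return Out;
}
```

With this sketch, `sketchDemangle("_Z3fooPii")` yields `foo(int*, int)`. A production demangler would presumably avoid one heap allocation per node; the sketch uses `std::unique_ptr` only to stay short.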
Pavel Labath via llvm-dev
2017-Jun-22 12:51 UTC
[llvm-dev] RFC: Cleaning up the Itanium demangler
I don't have any concrete feedback, but:

- +1 for removing the "FastDemangler"
- If you already construct an AST as a part of your demangling process,
  would it be possible to export that AST for external consumption
  somehow? Right now in lldb we sometimes need to parse the demangled
  name (to get the "basename" of a function, for example), and the code
  for doing that is quite ugly. It would be much nicer if we could just
  query the parsed representation of the name somehow, and the AST
  would enable us to do that.
Erik Pilkington via llvm-dev
2017-Jun-22 14:21 UTC
[llvm-dev] RFC: Cleaning up the Itanium demangler
On June 22, 2017 at 5:51:39 AM, Pavel Labath (labath at google.com) wrote:
> I don't have any concrete feedback, but:
>
> - +1 for removing the "FastDemangler"
> - If you already construct an AST as a part of your demangling process,
>   would it be possible to export that AST for external consumption
>   somehow? Right now in lldb we sometimes need to parse the demangled
>   name (to get the "basename" of a function, for example), and the code
>   for doing that is quite ugly. It would be much nicer if we could just
>   query the parsed representation of the name somehow, and the AST
>   would enable us to do that.

I was thinking about this use case a little, actually. I think it makes
more sense to provide a function, say getItaniumDemangledBasename(),
which could just parse and query the AST for the base name (the AST
already has a way of doing this). This would allow the demangler to bail
out if it knows that the rest of the input string isn’t relevant, i.e.,
we could bail out after parsing the ‘foo’ in _Z3fooiiiiiii. That, and
not having to print out the AST, should make extracting the base name
significantly faster, on top of the speedup from the patch itself.

Do you have any other use case for the AST outside of base names? It
would still be possible to export it from ItaniumDemangle.
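The early bail-out described above can be sketched as follows. This is only an illustration under stated assumptions, not the proposed getItaniumDemangledBasename(): the names `parseSourceName` and `sketchBasename` are hypothetical, no AST is built, and only plain (`_Z3foo...`) and simply nested (`_ZN2ns3barE...`) names are handled, with no templates, qualifiers, or substitutions. It shows that once the name has been read, the parameter types that follow never need to be examined.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: reads <decimal length><identifier> starting at Pos.
// On success stores the identifier in Name and advances Pos past it.
bool parseSourceName(const std::string &S, std::size_t &Pos,
                     std::string &Name) {
  const std::size_t Start = Pos;
  std::size_t Len = 0;
  while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
    Len = Len * 10 + static_cast<std::size_t>(S[Pos++] - '0');
  if (Pos == Start || Len == 0 || Len > S.size() - Pos)
    return false;
  Name = S.substr(Pos, Len);
  Pos += Len;
  return true;
}

// Hypothetical sketch: returns the base name of a simple Itanium-mangled
// function symbol, or "" if the symbol is outside the supported subset.
std::string sketchBasename(const std::string &Mangled) {
  if (Mangled.compare(0, 2, "_Z") != 0)
    return {};
  std::size_t Pos = 2;
  std::string Name;
  if (Pos < Mangled.size() && Mangled[Pos] == 'N') {
    // Nested name: N <source-name>+ E. The last component is the base name.
    ++Pos;
    bool SawName = false;
    while (Pos < Mangled.size() && Mangled[Pos] != 'E') {
      if (!parseSourceName(Mangled, Pos, Name))
        return {};
      SawName = true;
    }
    if (!SawName || Pos == Mangled.size())
      return {}; // empty name list or missing terminating 'E'
    return Name;
  }
  if (!parseSourceName(Mangled, Pos, Name))
    return {};
  // Bail out here: everything after the name encodes parameter types,
  // which are irrelevant to the base name and are never parsed.
  return Name;
}
```

For the symbol from the message, `sketchBasename("_Z3fooiiiiiii")` returns `foo` after reading only the first six characters of the input.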
Rui Ueyama via llvm-dev
2017-Jun-22 16:25 UTC
[llvm-dev] RFC: Cleaning up the Itanium demangler
On Wed, Jun 21, 2017 at 6:03 PM, Erik Pilkington
<erik.pilkington at gmail.com> wrote:
> Using the same AST is an interesting idea. The AST that I wrote isn't
> that complicated, and is pretty closely tied to the libcxxabi
> demangler, so I bet it would be easier to have separate
> representations, especially if you're intending to mimic the output of
> MS's demangler. I'm also not at all familiar with how MS mangles their
> C++, which might imply a slightly different representation.

I'm not going to try to do it immediately, but I think sharing the same
AST data structure makes sense. I'm not too concerned about mimicking
all the details of Microsoft's demangler, so a slight deviation is OK as
long as the difference is minor and reasonable. Mangled symbols are very
different between Itanium and Microsoft, but after all, the demangled
form is plain C++, which should be (almost) the same between the two.