Shankar Easwaran
2014-Apr-02 18:02 UTC
[LLVMdev] [lld] adding demangler for symbol resolution
On 4/2/2014 12:23 PM, Nick Kledzik wrote:
> On Apr 1, 2014, at 9:19 PM, Shankar Easwaran wrote:
>
>> Hi Nick, Bigcheese,
>>
>> When lld is used to link C++ code, it needs to demangle symbol names, either by default or via a user-driven option.
>>
>> The GNU linker has the following options:
>>
>> --demangle=[style]
>> --no-demangle
>>
>> I found that clang/llvm-symbolizer use the __cxa_demangle function.
>>
>> I would think that lld also needs to call the same function, and I think the way we want to demangle is to have the function in LinkingContext, as various flavors may choose to use different APIs to demangle symbol names.
>>
>> The APIs that would be in LinkingContext would be:
>>
>> * virtual bool canDemangle() = 0; // Does the flavor provide a way to demangle symbol names?
>> * virtual std::string demangle(StringRef symbolName) = 0; // demangle the symbol name
>>
>> Thoughts / Suggestions?
>
> Wouldn't it be simpler to have one demangle() method that does nothing (returns input string) if demangling is not available, the string is not a mangled symbol, or demangling was turned off (--no-demangle). Then, you just wrap a demangle() call around every use.

Do you mean a single demangle function in LinkingContext?

One demangle method wouldn't work, as the Itanium ABI uses one method to demangle, the ARM C++ ABI uses a different method, and MSVC uses a different one. I am not sure about Mach-O here.

> The __cxa_demangle function has an odd interface that requires a malloc allocated block. Having demangle() return a std::string means yet another allocation. We might not care if this is just used in diagnostic outputs, but a more efficient way would be to pass the stream object to demangle and have it write directly to the stream instead of creating a std::string.

I don't know whether diagnostics in clang already write directly to a stream.

Maybe for now, as an initial implementation, we can have a single demangle function that returns a std::string.

As part of this, I was thinking of cleaning up the way errors are displayed to the user from the Resolver. We could have functions in SymbolTable such as:

raiseError(SymbolErrorKind, filename, symbolname)
raiseError(SymbolErrorKind, filename, symbolname, filename, symbolname)

SymbolErrorKind:
MultipleDefinition
Undefined
GroupError
Note (for tracing)
...

> Seems like a demangling utility might be something to add at the LLVM level. Either directly to raw_ostream or a wrapper like format().

I have browsed the LLVM discussions about moving the demangler function, which is housed in libcxx, and I don't think there is a plan to move it.

I think the format() specifier would be useful, but I am not sure how the different linking contexts in lld could route calls through a central format specifier.

Can you share more info on this?

Thanks

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
On Wed, Apr 2, 2014 at 11:02 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
> As part of this, I was thinking of cleaning up the way errors are
> displayed to the user from the Resolver. We could have functions in
> SymbolTable such as:
>
> raiseError(SymbolErrorKind, filename, symbolname)
> raiseError(SymbolErrorKind, filename, symbolname, filename, symbolname)
>
> SymbolErrorKind:
> MultipleDefinition
> Undefined
> GroupError
> Note (for tracing)
> ...

I agree that error message output is a bit scattered in SymbolTable.cpp, but defining enum values for it is too much. Let's not over-design. I'd define a function for each error if there are multiple locations printing the same error. Also, this is a separate issue from demangling, so we shouldn't mix them.
Reid Kleckner
2014-Apr-02 18:23 UTC
[LLVMdev] [lld] adding demangler for symbol resolution
On Wed, Apr 2, 2014 at 11:02 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:
> Do you mean a single demangle function in LinkingContext?
>
> One demangle method wouldn't work, as the Itanium ABI uses one method to
> demangle, the ARM C++ ABI uses a different method, and MSVC uses a
> different one. I am not sure about Mach-O here.

First, it's really easy to detect which ABI is being used based on the prefix:

_Z  -> standard Itanium demangler (__cxa_demangle)
__Z -> Itanium with a leading _
?   -> MSVC

We don't need a virtual method. MinGW people might be linking Itanium symbols on Windows, and that should demangle just fine if __cxa_demangle is available.

Second, __cxa_demangle is not available on all platforms, so lld should just test for its availability and use it for Itanium symbols if available.

I think the LLVM project has a demangler floating around (libc++?). It might be nice to find a way to reuse that across projects like this so the output of LLVM tools doesn't change based on the capabilities of the host. In other words, it'd be nice if we had a good story for demangled diagnostics while cross-linking.
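The prefix test described in this message is small enough to write down. The sketch below is an illustration only; the enum and the function name classifySymbol are hypothetical and not part of lld.

```cpp
#include <string>

// Hypothetical classification of a symbol by its mangling prefix, following
// the three prefixes listed in the message above.
enum class ManglingKind { None, Itanium, ItaniumLeadingUnderscore, MSVC };

inline bool hasPrefix(const std::string &s, const char *prefix) {
  return s.rfind(prefix, 0) == 0; // match only at position 0
}

inline ManglingKind classifySymbol(const std::string &name) {
  // Check "__Z" before "_Z": the longer prefix also starts with '_'.
  if (hasPrefix(name, "__Z"))
    return ManglingKind::ItaniumLeadingUnderscore;
  if (hasPrefix(name, "_Z"))
    return ManglingKind::Itanium;
  if (hasPrefix(name, "?"))
    return ManglingKind::MSVC;
  return ManglingKind::None;
}
```

Because the decision depends only on the symbol text, the same check works regardless of which flavor or host is doing the link, which is why no virtual method is needed.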
Shankar Easwaran
2014-Apr-02 18:47 UTC
[LLVMdev] [lld] adding demangler for symbol resolution
On 4/2/2014 1:23 PM, Reid Kleckner wrote:
> First, it's really easy to detect which ABI is being used based on the prefix:
>
> _Z  -> standard Itanium demangler (__cxa_demangle)
> __Z -> Itanium with a leading _
> ?   -> MSVC
>
> We don't need a virtual method. MinGW people might be linking Itanium symbols on Windows, and that should demangle just fine if __cxa_demangle is available.
>
> Second, __cxa_demangle is not available on all platforms, so lld should just test for its availability and use it for Itanium symbols if available.
>
> I think the LLVM project has a demangler floating around (libc++?). It might be nice to find a way to reuse that across projects like this so the output of LLVM tools doesn't change based on the capabilities of the host. In other words, it'd be nice if we had a good story for demangled diagnostics while cross-linking.

Thanks for the info, Reid. We will have a single demangler then. Cross-linking is a very good point that you raised.

The demangler will check whether the first character is a _ and, if __cxa_demangle is available, call __cxa_demangle.

If the first character is a ? and MSVC is defined, it will call UnDecorateSymbolName.

The above should suffice for now, I think, and if there is a need we could add more to it.

- Shankar Easwaran
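A minimal sketch of the plan in this message, assuming a free function named demangleSymbol (a hypothetical name) and a compile-time check for <cxxabi.h> as the "is __cxa_demangle available" test. The MSVC branch (UnDecorateSymbolName) is deliberately left out here and only marked with a comment; names starting with ? are returned unchanged.

```cpp
#include <cstdlib>
#include <memory>
#include <string>

#if defined(__has_include)
#  if __has_include(<cxxabi.h>)
#    include <cxxabi.h>
#    define SKETCH_HAVE_CXA_DEMANGLE 1
#  endif
#endif

// Hypothetical single, non-virtual demangler. Returns the input unchanged
// whenever it cannot (or should not) demangle.
inline std::string demangleSymbol(const std::string &name) {
#ifdef SKETCH_HAVE_CXA_DEMANGLE
  if (!name.empty() && name[0] == '_') {
    int status = 0;
    // __cxa_demangle hands back a malloc'ed buffer; unique_ptr with
    // std::free releases it on every return path.
    std::unique_ptr<char, void (*)(void *)> buf(
        abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status),
        std::free);
    if (status == 0 && buf)
      return std::string(buf.get());
  }
#endif
  // A '?' prefix would go to the MSVC API on Windows hosts; omitted here.
  return name;
}
```

Note that this gives different output depending on the host, which is exactly the cross-linking concern raised earlier in the thread.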
On Apr 2, 2014, at 11:02 AM, Shankar Easwaran wrote:
> On 4/2/2014 12:23 PM, Nick Kledzik wrote:
>> On Apr 1, 2014, at 9:19 PM, Shankar Easwaran wrote:
>>
>>> The APIs that would be in LinkingContext would be:
>>>
>>> * virtual bool canDemangle() = 0; // Does the flavor provide a way to demangle symbol names?
>>> * virtual std::string demangle(StringRef symbolName) = 0; // demangle the symbol name
>>>
>>> Thoughts / Suggestions?
>> Wouldn't it be simpler to have one demangle() method that does nothing (returns input string) if demangling is not available, the string is not a mangled symbol, or demangling was turned off (--no-demangle). Then, you just wrap a demangle() call around every use.
> Do you mean a single demangle function in LinkingContext?

Yes. How do you expect clients to use your proposed canDemangle()/demangle() interface? Seems like it would always be:

    str = sym;
    if (ctx.canDemangle())
        str = ctx.demangle(sym);

My suggestion is to move the canDemangle functionality into demangle, so clients just always use:

    str = ctx.demangle(sym);

and it returns the input string if a demangler is not available or is disabled.

> One demangle method wouldn't work, as the Itanium ABI uses one method to demangle, the ARM C++ ABI uses a different method, and MSVC uses a different one. I am not sure about Mach-O here.

Given that, how can we make an lld tool that behaves the same when cross-building as it does on the native system? Are you thinking of writing your own demangler? Or of using whatever one is natively available, and falling back to not demangling if the native demangler cannot demangle the given symbol name (e.g. an MSVC symbol when running on Linux)?

>> The __cxa_demangle function has an odd interface that requires a malloc allocated block. Having demangle() return a std::string means yet another allocation. We might not care if this is just used in diagnostic outputs, but a more efficient way would be to pass the stream object to demangle and have it write directly to the stream instead of creating a std::string.
> I don't know whether diagnostics in clang already write directly to a stream.
>
> Maybe for now, as an initial implementation, we can have a single demangle function that returns a std::string.

Let's look at an example. lld currently has:

    llvm::errs() << "lld warning: shared library symbol "
                 << curShLib->name()
                 << " has different load path in " …

My ideal change would be something like:

    llvm::errs() << "lld warning: shared library symbol "
                 << ctx.demangle(curShLib->name())
                 << " has different load path in " …

-Nick
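To make the "fold canDemangle() into demangle()" suggestion concrete, here is a small sketch. DemoContext and its members are hypothetical stand-ins, not lld's LinkingContext, and the pluggable callback is just one assumed way to supply whatever demangler the host has.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a linking context.
class DemoContext {
public:
  // Returns true and fills `out` on success; false if `in` is not a
  // mangled name the demangler understands.
  using Demangler =
      std::function<bool(const std::string &in, std::string &out)>;

  void setDemangler(Demangler d) { demangler_ = std::move(d); }
  void setDemangleEnabled(bool on) { enabled_ = on; } // e.g. --no-demangle

  // Always safe to call: falls back to the input string when demangling
  // is disabled, unavailable, or fails for this symbol.
  std::string demangle(const std::string &sym) const {
    if (!enabled_ || !demangler_)
      return sym;
    std::string out;
    return demangler_(sym, out) ? out : sym;
  }

private:
  Demangler demangler_;  // empty means "no demangler available"
  bool enabled_ = true;
};
```

With this shape, call sites never branch on availability; they wrap the name in ctx.demangle(...) and stream the result, as in the llvm::errs() example above.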
Shankar Easwaran
2014-Apr-03 19:49 UTC
[LLVMdev] [lld] adding demangler for symbol resolution
On 4/3/2014 12:58 AM, Nick Kledzik wrote:
> My suggestion is to move the canDemangle functionality into demangle, so clients just always use:
>     str = ctx.demangle(sym);
> and it returns the input string if a demangler is not available or is disabled.

Yes. This would be much preferred.

> Given that, how can we make an lld tool that behaves the same when cross-building as it does on the native system? Are you thinking of writing your own demangler? Or of using whatever one is natively available, and falling back to not demangling if the native demangler cannot demangle the given symbol name (e.g. an MSVC symbol when running on Linux)?

a) The function would be non-virtual.
b) I am not planning to write a demangler. I was planning on using abi::__cxa_demangle if one is available and the first character of the symbol is a _. If MSVC is defined, we would use the Undecorate API.

Does this look good?

> Let's look at an example. lld currently has:
>     llvm::errs() << "lld warning: shared library symbol "
>                  << curShLib->name()
>                  << " has different load path in " …
>
> My ideal change would be something like:
>
>     llvm::errs() << "lld warning: shared library symbol "
>                  << ctx.demangle(curShLib->name())
>                  << " has different load path in " …

I think we are on the same page. I assume ctx.demangle() would return a string in your case as well.

Thanks

Shankar Easwaran